Introduction to R Bern R Bootcamp |
adapted from paherald.sk.ca
In this practical you’ll practice customizing plots created using the ggplot2
package. By the end of this practical you will know how to:
grid.arrange().
ggsave().
Open your BernRBootcamp R project. It should already have the folders 1_Data and 2_Code. Make sure that the data files listed in the Datasets section are in your 1_Data folder.
# Done!
Open a new R script. At the top of the script, using comments, write your name and the date. Save it as a new file called plottingII_practical.R in the 2_Code folder.
Using library(), load the set of packages for this practical listed in the Functions section.
## NAME
## DATE
## Plotting Practical
library(XX)
library(XX)
library(XX)
This practical uses the crime.csv data set, containing crime data of US counties across various states. Using read_csv(), load the data into R and store it as a new object called crime.
crime <- read_csv("1_Data/crime.csv")
crime
# A tibble: 1,071 x 36
communityname state population householdsize pctUrban medIncome pctWSocSec
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 BerkeleyHeig… NJ 11980 3.1 100 75122 23.6
2 Marpletownsh… PA 23123 2.82 100 47917 35.5
3 Norwoodtown MA 28700 2.6 100 42805 30.2
4 Wacocity TX 103590 2.62 100 17852 29.1
5 Shermancity TX 31601 2.54 100 24763 32.7
6 SanPablocity CA 25158 2.89 100 25479 23.0
7 Glendalecity CA 180038 2.62 100 34372 20.3
8 Worthingtonc… OH 14869 2.67 100 49851 23.8
9 Arlingtoncity TX 261721 2.6 100 35048 11.0
10 Marinacity CA 26436 3.34 100 29043 10.7
# … with 1,061 more rows, and 29 more variables: pctWRetire <dbl>,
# whitePerCap <dbl>, blackPerCap <dbl>, AsianPerCap <dbl>, HispPerCap <dbl>,
# PctPopUnderPov <dbl>, PctNotHSGrad <dbl>, PctUnemployed <dbl>,
# TotalPctDiv <dbl>, PersPerFam <dbl>, PctWorkMom <dbl>, NumImmig <dbl>,
# PctImmigRecent <dbl>, PctNotSpeakEnglWell <dbl>, RentMedian <dbl>,
# NumInShelters <dbl>, NumStreet <dbl>, PctForeignBorn <dbl>,
# PctBornSameState <dbl>, LandArea <dbl>, PopDens <dbl>,
# PctUsePubTrans <dbl>, murders <dbl>, robberies <dbl>, assaults <dbl>,
# burglaries <dbl>, larcenies <dbl>, autoTheft <dbl>, arsons <dbl>
Create scatter plots (using geom_point()) pitting each of the 7 crime indicators (murders, robberies, assaults, burglaries, larcenies, autoTheft, arsons) against the percentage of people using public transportation (PctUsePubTrans).
ggplot(data = crime,
mapping = aes(x = XX, y = XX)) +
geom_point()
# and so on
Use scale_y_log10() to fix this. Run the plots again.
ggplot(data = crime,
mapping = aes(x = XX, y = XX)) +
geom_point() +
scale_y_log10()
# and so on
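A minimal sketch of what scale_y_log10() changes, using made-up counts (the toy data frame below is an assumption for illustration, not the crime data). One thing to keep in mind: a count of 0 has no finite logarithm, so ggplot2 will typically warn about such points and leave them out of the plot.

```r
library(ggplot2)

# Hypothetical toy data: counts spanning several orders of magnitude
toy <- data.frame(x = 1:6,
                  count = c(0, 1, 10, 100, 1000, 10000))

# Linear y-axis: the small counts are squeezed together at the bottom
p_linear <- ggplot(toy, aes(x = x, y = count)) +
  geom_point()

# Log10 y-axis: equal ratios take up equal vertical distance
p_log <- p_linear + scale_y_log10()

# The log10 of zero is -Inf, which is why the zero count cannot be drawn
log_counts <- log10(toy$count)
log_counts
```

Printing p_linear and p_log side by side shows how the log scale spreads out the small values.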
Create a long version of the crime data set called crime_long, using the code below. (Note the use of crime_vars as a positive selector for pivot_longer().)
# vector of crime variables
crime_vars <- c("murders", "robberies", "assaults", "burglaries", "larcenies", "autoTheft", "arsons")
# transform to long; all_of() makes explicit that crime_vars is an external vector
crime_long <- crime %>%
  pivot_longer(names_to = "crime_var",
               values_to = "frequency",
               cols = all_of(crime_vars))
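To see what the wide-to-long transformation does, here is a small sketch on a made-up two-row tibble (toy_crime and toy_vars are hypothetical names and values, not part of the practical's data). Each selected column becomes a set of rows, with the old column name stored in crime_var and the old value in frequency.

```r
library(tidyr)
library(tibble)

# Hypothetical mini version of the crime data (values are made up)
toy_crime <- tibble(
  communityname  = c("A", "B"),
  PctUsePubTrans = c(1.5, 20.0),
  murders        = c(0, 3),
  arsons         = c(2, 5)
)

# Columns to stack into long format
toy_vars <- c("murders", "arsons")

toy_long <- pivot_longer(toy_crime,
                         cols = all_of(toy_vars),
                         names_to = "crime_var",
                         values_to = "frequency")

# 2 communities x 2 crime variables = 4 rows
toy_long
```

The columns that were not selected (communityname, PctUsePubTrans) are repeated for every new row, which is what makes faceting by crime_var possible.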
With the crime_long data set, you can now make use of the amazing power of ggplot2’s facet functions, such as facet_wrap(). Use facet_wrap() to automatically plot crime frequency against the percentage of people using public transportation for each of the crime variables.
ggplot(data = crime_long,
mapping = aes(x = XX, y = XX)) +
geom_point() +
scale_y_log10() +
facet_wrap(~ XX)
ggplot(data = crime_long,
mapping = aes(x = PctUsePubTrans, y = frequency)) +
geom_point() +
scale_y_log10() +
facet_wrap(~ crime_var)
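facet_wrap() has a few arguments worth knowing beyond the formula. The sketch below uses the mpg data that ships with ggplot2 (standing in for crime_long, so it runs without the crime file) and shows the scales and ncol arguments; which settings suit the crime plot is a judgment call.

```r
library(ggplot2)

# Default: all panels share the same x- and y-axis ranges
p_fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class)

# Each panel gets its own y-axis range, arranged in 2 columns
p_free <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, scales = "free_y", ncol = 2)
```

Shared axes make panels directly comparable, whereas free axes make the pattern within each panel easier to see.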
Explore other predictors, such as medIncome, TotalPctDiv, or PctNotHSGrad, for each of the crime measures. Which variables predict which kind of crime? Explore!
theme()
Now that we have an informative plot, let’s focus on making it a bit more “pretty”, using ggplot2’s theme() function. The goal is to create a plot that looks like the plot below.
crime_facets <- ggplot(data = crime_long,
mapping = aes(x = PctUsePubTrans, y = frequency)) +
geom_point() +
scale_y_log10() +
facet_wrap(~ crime_var) +
theme(
panel.background = element_rect(fill='white'),
panel.grid.major = element_line(color = 'grey75',
size = .25),
panel.grid.minor = element_line(color = 'grey75',
size = .1),
strip.background = element_rect(fill='white'),
strip.text = element_text(face='italic', size=12, hjust=1),
axis.title.y = element_text(size=12,margin=margin(r = 10)),
axis.title.x = element_text(size=12,margin=margin(t = 10)),
panel.spacing = unit(1.1, "lines")) +
labs(x = '% public transportation', y = 'Crime frequency')
crime_facets
Store the faceted plot in an object called crime_facets.
crime_facets <- XX
Set the background color of the panel to "white" using the panel.background argument and the element_rect() function.
crime_facets +
theme(
panel.background = element_rect(fill = XX)
)
crime_facets +
theme(
panel.background = element_rect(fill = 'white')
)
Set the color of the major and minor grid lines to "grey75" and their sizes to .25 and .1, respectively, using the panel.grid.major and panel.grid.minor arguments and the element_line() function.
crime_facets +
theme(
panel.background = element_rect(fill = XX),
panel.grid.major = element_line(color = XX, size = XX),
panel.grid.minor = element_line(color = XX, size = XX)
)
crime_facets +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'grey75', size = .25),
panel.grid.minor = element_line(color = 'grey75', size = .1)
)
Set the background of the facet strips to "white" using the strip.background argument and the element_rect() function.
crime_facets +
theme(
panel.background = element_rect(fill = XX),
panel.grid.major = element_line(color = XX, size = XX),
panel.grid.minor = element_line(color = XX, size = XX),
strip.background = element_rect(fill = XX)
)
crime_facets +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'grey75', size = .25),
panel.grid.minor = element_line(color = 'grey75', size = .1),
strip.background = element_rect(fill = 'white')
)
Set the facet strip text to "italic", align it to the right side, and set its size to 12 using the strip.text argument and the element_text() function. See ?element_text.
crime_facets +
theme(
panel.background = element_rect(fill = XX),
panel.grid.major = element_line(color = XX, size = XX),
panel.grid.minor = element_line(color = XX, size = XX),
strip.background = element_rect(fill = XX),
strip.text = element_text(face = XX, size = XX, hjust = XX)
)
crime_facets +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'grey75', size = .25),
panel.grid.minor = element_line(color = 'grey75', size = .1),
strip.background = element_rect(fill = 'white'),
strip.text = element_text(face = 'italic', size = 12, hjust = 1)
)
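A note on the size argument used for the grid lines: in ggplot2 3.4.0 and later, line widths in element_line() are set with linewidth, and size is deprecated for that purpose (it still works but produces a warning). The sketch below assumes such a recent ggplot2 version and uses the built-in mpg data in place of crime_long; on older versions, keep size as in the code above.

```r
library(ggplot2)

# Stand-in faceted plot based on the mpg data shipped with ggplot2
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class)

# Same grid-line settings as in the practical, written with linewidth
p_themed <- p +
  theme(
    panel.background = element_rect(fill = "white"),
    panel.grid.major = element_line(color = "grey75", linewidth = 0.25),
    panel.grid.minor = element_line(color = "grey75", linewidth = 0.1)
  )
```

Note that size in element_text() is unaffected, since there it refers to font size rather than line width.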
Set the size of the axis titles to 12 and add a margin of 10 to the top of the x-axis title and to the right side of the y-axis title, using the axis.title.x and axis.title.y arguments and the element_text() and margin() functions. See ?margin.
crime_facets +
theme(
panel.background = element_rect(fill = XX),
panel.grid.major = element_line(color = XX, size = XX),
panel.grid.minor = element_line(color = XX, size = XX),
strip.background = element_rect(fill = XX),
strip.text = element_text(face = XX, size = XX, hjust = XX),
axis.title.x = element_text(size = XX, margin = margin(t = XX)),
axis.title.y = element_text(size = XX, margin = margin(r = XX))
)
crime_facets +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'grey75', size = .25),
panel.grid.minor = element_line(color = 'grey75', size = .1),
strip.background = element_rect(fill = 'white'),
strip.text = element_text(face = 'italic', size = 12, hjust = 1),
axis.title.x = element_text(size = 12, margin = margin(t = 10)),
axis.title.y = element_text(size = 12, margin = margin(r = 10))
)
Set the spacing between panels to 1.1 "lines" using the panel.spacing argument and the unit() function.
crime_facets +
theme(
panel.background = element_rect(fill = XX),
panel.grid.major = element_line(color = XX, size = XX),
panel.grid.minor = element_line(color = XX, size = XX),
strip.background = element_rect(fill = XX),
strip.text = element_text(face = XX, size = XX, hjust = XX),
axis.title.x = element_text(size = XX, margin = margin(t = XX)),
axis.title.y = element_text(size = XX, margin = margin(r = XX)),
panel.spacing = unit(XX, units = XX)
)
crime_facets +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'grey75', size = .25),
panel.grid.minor = element_line(color = 'grey75', size = .1),
strip.background = element_rect(fill = 'white'),
strip.text = element_text(face = 'italic', size = 12, hjust = 1),
axis.title.x = element_text(size = 12, margin = margin(t = 10)),
axis.title.y = element_text(size = 12, margin = margin(r = 10)),
panel.spacing = unit(1.1, units = "lines")
)
Set the axis labels using the labs() function.
crime_facets +
theme(
panel.background = element_rect(fill = XX),
panel.grid.major = element_line(color = XX, size = XX),
panel.grid.minor = element_line(color = XX, size = XX),
strip.background = element_rect(fill = XX),
strip.text = element_text(face = XX, size = XX, hjust = XX),
axis.title.x = element_text(size = XX, margin = margin(t = XX)),
axis.title.y = element_text(size = XX, margin = margin(r = XX)),
panel.spacing = unit(XX, units = XX)
) +
labs(x = XX, y = XX)
crime_facets +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'grey75', size = .25),
panel.grid.minor = element_line(color = 'grey75', size = .1),
strip.background = element_rect(fill = 'white'),
strip.text = element_text(face = 'italic', size = 12, hjust = 1),
axis.title.x = element_text(size = 12, margin = margin(t = 10)),
axis.title.y = element_text(size = 12, margin = margin(r = 10)),
panel.spacing = unit(1.1, units = "lines")
) +
labs(x = '% public transportation', y = 'Crime frequency')
Store your theme() settings in an object called crime_theme.
crime_theme <- theme(
XX = XX,
XX = XX,
...
)
crime_theme <- theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'grey75', size = .25),
panel.grid.minor = element_line(color = 'grey75', size = .1),
strip.background = element_rect(fill = 'white'),
strip.text = element_text(face = 'italic', size = 12, hjust = 1),
axis.title.x = element_text(size = 12, margin = margin(t = 10)),
axis.title.y = element_text(size = 12, margin = margin(r = 10)),
panel.spacing = unit(1.1, units = "lines")
)
Add crime_theme to a plot in order to apply your personalized theme.
new_crime_plot + crime_theme
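The point of storing a theme is reuse. As a rough sketch (my_theme, plot_a, and plot_b are hypothetical names, and mpg stands in for the crime data), the same theme object can be added to any number of plots:

```r
library(ggplot2)

# A stored theme is just an object that can be added to a plot with +
my_theme <- theme(
  panel.background = element_rect(fill = "white"),
  strip.text = element_text(face = "italic", size = 12, hjust = 1)
)

# Two different plots sharing the same look
plot_a <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  my_theme

plot_b <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv) +
  my_theme
```

Because the theme is added without parentheses, changing my_theme in one place updates every plot that is rebuilt afterwards.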
Explore other arguments of theme() (see ?theme), such as axis.ticks or strip.placement.
When creating a plot, ggplot2 automatically chooses sensible dimensions for your plot in terms of x- and y-axis limits, geom sizes, or colors. However, all of these aspects of the plot can also be controlled manually or semi-manually using various scale_* functions.
Color the points by state by mapping the state variable onto the color argument and size the points according to the county’s population by mapping the population variable onto the size argument. Store the resulting plot in an object called crime_plot.
crime_plot <-
ggplot(data = crime_long,
mapping = aes(x = XX, y = XX,
color = XX, size = XX)) +
geom_point() +
scale_y_log10() +
facet_wrap(~ XX) +
crime_theme
crime_plot <- ggplot(data = crime_long,
mapping = aes(x = PctUsePubTrans, y = frequency,
color = state, size = population)) +
geom_point() +
scale_y_log10() +
facet_wrap(~ crime_var) +
crime_theme
crime_plot
Using scale_size() and the range argument, change the scaling of the points to reduce the degree of overlap among the points (see ?scale_size). Try out a few numbers (smaller than 10) to create a version of the plot with a decent trade-off between point size and point overlap.
crime_plot + scale_size(range = c(XX, XX))
crime_plot + scale_size(range = c(.5, 3))
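For orientation, the range argument gives the smallest and largest point size that the mapped variable is rescaled to; ggplot2's default is c(1, 6). A small sketch with the mpg data (cty is used as an arbitrary stand-in for population):

```r
library(ggplot2)

# Point size mapped to a numeric variable
p <- ggplot(mpg, aes(x = displ, y = hwy, size = cty)) +
  geom_point(alpha = 0.5)

# Narrower range: smaller points, less overlap
p_small <- p + scale_size(range = c(0.5, 3))

# The default range, written out explicitly
p_default <- p + scale_size(range = c(1, 6))
```

Adding some transparency via alpha, as done here, is another common way to cope with overlapping points.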
Adjust the y-axis limits using the limits argument of the scale_y_log10() function. Set the limits to 1 and 2e+5 (i.e., 200,000); the lower limit cannot be 0, because log10(0) is -Inf. (Note that R will tell you that this will overwrite the previous use of scale_y_log10(), which is what we intend to do.)
crime_plot +
scale_size(range = c(XX, XX)) +
scale_y_log10(limits = c(XX, XX))
crime_plot +
scale_size(range = c(.5, 3)) +
scale_y_log10(limits = c(1, 2e+5))
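Setting limits on a scale also decides which points are shown: values outside the limits are treated as missing and are not drawn (ggplot2 usually reports this in a warning). The sketch below illustrates this with made-up numbers; the toy data frame is an assumption for illustration only.

```r
library(ggplot2)

# Hypothetical values, two of which fall outside the limits used below
toy <- data.frame(x = 1:5,
                  y = c(0.5, 5, 50, 500, 5e5))

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  scale_y_log10(limits = c(1, 2e5))

# Which values lie within the limits and will therefore be drawn?
inside <- toy$y >= 1 & toy$y <= 2e5
n_inside <- sum(inside)
n_inside
```

If the goal is only to zoom without discarding data, coord_cartesian(ylim = ...) is the usual alternative, since it changes the visible window rather than the scale.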
One way to change the colors is to use scale_color_gradient() or similar functions. Another is to use a specific, pre-defined scheme, such as scale_color_colorblind() from the ggthemes package. Use the latter. You will see that the colors have much more contrast, making it easier to distinguish them based on luminance alone.
crime_plot +
scale_size(range = c(XX, XX)) +
scale_y_log10(limits = c(1, XX)) +
scale_color_colorblind()
crime_plot +
scale_size(range = c(.5, 3)) +
scale_y_log10(limits = c(1, 2e+5)) +
scale_color_colorblind()
Colors can also be set manually using scale_color_manual(). Try assigning your own choice of colors. You may pick them from colors() or generate them using, for instance, the viridis() function from the viridis package (you may need to run install.packages('viridis') before using it), which provides an optimized set of colors designed to be (1) colorful, (2) perceptually uniform, (3) robust to colorblindness, and (4) pretty. Take the latter approach, i.e., use the viridis() function to generate colors, in the context of the scale_color_manual() function.
crime_plot +
scale_size(range = c(XX, XX)) +
scale_y_log10(limits = c(1, XX)) +
scale_color_manual(values = viridis(7))
crime_plot +
scale_size(range = c(.5, 3)) +
scale_y_log10(limits = c(1, 2e+5)) +
scale_color_manual(values = viridis(7))
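scale_color_manual() needs at least as many colors as there are levels in the mapped variable, so a hard-coded number such as 7 only works if it matches the data. A hedged sketch of computing the number instead, using mpg and its class variable as a stand-in for crime_long and state:

```r
library(ggplot2)
library(viridis)

# Count the distinct levels rather than hard-coding the number of colors
n_classes <- length(unique(mpg$class))

# One viridis color per level, returned as hex strings
pal <- viridis(n_classes)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  scale_color_manual(values = pal)
```

For the crime data the analogous call would use length(unique(crime_long$state)), assuming state is the variable mapped to color.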
Store your final plot in an object called crime_final.
crime_final <- ggplot(...) + ... # Include your plotting code here
Run your crime_final object to see that it does indeed contain your plot.
Save your plot to a .pdf file called crime_final using ggsave(). When you finish, find your plot in 3_Figures and open it to see how it looks!
# Save crime_final to a pdf file
ggsave(filename = "crime_final.pdf",
plot = crime_final,
device = "pdf",
path = '3_Figures',
width = 4,
height = 4,
units = "in")
Play around with the width and height arguments to change the dimensions of the plot.
Customize the code to create a .png image.
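As a rough sketch of the .png variant: the example below writes to a folder inside R's temporary directory so that it runs anywhere (fig_dir and the file name are hypothetical; in the practical you would use path = '3_Figures'). The folder is created first, because ggsave() needs the target folder to exist.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Hypothetical output folder inside the session's temporary directory
fig_dir <- file.path(tempdir(), "3_Figures")
dir.create(fig_dir, showWarnings = FALSE)

# For raster formats such as png, dpi controls the resolution
ggsave(filename = "my_plot.png",
       plot = p,
       device = "png",
       path = fig_dir,
       width = 4,
       height = 4,
       units = "in",
       dpi = 300)

# Check that the file was written
saved <- file.exists(file.path(fig_dir, "my_plot.png"))
saved
```

Including the extension in filename keeps the saved file recognizable to image viewers.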
ggplot2 also allows you to work with maps. The code below plots a simple globe representation of the world (map_data() requires the maps package and coord_map() the mapproj package to be installed).
world <- map_data("world")
map <- ggplot() +
geom_polygon(data = world, aes(long, lat, group = group), fill = "grey") +
coord_map("ortho", orientation = c(30, 55, 0)) +
theme_void()
map
You can add locations to the map by creating a locations tibble containing longitude (lon) and latitude (lat) values and then using it in, e.g., geom_point().
# define locations tibble
locations <- tibble(
city = c('Basel', 'New Delhi'),
lon = c(7.58, 77.21),
lat = c(47.55, 28.64))
# add locations to map
map +
geom_point(data = locations,
mapping = aes(x = lon, y = lat),
color = "steelblue", size = 4)
# define itineraries tibble
itineraries <- tibble(
city = c('Basel-New Delhi'),
lon_start = 7.58,
lon_end = 77.21,
lat_start = 47.55,
lat_end = 28.64)
# flat map
map <- ggplot() +
geom_polygon(data = world, aes(long, lat, group = group), fill = "grey") +
theme_void()
# add locations to map
map +
geom_curve(data = itineraries,
mapping = aes(x = lon_start, y = lat_start,
xend = lon_end, yend = lat_end),
col = 'steelblue', lty = 2, lwd = 1.2) +
geom_point(data = locations,
mapping = aes(x = lon, y = lat),
color = "steelblue", size = 4)
plotly::ggplotly()
Using the ggplotly() function from the plotly package, you can turn any ggplot object into an interactive plot like the one below! Run the following code to see it in action.
# Create a standard ggplot object
crime_plot <- ggplot(data = crime_long,
mapping = aes(x = PctUsePubTrans, y = frequency,
color = state, size = population)) +
geom_point() +
scale_y_log10()
# Make it interactive with ggplotly()!
library(plotly)
ggplotly(crime_plot)
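ggplotly() also lets you choose what appears in the hover label via its tooltip argument. A small sketch with the mpg data (used here only so the example runs without the crime file):

```r
library(ggplot2)
library(plotly)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# tooltip restricts the hover label to the listed aesthetics
p_interactive <- ggplotly(p, tooltip = c("x", "y", "colour"))
```

Printing p_interactive opens the interactive plot in the viewer pane or a browser.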
Play around with your plot! See what happens when you hover over the points with your mouse. You can even zoom in by dragging your mouse.
Try turning one of your favorite previous plots into an interactive plotly plot using the ggplotly() function!
# ggplot2 -----------------------
library(tidyverse) # Load tidyverse (contains ggplot2!)
# create a scatter plot of highway miles per gallon against engine displacement
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_point()
# Store plot objects ------------
# store
my_plot <- ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_point()
# evaluate (aka plot)
my_plot
# Facets ------------
# create separate plots for each car class
my_plot <- my_plot + facet_wrap(~class)
# plot
my_plot
# Customize themes ------------
# change panel background to 'green'
my_plot +
theme(
panel.background = element_rect(fill='green')
)
# change grid lines
my_plot +
theme(
panel.grid.major = element_line(color = 'red', size = 2),
panel.grid.minor = element_line(color = 'blue', size = 1)
)
# change strip background and text
my_plot +
theme(
strip.background = element_rect(fill = 'blue'),
strip.text = element_text(face = 'bold', size = 12)
)
# change axis titles
my_plot +
theme(
axis.title.y = element_text(size = 12, margin = margin(r = 10)),
axis.title.x = element_text(size = 12, margin = margin(t = 10))
)
# change panel spacing
my_plot +
theme(
panel.spacing = unit(2, "lines")
)
# Store themes ------------
# create theme
my_theme <- theme(
panel.background = element_rect(fill='green'),
panel.grid.major = element_line(color = 'red', size = 2),
panel.grid.minor = element_line(color = 'blue', size = 1),
strip.background = element_rect(fill = 'blue'),
strip.text = element_text(face = 'bold', size = 12),
axis.title.y = element_text(size = 12, margin = margin(r = 10)),
axis.title.x = element_text(size = 12, margin = margin(t = 10)),
panel.spacing = unit(2, "lines")
)
# apply theme
my_plot + my_theme # no parentheses
# Scaling ------------
# change x-axis scaling
my_plot + scale_x_continuous(limits = c(0, 10))
# change coloring (viridis() comes from the viridis package)
library(viridis)
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy,
color = class)) +
geom_point() +
scale_color_manual(values = viridis(7))
# Create image files ------------
# create pdf of my_plot
ggsave(filename = "my_plot_name.pdf",
plot = my_plot,
device = "pdf",
path = 'plotting_folder',
width = 4,
height = 4,
units = "in")
File | Rows | Columns |
---|---|---|
crime.csv | 1071 | 36 |
The crime data set is a subset of the Communities and Crime Unnormalized Data Set from the UCI Machine Learning Repository. Find variable descriptions below or on the Communities and Crime Unnormalized Data Set page.
Variable | Description |
---|---|
communityname | Community name |
state | US state (by 2 letter postal abbreviation) |
population | population for community |
householdsize | mean people per household |
pctUrban | percentage of people living in areas classified as urban |
medIncome | median household income |
pctWSocSec | percentage of households with social security income in 1989 |
pctWRetire | percentage of households with retirement income in 1989 |
whitePerCap | per capita income for caucasians |
blackPerCap | per capita income for african americans |
AsianPerCap | per capita income for people with asian heritage |
HispPerCap | per capita income for people with hispanic heritage |
PctPopUnderPov | percentage of people under the poverty level |
PctNotHSGrad | percentage of people 25 and over that are not high school graduates |
PctUnemployed | percentage of people 16 and over, in the labor force, and unemployed |
TotalPctDiv | percentage of population who are divorced |
PersPerFam | mean number of people per family |
PctWorkMom | percentage of moms of kids under 18 in labor force |
NumImmig | total number of people known to be foreign born |
PctImmigRecent | percentage of immigrants who immigrated within last 3 years |
PctNotSpeakEnglWell | percent of people who do not speak English well |
RentMedian | rental housing - median rent |
NumInShelters | number of people in homeless shelters |
NumStreet | number of homeless people counted in the street |
PctForeignBorn | percent of people foreign born |
PctBornSameState | percent of people born in the same state as currently living |
LandArea | land area in square miles |
PopDens | population density in persons per square mile |
PctUsePubTrans | percent of people using public transit for commuting |
murders | number of murders in 1995 |
robberies | number of robberies in 1995 |
assaults | number of assaults in 1995 |
burglaries | number of burglaries in 1995 |
larcenies | number of larcenies in 1995 |
autoTheft | number of auto thefts in 1995 |
arsons | number of arsons in 1995 |
Package | Installation |
---|---|
tidyverse | install.packages("tidyverse") |
Package | Installation |
---|---|
viridis | install.packages("viridis") |
ggmap | install.packages("ggmap") |
plotly | install.packages("plotly") |
Facets
Function | Package | Description |
---|---|---|
facet_wrap() | ggplot2 | Create facets that wrap to fit the screen |
facet_grid() | ggplot2 | Create facets along one or more variables in a grid |
themes
Function | Package | Description |
---|---|---|
theme() | ggplot2 | Customize theme (see ?theme) |
element_rect() | ggplot2 | Customize rect elements of theme |
element_line() | ggplot2 | Customize line elements of theme |
element_text() | ggplot2 | Customize text elements of theme |
element_blank() | ggplot2 | Remove elements from theme |
scales
Function | Package | Description |
---|---|---|
scale_x_*(), scale_y_*() | ggplot2 | Various functions to control the x- and y-axes |
scale_size_*() | ggplot2 | Various functions to control sizes |
scale_color_*() | ggplot2 | Various functions to control colors |
scale_fill_*() | ggplot2 | Various functions to control fill colors |
scale_alpha_*() | ggplot2 | Various functions to control color transparency |
colors
Function | Package | Description |
---|---|---|
viridis() | viridis | Generate colors from the viridis palette |
maps
Function | Package | Description |
---|---|---|
geom_polygon() | ggplot2 | Geom used to draw map elements |
register_google() | ggmap | Register Google API |
geocode() | ggmap | Extract geocode for location (e.g., city) |
plotly
Function | Package | Description |
---|---|---|
ggplotly() | plotly | Plotlify any ggplot plot (i.e., make it interactive) |
The main ggplot2 webpage at http://ggplot2.tidyverse.org/ has great tutorials and examples.
Check out Selva Prabhakaran’s website for a nice gallery of ggplot2 graphics: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
ggplot2 is also great for making maps. For examples, check out Eric Anderson’s page at http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html
from R Studio