adapted from paherald.sk.ca

Overview

In this practical you’ll practice customizing plots created using the ggplot2 package. By the end of this practical you will know how to:

  1. Use facetting to create multiple plots.
  2. Use scaling to alter the plots dimensions.
  3. Alter and store themes to adjust a plots appearance.
  4. Create multiple plots in one using the patchwork package.
  5. Create image files using ggsave().

Tasks

A - Setup

  1. Open your TheRBootcamp R project. It should already have the folders 1_Data and 2_Code. Make sure that the data files listed in the Datasets section above are in your 1_Data folder.
# Done!
  1. Open a new R script. At the top of the script, using comments, write your name and the date. Save it as a new file called plottingII_practical.R in the 2_Code folder.

  2. Using library() load the set of packages for this practical listed in the functions section above.

## NAME
## DATE
## Plotting II Practical

library(XX)
library(XX)
library(XX)
  1. For this practical, we’ll use the crime.csv data set, containing crime data of US counties across various states. Using read_csv(), load the data into R and store it as a new object called crime.
crime <- read_csv("1_Data/crime.csv")
  1. Take a look at the first few rows of the data set(s) by printing them to the console.
crime
# A tibble: 1,071 × 36
   communityname    state population householdsize pctUrban medIncome pctWSocSec
   <chr>            <chr>      <dbl>         <dbl>    <dbl>     <dbl>      <dbl>
 1 BerkeleyHeights… NJ         11980          3.1       100     75122       23.6
 2 Marpletownship   PA         23123          2.82      100     47917       35.5
 3 Norwoodtown      MA         28700          2.6       100     42805       30.2
 4 Wacocity         TX        103590          2.62      100     17852       29.1
 5 Shermancity      TX         31601          2.54      100     24763       32.7
 6 SanPablocity     CA         25158          2.89      100     25479       23.0
 7 Glendalecity     CA        180038          2.62      100     34372       20.3
 8 Worthingtoncity  OH         14869          2.67      100     49851       23.8
 9 Arlingtoncity    TX        261721          2.6       100     35048       11.0
10 Marinacity       CA         26436          3.34      100     29043       10.7
# … with 1,061 more rows, and 29 more variables: pctWRetire <dbl>,
#   whitePerCap <dbl>, blackPerCap <dbl>, AsianPerCap <dbl>, HispPerCap <dbl>,
#   PctPopUnderPov <dbl>, PctNotHSGrad <dbl>, PctUnemployed <dbl>,
#   TotalPctDiv <dbl>, PersPerFam <dbl>, PctWorkMom <dbl>, NumImmig <dbl>,
#   PctImmigRecent <dbl>, PctNotSpeakEnglWell <dbl>, RentMedian <dbl>,
#   NumInShelters <dbl>, NumStreet <dbl>, PctForeignBorn <dbl>,
#   PctBornSameState <dbl>, LandArea <dbl>, PopDens <dbl>,
#   PctUsePubTrans <dbl>, murders <dbl>, robberies <dbl>, assaults <dbl>,
#   burglaries <dbl>, larcenies <dbl>, autoTheft <dbl>, arsons <dbl>

B - Create facets

  1. To begin with, create a basic scatter plot (using geom_points()) pitting each of the 7 crime indicators (murders, robberies, assaults, burglaries, larcenies, autoTheft, arsons) against the percentage of people using public transportation (PctUsePubTrans).
ggplot(data = crime,
       mapping = aes(x = XX, y = XX)) + 
  geom_point()

# and so on
  1. None of these looked very informative, right? This is because the counts of the each of the crime measures is heavily right skewed. You will learn more about scaling later, but for now add scale_y_log10() to fix this. Run the plots again.
ggplot(data = crime,
       mapping = aes(x = XX, y = XX)) + 
  geom_point() + 
  scale_y_log10()

# and so on
  1. This should have been more telling. Pretty much all crimes seem to have been positively related to the percentage of individuals using public transportation. Interesting! But wasn’t it a bit of a pain to derive this insight by created 7 separate plots. Let’s fix this using facets. To do this, first, create a long version of the crime data set called crime_long, using the code below. (Note the use of crime_vars as a positive selector for gather()).
# vector of crime variables
crime_vars = c("murders","robberies","assaults","burglaries","larcenies","autoTheft","arsons")

# transform to long
crime_long <- crime %>% 
  pivot_longer(names_to = "crime_var",
               values_to = "frequency",
               cols = all_of(crime_vars))
  1. Using the the crime_long data set, you can now make use of the amazing power of ggplot2’s facet functions, such as facet_wrap(). Use facet_wrap() to automatically plot crime frequency against the percentage of people using public transportation for each of the crime variables.
ggplot(data = crime_long,
       mapping = aes(x = XX, y = XX)) + 
  geom_point() + 
  scale_y_log10() + 
  facet_wrap(~ XX)
ggplot(data = crime_long,
       mapping = aes(x = PctUsePubTrans, y = frequency)) + 
  geom_point() + 
  scale_y_log10() + 
  facet_wrap(~ crime_var)

  1. This was much more efficient, right? Now explore the relationship of frequency to other variables, such as medIncome, TotalPctDiv, or PctNotHSGrad, for each of the crime measures. What variables do predict, which kind of crime? Explore!

C - Customize plots using theme()

Now that we have an informative plot, let’s focus on making it a bit more “pretty”", using ggplot’s theme() function. The goal is to create a plot that looks like the plot below.

crime_facets <- ggplot(data = crime_long,
       mapping = aes(x = PctUsePubTrans, y = frequency)) + 
  geom_point() + 
  scale_y_log10() + 
  facet_wrap(~ crime_var) +
  theme(
    panel.background = element_rect(fill='white'),
    panel.grid.major = element_line(color = 'grey75',
                                    size = .25),
    panel.grid.minor = element_line(color = 'grey75',
                                    size = .1),
    strip.background = element_rect(fill='white'),
    strip.text = element_text(face='italic', size=12, hjust=1),
    axis.title.y = element_text(size=12,margin=margin(r = 10)),
    axis.title.x = element_text(size=12,margin=margin(t = 10)),
    panel.spacing = unit(1.1, "lines")) + 
  labs(x = '% public transportation', y = 'Crime frequency')
crime_facets

  1. To begin with store one of the facetted plots of section B as crime_facets.
crime_facets <- XX
  1. Now let’s begin changing its appearance. First, change the color of the background to "white" of the panel using the panel.background argument and the element_rect() function.
crime_facets + 
  theme(
    panel.background = element_rect(fill = XX)
    )
crime_facets + 
  theme(
    panel.background = element_rect(fill = 'white')
    )

  1. Next, change the major and minor grid lines to color "grey75" and sizes .25 and .1, respectively, using the panel.grid.major and panel.grid.minor arguments and the element_line() function.
crime_facets + 
  theme(
    panel.background = element_rect(fill = XX),
    panel.grid.major = element_line(color = XX, size = XX),
    panel.grid.minor = element_line(color = XX, size = XX)
    )
crime_facets + 
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'grey75', size = .25),
    panel.grid.minor = element_line(color = 'grey75', size = .1)
    )

  1. Next, change the strip background - the background of the panel headers - to color "white" using the strip.background argument and the element_rect() function.
crime_facets + 
  theme(
    panel.background = element_rect(fill = XX),
    panel.grid.major = element_line(color = XX, size = XX),
    panel.grid.minor = element_line(color = XX, size = XX),
    strip.background = element_rect(fill = XX),
    )
crime_facets + 
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'grey75', size = .25),
    panel.grid.minor = element_line(color = 'grey75', size = .1),
    strip.background = element_rect(fill = 'white')
    )

  1. Next, change the font in the strip to "italic", adjust it to the right side, and set size to 12 using the strip.text argument and the element_text() function. See ?element_text().
crime_facets + 
  theme(
    panel.background = element_rect(fill = XX),
    panel.grid.major = element_line(color = XX, size = XX),
    panel.grid.minor = element_line(color = XX, size = XX),
    strip.background = element_rect(fill = XX),
    strip.text = element_text(face = XX, size = XX, hjust = XX)
    )
crime_facets + 
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'grey75', size = .25),
    panel.grid.minor = element_line(color = 'grey75', size = .1),
    strip.background = element_rect(fill = 'white'),
    strip.text = element_text(face = 'italic', size = 12, hjust = 1)
    )

  1. Next, set the font size of the axis labels also to 12 and add a margin of 10 to the top and right side, respectively, of the labels respectively, using axis.title.x and axis.title.y functions and the element_text() and margin functions. See ?margins().
crime_facets + 
  theme(
    panel.background = element_rect(fill = XX),
    panel.grid.major = element_line(color = XX, size = XX),
    panel.grid.minor = element_line(color = XX, size = XX),
    strip.background = element_rect(fill = XX),
    strip.text = element_text(face = XX, size = XX, hjust = XX),
    axis.title.x = element_text(size = XX, margin = margin(t = XX)),
    axis.title.y = element_text(size = XX, margin = margin(r = XX)),
    )
crime_facets + 
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'grey75', size = .25),
    panel.grid.minor = element_line(color = 'grey75', size = .1),
    strip.background = element_rect(fill = 'white'),
    strip.text = element_text(face = 'italic', size = 12, hjust = 1),
    axis.title.x = element_text(size = 12, margin = margin(t = 10)),
    axis.title.y = element_text(size = 12, margin = margin(r = 10))
    )

  1. Finally, increase the spacing between the panels slightly by setting the space between to 1.1 "lines" using the panel.spacing argument and the unit function.
crime_facets + 
  theme(
    panel.background = element_rect(fill = XX),
    panel.grid.major = element_line(color = XX, size = XX),
    panel.grid.minor = element_line(color = XX, size = XX),
    strip.background = element_rect(fill = XX),
    strip.text = element_text(face = XX, size = XX, hjust = XX),
    axis.title.x = element_text(size = XX, margin = margin(t = XX)),
    axis.title.y = element_text(size = XX, margin = margin(r = XX)),
    panel.spacing = unit(XX, units = XX)
    )
crime_facets + 
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'grey75', size = .25),
    panel.grid.minor = element_line(color = 'grey75', size = .1),
    strip.background = element_rect(fill = 'white'),
    strip.text = element_text(face = 'italic', size = 12, hjust = 1),
    axis.title.x = element_text(size = 12, margin = margin(t = 10)),
    axis.title.y = element_text(size = 12, margin = margin(r = 10)),
    panel.spacing = unit(1.1, units = "lines")
    )

  1. Did you manage to reproduce the plot above? One other thing seems missing. Add appropriate labels using the labs() function.
crime_facets + 
  theme(
    panel.background = element_rect(fill = XX),
    panel.grid.major = element_line(color = XX, size = XX),
    panel.grid.minor = element_line(color = XX, size = XX),
    strip.background = element_rect(fill = XX),
    strip.text = element_text(face = XX, size = XX, hjust = XX),
    axis.title.x = element_text(size = XX, margin = margin(t = XX)),
    axis.title.y = element_text(size = XX, margin = margin(r = XX)),
    panel.spacing = unit(XX, units = XX)
    ) + 
  labs(x = XX, y = XX)
crime_facets + 
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'grey75', size = .25),
    panel.grid.minor = element_line(color = 'grey75', size = .1),
    strip.background = element_rect(fill = 'white'),
    strip.text = element_text(face = 'italic', size = 12, hjust = 1),
    axis.title.x = element_text(size = 12, margin = margin(t = 10)),
    axis.title.y = element_text(size = 12, margin = margin(r = 10)),
    panel.spacing = unit(1.1, units = "lines")
    ) + 
  labs(x = '% public transportation', y = 'Crime frequency')

D - Customize plots using theme()

  1. When you managed to reproduce the target theme, save all of the theme setting in an independent object called crime_theme.
crime_theme <- theme(
  XX = XX,
  XX = XX,
  ...
  )
crime_theme <- theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'grey75', size = .25),
    panel.grid.minor = element_line(color = 'grey75', size = .1),
    strip.background = element_rect(fill = 'white'),
    strip.text = element_text(face = 'italic', size = 12, hjust = 1),
    axis.title.x = element_text(size = 12, margin = margin(t = 10)),
    axis.title.y = element_text(size = 12, margin = margin(r = 10)),
    panel.spacing = unit(1.1, units = "lines")
    )
  1. Now create new plots with different variables on x-axis and simply add the crime_theme in order to apply your personalized theme.
new_crime_plot + crime_theme
  1. If you don’t like your theme, go back and make changes to it, and then apply your new theme onto your plots. Go explore! Try out other arguments of theme() (see ?theme), such as axis.ticks or strip.placement.

E - Scaling

When creating a plot ggplot automatically chooses sensible dimensions for your plot in terms of x- and y-axis limits, geom sizes, or colors. However, all of these aspects of the plot can also be controlled manually or semi-manually using various scale_* functions.

  1. Before playing around with them, add one more element to your plot, which will help you to realize the importance of scaling. That is, color the points according to state by mapping the state variable onto the color argument and size the points according to the county’s population by mapping the population variable onto the size argument. Store the resulting plot in an object called crime_plot.
crime_plot <- 
  ggplot(data = crime_long,
       mapping = aes(x = XX, y = XX,
                     color = XX, size = XX)) + 
  geom_point() + 
  scale_y_log10() + 
  facet_wrap(~ XX) + 
  crime_theme
crime_plot <- ggplot(data = crime_long,
       mapping = aes(x = PctUsePubTrans, y = frequency,
                     color = state, size = population)) + 
  geom_point() + 
  scale_y_log10() + 
  facet_wrap(~ crime_var) + 
  crime_theme
crime_plot

  1. First, using scale_size() and the range argument, change the scaling of the points to reduce the degree of overlap among the points (see ?scale_size). Try out a few numbers (smaller than 10) to create a version of the plot with a decent trade-off between point size and point overlap.
crime_plot + scale_size(range = c(XX, XX))
crime_plot + scale_size(range = c(.5, 3))

  1. You may find that still some of the larger points are cropped off at the upper end of the panels. Fix this by increasing the y-axis limits using the scale_y_log10() function. Set the limits to 0 and 2e+5 (i.e., 200,000). (Note that R will tell you that this will overwrite the previous use of scale_y_log10(), which is what we intend to do).
crime_plot + 
  scale_size(range = c(XX, XX)) + 
  scale_y_continuous(limits = c(XX, XX))
crime_plot + 
  scale_size(range = c(.5, 3)) + 
  scale_y_log10(limits = c(1, 2e+5))

  1. Next, change the colors to a different, possibly more appropriate color scheme. One way to this is via the scale_color_gradient() or similar functions. Another is to use a specific, pre-defined scheme, such as scale_color_colorblind(). Use the latter. You will see that the colors have much more contrast making it distinguishing the colors based on luminescence alone easier.
crime_plot + 
  scale_size(range = c(XX, XX)) + 
  scale_y_log10(limits = c(1, XX)) + 
  scale_color_colorblind()
crime_plot + 
  scale_size(range = c(.5, 3)) + 
  scale_y_log10(limits = c(1, 2e+5)) + 
  scale_color_colorblind()

  1. Another approach to changing colors is to supply them manually, e.g., using scale_color_manual(). Try assigning your own choice of colors. You may pick them from colors() or generate them using, for instance, the viridis function from the viridis package (you may need to run install.packages('viridis') before using it), which provides an optimized set of colors designed to be (1) colorful, (2) perceptually uniform, (3) robust to colorblindness, (4) and pretty. Take the latter approach, i.e., use the viridis() function to generate colors, in the context of the scale_color_manual() function.
crime_plot + 
  scale_size(range = c(XX, XX)) + 
  scale_y_log10(limits = c(1, XX)) + 
  scale_color_manual(values = viridis(7))
crime_plot + 
  scale_size(range = c(.5, 3)) + 
  scale_y_log10(limits = c(1, 2e+5)) + 
  scale_color_manual(values = viridis(7))

  1. Alright the plot fairly pretty and readable now. But there is always more to be done and tastes differ, of course. Go explore!

F - Creating image files

  1. When you have found a plot that suits your taste, it’s time to save it as an image file. Store your favorite plot in a new object called crime_final.
crime_final <- ggplot(...) + ... # Include your plotting code here
  1. Run your crime_final object to see that it does indeed contain your plot.

  2. Save your plot to a .pdf-file called crime_final using ggsave(). When you finish, find your plot in 3_Figures and open it to see how it looks!

# Save crime_final to a pdf file
ggsave(filename = "crime_plot", 
       plot = crime_final,
       device = "pdf", 
       path = '3_Figures',
       width = 4, 
       height = 4, 
       units = "in")
  1. Play around with the width and height arguments to change the dimensions of the plot.

  2. Customize the code to create a .png image.

X - Challenges: Multiple Plots

In this section, you can play around with the patchwork package (don’t forget to load it) do combine multiple plots into a single one.

  1. Create three different plots that plot frequency against different variables in our data set and call these three crime_a, crime_b, und crime_c

  2. Now, combine all three plots horizontally using verbrechen_a + verbrechen_b + verbrechen_c.

  3. Or, place the third plot below the first two verbrechen_a + verbrechen_b / verbrechen_c.

  4. Now change the theme by applying theme_void() to all three using the &-Operator.

  5. Finally, save your plot as a pdf using ggsave().

Examples

# ggplot2 -----------------------

library(tidyverse) # Load tidyverse (contains ggplot2!)

# create a scatter plot of highway miles per gallon against engine displacement
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
       geom_point()  

# Store plot objects ------------

# store
my_plot <- ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
       geom_point()  

# evaluate (aka plot)
my_plot

# Facets ------------

# create separate plots for each car class
my_plot <- my_plot + facet_wrap(~class)

# plot
my_plot

# Customize themes ------------

# change panel background to 'green'
my_plot +
  theme(
    panel.background = element_rect(fill='green')
  )

# change grid lines
my_plot +
  theme(
    panel.grid.major = element_line(color = 'red', size = 2),
    panel.grid.minor = element_line(color = 'blue', size = 1)
  )

# change strip background and text
my_plot +
  theme(
    strip.background = element_rect(fill = 'blue'),
    strip.text = element_text(face = 'bold', size = 12)
  )

# change axis titles
my_plot +
  theme(
    axis.title.y = element_text(size = 12, margin = margin(r = 10)),
    axis.title.x = element_text(size = 12, margin = margin(t = 10))
  )

# change panel spacing
my_plot +
  theme(
    panel.spacing = unit(2, "lines")
  )

# Store themes ------------

# create theme
my_theme <- theme(
  panel.background = element_rect(fill='green'),
  panel.grid.major = element_line(color = 'red', size = 2),
  panel.grid.minor = element_line(color = 'blue', size = 1),
  strip.background = element_rect(fill = 'blue'),
  strip.text = element_text(face = 'bold', size = 12),
  strip.background = element_rect(fill = 'blue'),
  strip.text = element_text(face = 'bold', size = 12),
  axis.title.y = element_text(size = 12, margin = margin(r = 10)),
  axis.title.x = element_text(size = 12, margin = margin(t = 10)),
  panel.spacing = unit(2, "lines")
)

# apply theme
my_plot + my_theme # no parentheses

# Scaling ------------

# change x-axis scaling
my_plot + scale_x_continuous(limits = c(0, 10))

# change coloring
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy,
                     color = class)) +
       geom_point() +
  scale_color_manual(values = viridis(7))
  
# Create image files ------------

# create pdf of my_plot
ggsave(filename = "my_plot_name", 
       plot = my_plot,
       device = "pdf", 
       path = 'plotting_folder',
       width = 4, 
       height = 4, 
       units = "in")

Datasets

File Rows Columns
crime.csv 1071 36

The crime data set is subsets of the Communities and Crime Unnormalized Data Set data set from the UCI Machine Learning Repository. Find variable descriptions below or at Communities and Crime Unnormalized Data Set

Variable descriptions

Variable Description
communityname Community name
state US state (by 2 letter postal abbreviation)
population population for community
householdsize mean people per household
pctUrban number of people living in areas classified as urban
medIncome median household income
pctWSocSec percentage of households with social security income in 1989
pctWRetire percentage of households with retirement income in 1989
whitePerCap per capita income for caucasians
blackPerCap per capita income for african americans
AsianPerCap per capita income for people with asian
HispPerCap per capita income for people with hispanic heritage
PctPopUnderPov percentage of people under the poverty level
PctNotHSGrad percentage of people 25 and over that are not high school graduates
PctUnemployed percentage of people 16 and over, in the labor force, and unemployed
TotalPctDiv percentage of population who are divorced
PersPerFam mean number of people per family
PctWorkMom percentage of moms of kids under 18 in labor force
NumImmig total number of people known to be foreign born
PctImmigRecent percentage of immigrants who immigated within last 3 years
PctNotSpeakEnglWell percent of people who do not speak English well
RentMedian rental housing - median rent
NumInShelters number of people in homeless shelters
NumStreet number of homeless people counted in the street
PctForeignBorn percent of people foreign born
PctBornSameState percent of people born in the same state as currently living
LandArea land area in square miles
PopDens population density in persons per square mile
PctUsePubTrans percent of people using public transit for commuting
murders number of murders in 1995
robberies number of robberies in 1995
assaults number of assaults in 1995
burglaries number of burglaries in 1995
larcenies number of larcenies in 1995
autoTheft number of auto thefts in 1995
arsons number of arsons in 1995

Functions

Packages

Package Installation
tidyverse install.packages("tidyverse")
patchwork install.packages("patchwork")

Optional

Package Installation
viridis install.packages("viridis")

Functions

Facets

Function Package Description
facet_wrap() ggplot2 Create facets that wrap to fit the screen
facet_grid() ggplot2 Create facets along one or more variables in a grid

themes

Function Package Description
theme() ggplot2 Customize theme (see ?theme)
element_rect() ggplot2 Customize rect elements of theme
element_line() ggplot2 Customize line elements of theme
element_text() ggplot2 Customize text elements of theme
element_blank() ggplot2 Remove elements from theme

scales

Function Package Description
scale_x_*(), scale_y_*() ggplot2 Various functions to control the x- and y-axes
scale_size_*() ggplot2 Various functions to control sizes
scale_color_*() ggplot2 Various functions to control colors
scale_fill_*() ggplot2 Various functions to control fill colors
scale_alpha_*() ggplot2 Various functions to control color transparency

colors

Function Package Description
viridis() viridis Generate colors from the viridis palette

Resources

Documentation

Cheatsheet


from R Studio