How to create plots with subplots in R

data science R visualization tutorial

Some tips on creating figures with multiple panels in R

Matti Vuorre https://vuorre.netlify.com (University of Oxford https://www.oii.ox.ac.uk/people/matti-vuorre/)
2016-03-15

Visualizations are great for learning from data and communicating the results of a statistical investigation. In this post, I illustrate how to create small multiples from data using R and ggplot2.

Small multiples display the same basic plot for many different groups simultaneously. For example, a data set might consist of an X ~ Y correlation measured in many countries; small multiples display each country’s correlation in its own panel. Similarly, you might have conducted a within-individuals experiment and would like to display the effects of the repeated-measures factors at both the average level and the individual level, showing each individual’s results in a separate panel. Whenever you would like to show the same figure separately for many subsets of the data, the appropriate search term is “small multiples”.

We’ll use the following R packages:
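A minimal set that covers every function called in this post is sketched below. It assumes the individual tidyverse packages are installed; attaching the tidyverse meta-package instead would work equally well. The example data set bfi comes from psych, which is only needed for the data step in the next section.

```r
# Assumed package set, inferred from the functions used in this post
library(tibble)   # as_tibble()
library(dplyr)    # filter(), mutate(), add_count(), %>%
library(tidyr)    # pivot_longer()
library(ggplot2)  # ggplot(), geom_histogram(), facet_wrap(), facet_grid()
```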

Example Data

The data I’ll use here consist of responses to the Big 5 personality questionnaire from various demographic groups, and come from the psych R package. I’ve computed a mean for each subscale:

dat <- as_tibble(bfi)

Table 1: Example data (from psych package)
gender  education        age  Extraversion  Openness  Agreeableness  Neuroticism  Conscientiousness
Female  some college      21          4.00       3.8            5.6          3.0                4.4
Male    HS                19          3.20       3.4            2.8          4.2                3.0
Male    some HS           19          3.75       5.0            3.8          3.6                4.8
Male    some HS           21          3.00       4.4            4.8          3.0                3.4
Male    some HS           17          4.20       4.4            2.8          2.6                3.8
Male    graduate degree   68          2.40       3.8            4.6          2.0                3.6
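The step from item-level responses to one mean per scale could look roughly like the sketch below. It uses a small made-up data frame rather than bfi: the item names (E1, E2, O1, O2) only mimic bfi's naming, and the values are invented. It does not reverse-score any items, so treat it as an illustration of the row-mean idea rather than as the exact preprocessing behind Table 1.

```r
# Hypothetical item-level responses (bfi itself has five items per scale)
items <- data.frame(
  E1 = c(4, 2, 5), E2 = c(2, 4, 3),  # made-up Extraversion items
  O1 = c(6, 3, 4), O2 = c(4, 5, 4)   # made-up Openness items
)

# One mean per person and scale; na.rm = TRUE averages over answered items only
scale_means <- data.frame(
  Extraversion = rowMeans(items[, c("E1", "E2")], na.rm = TRUE),
  Openness     = rowMeans(items[, c("O1", "O2")], na.rm = TRUE)
)

scale_means
```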

Univariate plots

I’ll start by displaying histograms of the outcome variables (the individual-specific Big 5 category means). In ggplot2, you pick a variable to plot by mapping its column to an aesthetic, so to select a specific Big 5 category, I just tell ggplot2 to put it on the x-axis.

ggplot(dat, aes(x = Openness)) +
  geom_histogram() +
  # Remove the gap between the bars and the x-axis (keep 5% padding on top)
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
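The expansion() helper controls how much padding a scale adds beyond the data range. As a rough illustration, assuming ggplot2 3.3.0 or later (where expansion() was introduced), the call used in the plot requests no padding below the bars and 5% of the axis range above them:

```r
library(ggplot2)

# mult is c(lower, upper): multiplicative padding relative to the axis range
pad <- expansion(mult = c(0, 0.05))

# The result is a plain numeric vector of length four:
# lower mult, lower add, upper mult, upper add
pad
```

Setting the lower multiplier to 0 is what makes the bars sit directly on the x-axis.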

Long format data

Next, we’ll be drawing the same figure, but display all Big 5 categories using small multiples. ggplot2 calls small multiples “facets,” and the operation is conceptually to subset the input data frame by values found in one of the data frame’s columns.

The key to using facets in ggplot2 is to make sure that the data is in long format; I would like to display histograms of each category in separate facets, so I’ll need to reshape the data from wide (each category in its own column) to long form (a column with category labels, and another with the value).

dat_long <- dat %>%
  pivot_longer(Extraversion:Conscientiousness, names_to = "Scale")

Table 2: Example data in long format.
gender  education     age  Scale              value
Female  some college   21  Extraversion         4.0
Female  some college   21  Openness             3.8
Female  some college   21  Agreeableness        5.6
Female  some college   21  Neuroticism          3.0
Female  some college   21  Conscientiousness    4.4
Male    HS             19  Extraversion         3.2

The values for all Big 5 categories are now in the same column, called value. Each row holds one value together with all the variables that describe it (whose response it is and which scale it belongs to). This is the essence of long form data. We can now use the Scale variable to split the data into subplots, one for each category.
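To see what the reshaping does on a small scale, here is a sketch with a made-up two-person data frame (the values are invented for illustration). Each wide row turns into one long row per scale column, and because values_to is not given, tidyr names the value column "value" by default:

```r
library(tidyr)

# Made-up wide data: one row per person, one column per scale
wide <- data.frame(
  id           = 1:2,
  Extraversion = c(4.0, 3.2),
  Openness     = c(3.8, 3.4)
)

# Stack the scale columns: names go to "Scale", values to "value" (the default)
long <- pivot_longer(wide, Extraversion:Openness, names_to = "Scale")

long
```

Two people times two scales gives four rows, with the id column repeated for each scale.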

Basic facets

Display all scales in small multiples

Now that value holds the mean scores of all Big 5 categories, asking ggplot() to plot it on the x-axis on its own is not very meaningful. However, because another column identifies each observation’s (row’s) category, we can pass that column to facet_wrap() to split the histograms by category. Making use of the long data form with facets is easy:

ggplot(dat_long, aes(x = value)) +
  geom_histogram(fill = "grey20", col = "white") +
  facet_wrap("Scale") +
  scale_y_continuous(expand = expansion(c(0, 0.05)))

Perfect! The same works for any arbitrary variable that we can think of as a meaningful grouping factor.
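The faceting variable can be given to facet_wrap() in more than one way. The sketch below uses made-up data and shows three notations that should produce the same two-panel layout; which one to use is largely a matter of taste.

```r
library(ggplot2)

# Made-up long-format data: two scales, three values each
toy <- data.frame(
  Scale = rep(c("Extraversion", "Openness"), each = 3),
  value = c(3.2, 4.0, 4.4, 3.8, 5.0, 4.2)
)

base <- ggplot(toy, aes(x = value)) +
  geom_histogram(bins = 5, fill = "grey20", col = "white")

p_string  <- base + facet_wrap("Scale")      # character vector, as used in this post
p_formula <- base + facet_wrap(~ Scale)      # one-sided formula
p_vars    <- base + facet_wrap(vars(Scale))  # vars() helper
```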

Display different education levels’ openness in small multiples

Because the value column contains values of all scales, I need to specify which scale to display by subsetting the data. I use data wrangling verbs from the dplyr package to subset the data on the fly, and pass the resulting objects to further functions using the pipe operator %>%.

# Keep only rows where Scale is "Openness", and pass them forward
dat_long %>%
  filter(Scale == "Openness") %>%
  # Place value on x-axis
  ggplot(aes(x = value)) +
  # Remove the gap between the bars and the x-axis
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  # Histogram
  geom_histogram(fill = "grey20") +
  # Facet by education level
  facet_wrap("education")

That didn’t quite work. In an observational study such as this one, the design is far from balanced: each education category has a different number of observations, and because all facets share one y-axis by default, the histograms of the smaller groups are squashed and hard to read.
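A quick count per facet makes this kind of imbalance visible before plotting. The following sketch uses a small made-up data frame in place of the openness rows; the group sizes are invented.

```r
library(dplyr)

# Made-up data with unequal group sizes
toy <- data.frame(
  education = c("HS", "HS", "HS", "college", "some HS", "some HS"),
  value     = c(4.2, 3.8, 4.6, 5.0, 3.4, 4.0)
)

# One row per education level, with the number of observations in column n
n_per_level <- count(toy, education)

n_per_level
```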

Adjusting facet scales

I can ask facet_wrap() to give each subplot its own y-axis scale by setting scales = "free_y". Note also that last_plot() retrieves the most recently created or modified plot, which saves retyping it:

last_plot() +
  facet_wrap("education", scales = "free_y")

Brilliant.
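Because last_plot() depends on what was drawn most recently in the session, a script that is meant to be rerun may be easier to follow if the base plot is stored in an object instead. A minimal sketch of that alternative, using made-up data and hypothetical object names:

```r
library(ggplot2)

# Made-up data: six observations in one group, two in the other
toy <- data.frame(
  education = rep(c("HS", "college"), times = c(6, 2)),
  value     = c(3.2, 3.8, 4.0, 4.2, 4.4, 5.0, 3.6, 4.8)
)

# Store the unfaceted plot once, then add different facet specifications
p <- ggplot(toy, aes(x = value)) +
  geom_histogram(bins = 5, fill = "grey20")

p_fixed <- p + facet_wrap("education")                     # shared y-axis (default)
p_free  <- p + facet_wrap("education", scales = "free_y")  # one y-axis per panel
```

Other accepted values for scales are "free_x" and "free" (both axes).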

Grid of facets

We repeatedly called facet_wrap("variable") to separate the plot into several facets based on variable. However, we’re not restricted to one faceting variable, and can enter multiple variables simultaneously. To illustrate, I’ll plot openness separately for each combination of gender and education level, using facet_grid():

last_plot() +
  facet_grid(gender ~ education, scales = "free_y")

The variable to the left of the tilde in facet_grid() specifies the rows (here gender); the one to the right of the tilde specifies the columns (here education).
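If the formula is hard to remember, facet_grid() also accepts the rows and columns as named arguments. The sketch below, on made-up data, builds the same 2 x 2 grid both ways:

```r
library(ggplot2)

# Made-up data: every gender x education combination, three observations each
toy <- expand.grid(
  gender    = c("Female", "Male"),
  education = c("HS", "college"),
  obs       = 1:3
)
toy$value <- seq(2, 5, length.out = nrow(toy))

base <- ggplot(toy, aes(x = value)) +
  geom_histogram(bins = 5, fill = "grey20")

# Formula notation: rows ~ columns
p_formula <- base + facet_grid(gender ~ education, scales = "free_y")

# Equivalent notation with explicit rows and cols arguments
p_vars <- base +
  facet_grid(rows = vars(gender), cols = vars(education), scales = "free_y")
```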

Ordering facets

Sometimes it is helpful to convey information through structure. One way to do this with subplots is to arrange them in a meaningful order, for example by a summary statistic computed from the data. Ordering subplots allows the observer to learn more from the figure at a glance, even though it presents the same information, only differently arranged.

Order facets by number of observations

To order subplots, we need to add the variable that we would like to order by to the data frame. Here we add a “number of observations” column to the data frame, then reorder the faceting variable by that column. The following code snippet takes all openness rows, counts the observations for each education level, and reorders the levels of the education factor by that count. The result is a figure in which the number of observations in each facet increases from left to right (the facets are laid out in a single row).

dat_long %>%
  filter(Scale == "Openness") %>%
  add_count(education) %>%
  mutate(education = reorder(education, n)) %>% # The important bit
  ggplot(aes(x = value)) +
  scale_y_continuous(expand = expansion(c(0, 0.05))) +
  geom_histogram(fill = "grey20") +
  facet_wrap("education", scales = "free_y", nrow = 1)
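To check what the reordering step does without drawing anything, the factor levels can be inspected directly. This sketch repeats the add_count() and reorder() steps on a small made-up data frame:

```r
library(dplyr)

# Made-up data: 3 x "HS", 1 x "college", 2 x "some HS"
toy <- data.frame(
  education = c("HS", "HS", "HS", "college", "some HS", "some HS"),
  value     = c(4.2, 3.8, 4.6, 5.0, 3.4, 4.0)
)

toy_ordered <- toy %>%
  add_count(education) %>%                   # adds column n: rows per education level
  mutate(education = reorder(education, n))  # sort factor levels by ascending n

levels(toy_ordered$education)
# "college" "some HS" "HS"
```

Facets are drawn in the order of the factor levels, so the smallest group comes first. Using reorder(education, -n) instead would place the largest group first.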

Software used

The following software packages were used in this blog post: R [Version 4.0.3; R Core Team (2020)] and the R-packages dplyr [Version 1.0.5; Wickham et al. (2021)], forcats [Version 0.5.1; Wickham (2021a)], ggplot2 [Version 3.3.3; Wickham (2016)], knitr [Version 1.31; Xie (2015)], psych [Version 2.0.12; Revelle (2020)], purrr [Version 0.3.4; Henry and Wickham (2020)], readr [Version 1.4.0; Wickham and Hester (2020)], scales [Version 1.1.1; Wickham and Seidel (2020)], stringr [Version 1.4.0; Wickham (2019)], tibble [Version 3.1.0; Müller and Wickham (2021)], tidyr [Version 1.1.3; Wickham (2021b)], and tidyverse [Version 1.3.0; Wickham et al. (2019)].

Henry, Lionel, and Hadley Wickham. 2020. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.
Müller, Kirill, and Hadley Wickham. 2021. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Revelle, William. 2020. Psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University. https://CRAN.R-project.org/package=psych.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2019. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
———. 2021a. Forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
———. 2021b. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Jim Hester. 2020. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, and Dana Seidel. 2020. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/mvuorre/mvuorre.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Vuorre (2016, March 15). Sometimes I R: How to create plots with subplots in R. Retrieved from https://mvuorre.github.io/posts/2016-03-15-ggplot-plots-subplots/

BibTeX citation

@misc{vuorre2016how,
  author = {Vuorre, Matti},
  title = {Sometimes I R: How to create plots with subplots in R},
  url = {https://mvuorre.github.io/posts/2016-03-15-ggplot-plots-subplots/},
  year = {2016}
}