Meta-analysis is a special case of Bayesian multilevel modeling

Recently, there’s been a lot of talk about meta-analysis, and here I would just like to quickly show that Bayesian multilevel modeling nicely takes care of your meta-analysis needs, and that it is easy to do in R with the rstan and brms packages. As you’ll see, meta-analysis is a special case of Bayesian multilevel modeling when you are unable or unwilling to put a prior distribution on the meta-analytic effect size estimate.

The idea for this post came from Wolfgang Viechtbauer’s website, where he compared results for meta-analytic models fitted with his great (frequentist) package metafor and the swiss army knife of multilevel modeling, lme4. It turns out that even though you can fit meta-analytic models with lme4, the results are slightly different from traditional meta-analytic models, because the experiment-wise variances are treated slightly differently.

Here are the packages we’ll use:

Here I’ll only focus on a simple random effects meta-analysis of effect sizes, and will use the same example data as in the aforementioned website. The data are included in the metafor package, and describe the relationship between conscientiousness and medication adherence. The effect sizes are r to z transformed correlations.

study | year | ni | ri | yi | vi | sei |
---|---|---|---|---|---|---|

Axelsson et al. (2009) | 2009 | 109 | 0.19 | 0.19 | 0.01 | 0.10 |

Axelsson et al. (2011) | 2011 | 749 | 0.16 | 0.16 | 0.00 | 0.04 |

Bruce et al. (2010) | 2010 | 55 | 0.34 | 0.35 | 0.02 | 0.14 |

Christensen et al. (1995) | 1995 | 72 | 0.27 | 0.28 | 0.01 | 0.12 |

Christensen et al. (1999) | 1999 | 107 | 0.32 | 0.33 | 0.01 | 0.10 |

Cohen et al. (2004) | 2004 | 65 | 0.00 | 0.00 | 0.02 | 0.13 |

Dobbels et al. (2005) | 2005 | 174 | 0.17 | 0.18 | 0.01 | 0.08 |

Ediger et al. (2007) | 2007 | 326 | 0.05 | 0.05 | 0.00 | 0.06 |

Insel et al. (2006) | 2006 | 58 | 0.26 | 0.27 | 0.02 | 0.13 |

Jerant et al. (2011) | 2011 | 771 | 0.01 | 0.01 | 0.00 | 0.04 |

Moran et al. (1997) | 1997 | 56 | -0.09 | -0.09 | 0.02 | 0.14 |

O’Cleirigh et al. (2007) | 2007 | 91 | 0.37 | 0.39 | 0.01 | 0.11 |

Penedo et al. (2003) | 2003 | 116 | 0.00 | 0.00 | 0.01 | 0.09 |

Quine et al. (2012) | 2012 | 537 | 0.15 | 0.15 | 0.00 | 0.04 |

Stilley et al. (2004) | 2004 | 158 | 0.24 | 0.24 | 0.01 | 0.08 |

Wiebe & Christensen (1997) | 1997 | 65 | 0.04 | 0.04 | 0.02 | 0.13 |

We are going to fit a random-effects meta-analysis model to these observed effect sizes and their standard errors. Here’s what this model looks like, loosely following notation from the R package Metafor’s manual (p.6):

\[y_i \sim N(\theta_i, \sigma_i^2)\]

where each recorded effect size, \(y_i\) is a draw from a normal distribution which is centered on that study’s “true” effect size \(\theta_i\) and has standard deviation equal to the study’s observed standard error \(\sigma_i\).

Our next set of assumptions is that the studies’ true effect sizes approximate some underlying effect size in the (hypothetical) population of all studies. We call this underlying population effect size \(\mu\), and its standard deviation \(\tau\), such that the true effect sizes are thus distributed:

\[\theta_i \sim N(\mu, \tau^2)\]

We now have two interesting parameters: \(\mu\) tells us, all else being equal, what I may expect the “true” effect to be, in the population of similar studies. \(\tau\) tells us how much individual studies of this effect vary.

I think it is most straightforward to write this model as yet another mixed-effects model (metafor manual p.6):

\[y_i \sim N(\mu + \theta_i, \sigma^2_i)\]

where \(\theta_i \sim N(0, \tau^2)\), studies’ true effects are normally distributed with between-study heterogeneity \(\tau^2\). The reason this is a little confusing (to me at least), is that we know the \(\sigma_i\)s (this being the fact that separates meta-analysis from other more common regression modeling).

Super easy!

```
Random-Effects Model (k = 16; tau^2 estimator: REML)
logLik deviance AIC BIC AICc
8.6096 -17.2191 -13.2191 -11.8030 -12.2191
tau^2 (estimated amount of total heterogeneity): 0.0081 (SE = 0.0055)
tau (square root of estimated tau^2 value): 0.0901
I^2 (total heterogeneity / total variability): 61.73%
H^2 (total variability / sampling variability): 2.61
Test for Heterogeneity:
Q(df = 15) = 38.1595, p-val = 0.0009
Model Results:
estimate se zval pval ci.lb ci.ub
0.1499 0.0316 4.7501 <.0001 0.0881 0.2118 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

So far so good, we’re strictly in the realm of standard meta-analysis. But I would like to propose that instead of using custom meta-analysis software, we simply consider the above model as just another regression model, and fit it like we would any other (multilevel) regression model. That is, using Stan, usually through the brms interface. Going Bayesian allows us to assign prior distributions on the population-level parameters \(\mu\) and \(\tau\), and we would usually want to use some very mildly regularizing priors. Here we proceed with brms’ default priors (which I print below with the output)

Here’s how to fit this model with brms:

```
brm_out <- brm(
yi | se(sei) ~ 1 + (1 | study),
data = dat,
cores = 4,
file = "metaanalysismodel"
)
```

```
Family: gaussian
Links: mu = identity; sigma = identity
Formula: yi | se(sei) ~ 1 + (1 | study)
Data: dat (Number of observations: 16)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Priors:
Intercept ~ student_t(3, 0.2, 2.5)
sd ~ student_t(3, 0, 2.5)
Group-Level Effects:
~study (Number of levels: 16)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.10 0.04 0.04 0.18 1.00 1182 1722
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.15 0.04 0.08 0.22 1.00 2175 2094
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.00 0.00 0.00 0.00 1.00 4000 4000
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

These results are the same as the ones obtained with metafor. Note the Student’s *t* prior distributions, which are diffuse enough not to exert influence on the posterior distribution.

We can now compare the results of these two estimation methods. Of course, the Bayesian method has a tremendous advantage, because it results in a full distribution of plausible values.

We can see from the numeric output, and especially the figures, that these modes of inference yield the same numerical results. Keep in mind though, that the Bayesian estimates actually allow you to discuss probabilities, and generally the things that we’d like to discuss when talking about results.

For example, what is the probability that the average effect size is greater than 0.2? About eight percent:

```
hypothesis(brm_out, "Intercept > 0.2")
```

```
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Intercept)-(0.2) > 0 -0.05 0.04 -0.1 0.01 0.08 0.08
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
```

The forest plot displays the entire posterior distribution of each \(\theta_i\). The meta-analytic effect size \(\mu\) is also displayed in the bottom row. I’ll show a considerable amount of code here so that you can create your own forest plots from brms output:

```
library(tidybayes)
library(ggdist)
# Study-specific effects are deviations + average
out_r <- spread_draws(brm_out, r_study[study,term], b_Intercept) %>%
mutate(b_Intercept = r_study + b_Intercept)
# Average effect
out_f <- spread_draws(brm_out, b_Intercept) %>%
mutate(study = "Average")
# Combine average and study-specific effects' data frames
out_all <- bind_rows(out_r, out_f) %>%
ungroup() %>%
# Ensure that Average effect is on the bottom of the forest plot
mutate(study = fct_relevel(study, "Average")) %>%
# tidybayes garbles names so fix here
mutate(study = str_replace_all(study, "\\.", " "))
# Data frame of summary numbers
out_all_sum <- group_by(out_all, study) %>%
mean_qi(b_Intercept)
# Draw plot
out_all %>%
ggplot(aes(b_Intercept, study)) +
# Zero!
geom_vline(xintercept = 0, size = .25, lty = 2) +
stat_halfeye(.width = c(.8, .95), fill = "dodgerblue") +
# Add text labels
geom_text(
data = mutate_if(out_all_sum, is.numeric, round, 2),
aes(label = str_glue("{b_Intercept} [{.lower}, {.upper}]"), x = 0.75),
hjust = "inward"
) +
# Observed as empty points
geom_point(
data = dat %>% mutate(study = str_replace_all(study, "\\.", " ")),
aes(x=yi), position = position_nudge(y = -.2), shape = 1
)
```

Focus on Moran et al. (1997)’s observed effect size (the empty circle): This is an anomalous result compared to all other studies. One might describe it as *incredible*, and that is indeed what the bayesian estimation procedure has done, and the resulting posterior distribution is no longer equivalent to the observed effect size. Instead, it is shrunken toward the average effect size. Now look at the table above, this study only had 56 participants, so we should be more skeptical of this study’s observed ES, and perhaps we should then adjust our beliefs about this study in the context of other studies. Therefore, our best guess about this study’s effect size, given all the other studies is no longer the observed mean, but something closer to the average across the studies.

If this shrinkage business seems radical, consider Quine et al. (2012). This study had a much greater sample size (537), and therefore a smaller SE. It was also generally more in line with the average effect size estimate. Therefore, the observed mean ES and the mean of the posterior distribution are pretty much identical. This is also a fairly desirable feature.

The way these different methods are presented (regression, meta-analysis, ANOVA, …), it is quite easy for a beginner, like me, to lose sight of the forest for the trees. I also feel that this is a general experience for students of applied statistics: Every experiment, situation, and question results in a different statistical method (or worse: “Which test should I use?”), and the student doesn’t see how the methods relate to each other. So I think focusing on the (regression) model is key, but often overlooked in favor of this sort of decision tree model of choosing statistical methods (McElreath 2020).

Accordingly, I think we’ve ended up in a situation where meta-analysis, for example, is seen as somehow separate from all the other modeling we do, such as repeated measures t-tests. In fact I think applied statistics in Psychology may too often appear as an unconnected bunch of tricks and models, leading to confusion and inefficient implementation of appropriate methods.

As I’ve been learning more about statistics, I’ve often noticed that some technique, applied in a specific set of situations, turns out to be a special case of a more general modeling approach. I’ll call this approach here *Bayesian multilevel modeling* (McElreath 2020). If you are forced to choose one statistical method to learn, it should be Bayesian multilevel modeling, because it allows you to do and understand most things, and allows you to see how similar all these methods are, under the hood.

The following software packages were used in this blog post: R [Version 4.0.3; R Core Team (2020)] and the R-packages *brms* [Version 2.15.0; Bürkner (2017); Bürkner (2018)], *dplyr* [Version 1.0.5; Wickham et al. (2021)], *forcats* [Version 0.5.1; Wickham (2021a)], *ggdist* [Version 2.4.0; Kay (2021)], *ggplot2* [Version 3.3.3; Wickham (2016)], *kableExtra* [Version 1.3.4; Zhu (2021)], *knitr* [Version 1.31; Xie (2015)], *lme4* [Version 1.1.26; Bates et al. (2015)], *Matrix* [Version 1.3.2; Bates and Maechler (2021)], *metafor* [Version 2.4.0; Viechtbauer (2010)], *purrr* [Version 0.3.4; Henry and Wickham (2020)], *Rcpp* [Version 1.0.6; Eddelbuettel and François (2011); Eddelbuettel and Balamuta (2018)], *readr* [Version 1.4.0; Wickham and Hester (2020)], *scales* [Version 1.1.1; Wickham and Seidel (2020)], *stringr* [Version 1.4.0; Wickham (2019)], *tibble* [Version 3.1.0; Müller and Wickham (2021)], *tidybayes* [Version 2.3.1; Kay (2020)], *tidyr* [Version 1.1.3; Wickham (2021b)], and *tidyverse* [Version 1.3.0; Wickham et al. (2019)].

Bates, Douglas, and Martin Maechler. 2021. *Matrix: Sparse and Dense Matrix Classes and Methods*. https://CRAN.R-project.org/package=Matrix.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” *Journal of Statistical Software* 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.

Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” *Journal of Statistical Software* 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” *The R Journal* 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.

Eddelbuettel, Dirk, and James Joseph Balamuta. 2018. “Extending extitR with extitC++: A Brief Introduction to extitRcpp.” *The American Statistician* 72 (1): 28–36. https://doi.org/10.1080/00031305.2017.1375990.

Eddelbuettel, Dirk, and Romain François. 2011. “Rcpp: Seamless R and C++ Integration.” *Journal of Statistical Software* 40 (8): 1–18. https://doi.org/10.18637/jss.v040.i08.

Henry, Lionel, and Hadley Wickham. 2020. *Purrr: Functional Programming Tools*. https://CRAN.R-project.org/package=purrr.

Kay, Matthew. 2020. *tidybayes: Tidy Data and Geoms for Bayesian Models*. https://doi.org/10.5281/zenodo.1308151.

———. 2021. *ggdist: Visualizations of Distributions and Uncertainty*. https://doi.org/10.5281/zenodo.3879620.

McElreath, Richard. 2020. *Statistical Rethinking: A Bayesian Course with Examples in R and Stan*. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.

Müller, Kirill, and Hadley Wickham. 2021. *Tibble: Simple Data Frames*. https://CRAN.R-project.org/package=tibble.

R Core Team. 2020. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Viechtbauer, Wolfgang. 2010. “Conducting Meta-Analyses in R with the metafor Package.” *Journal of Statistical Software* 36 (3): 1–48. https://www.jstatsoft.org/v36/i03/.

Wickham, Hadley. 2016. *Ggplot2: Elegant Graphics for Data Analysis*. Springer-Verlag New York. https://ggplot2.tidyverse.org.

———. 2019. *Stringr: Simple, Consistent Wrappers for Common String Operations*. https://CRAN.R-project.org/package=stringr.

———. 2021a. *Forcats: Tools for Working with Categorical Variables (Factors)*. https://CRAN.R-project.org/package=forcats.

———. 2021b. *Tidyr: Tidy Messy Data*. https://CRAN.R-project.org/package=tidyr.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” *Journal of Open Source Software* 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. *Dplyr: A Grammar of Data Manipulation*. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, and Jim Hester. 2020. *Readr: Read Rectangular Text Data*. https://CRAN.R-project.org/package=readr.

Wickham, Hadley, and Dana Seidel. 2020. *Scales: Scale Functions for Visualization*. https://CRAN.R-project.org/package=scales.

Xie, Yihui. 2015. *Dynamic Documents with R and Knitr*. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.

Zhu, Hao. 2021. *kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax*. https://CRAN.R-project.org/package=kableExtra.

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/mvuorre/mvuorre.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Vuorre (2016, Sept. 29). Sometimes I R: Bayesian Meta-Analysis with R, Stan & brms. Retrieved from https://mvuorre.github.io/posts/2016-09-29-bayesian-meta-analysis/

BibTeX citation

@misc{vuorre2016bayesian, author = {Vuorre, Matti}, title = {Sometimes I R: Bayesian Meta-Analysis with R, Stan & brms}, url = {https://mvuorre.github.io/posts/2016-09-29-bayesian-meta-analysis/}, year = {2016} }