Structuring a dataset for a meta-analysis
Dataset for a meta-analysis of a correlation coefficient
Here is an example of a small meta-analysis dataset coding effect sizes (correlations) from 9 studies encompassing a total of 15 samples.
StudyID yearPub SampleID EffID corr n
1 2017 1_1 1 0.547 171
2 2024 2_1 2 0.180 452
2 2024 2_2 3 0.299 245
3 2017 3_1 4 0.183 399
4 2019 4_1 5 0.120 445
5 2019 5_1 6 0.100 230
5 2019 5_2 7 0.027 69
6 2017 6_1 8 -0.167 194
7 2014 7_1 9 0.245 373
7 2014 7_2 10 0.321 195
7 2014 7_3 11 0.146 330
7 2014 7_4 12 0.259 427
8 2016 8_1 13 0.155 434
8 2016 8_2 14 0.170 220
9 2010 9_1 15 -0.266 215
A small but extremely common complication: some studies include more than one sample. Luckily, here there is a single effect size for each sample (so SampleID and EffID are practically redundant). If the same sample contributed multiple effect sizes, you might consider aggregating them preliminarily, unless the to-be-aggregated effects are associated with different levels of a moderator of interest; in that case you should keep them separate and add a further level in a multilevel model. Note that this complicates everything a bit, because you need to account for correlated sampling error when entering multiple effect sizes from the same group of subjects.
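As an aside (not part of the pipeline above), preliminary aggregation must account for that correlated sampling error. A minimal sketch, using the standard composite formula for the variance of the mean of two correlated effect sizes; the correlation r between their sampling errors is an assumed value you must supply:

```r
# Variance of the mean of two effect sizes from the same sample,
# whose sampling errors correlate at an assumed value r:
composite_var = function(v1, v2, r) {
  (v1 + v2 + 2 * r * sqrt(v1 * v2)) / 4
}

# Sanity checks: with r = 1 the composite is as uncertain as a single effect;
# with r = 0 the variance halves, as for independent effects
composite_var(0.004, 0.004, 1)  # 0.004
composite_var(0.004, 0.004, 0)  # 0.002
```

In practice you rarely know r, so a plausible value (e.g., 0.5) is often assumed and varied in sensitivity analyses.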
- Step 1: Converting all correlations to Fisher’s z (good for normalizing the distribution, making the variances independent of the effect size, and simplifying subsequent computations):
library(psych)
df$eff = fisherz(df$corr)

StudyID yearPub SampleID EffID corr n eff
1 2017 1_1 1 0.547 171 0.61409036
2 2024 2_1 2 0.180 452 0.18198269
2 2024 2_2 3 0.299 245 0.30842106
3 2017 3_1 4 0.183 399 0.18508488
4 2019 4_1 5 0.120 445 0.12058103
5 2019 5_1 6 0.100 230 0.10033535
5 2019 5_2 7 0.027 69 0.02700656
6 2017 6_1 8 -0.167 194 -0.16857900
7 2014 7_1 9 0.245 373 0.25008653
7 2014 7_2 10 0.321 195 0.33276159
7 2014 7_3 11 0.146 330 0.14705085
7 2014 7_4 12 0.259 427 0.26503620
8 2016 8_1 13 0.155 434 0.15625950
8 2016 8_2 14 0.170 220 0.17166666
9 2010 9_1 15 -0.266 215 -0.27255429
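A side note not strictly needed for the pipeline: Fisher’s z is just the inverse hyperbolic tangent, so base R can compute it without extra packages:

```r
# Fisher's z transform: z = 0.5 * log((1 + r) / (1 - r)) = atanh(r)
r = 0.547
z = atanh(r)
round(z, 5)  # 0.61409, matching the fisherz() output above

# the back-transform to the correlation scale is tanh()
tanh(z)  # 0.547
```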
- Step 2: Computing the effect size variances using the analytical formula:
df$vi = 1 / (df$n - 3)

StudyID yearPub SampleID EffID corr n eff vi
1 2017 1_1 1 0.547 171 0.61409036 0.005952381
2 2024 2_1 2 0.180 452 0.18198269 0.002227171
2 2024 2_2 3 0.299 245 0.30842106 0.004132231
3 2017 3_1 4 0.183 399 0.18508488 0.002525253
4 2019 4_1 5 0.120 445 0.12058103 0.002262443
5 2019 5_1 6 0.100 230 0.10033535 0.004405286
5 2019 5_2 7 0.027 69 0.02700656 0.015151515
6 2017 6_1 8 -0.167 194 -0.16857900 0.005235602
7 2014 7_1 9 0.245 373 0.25008653 0.002702703
7 2014 7_2 10 0.321 195 0.33276159 0.005208333
7 2014 7_3 11 0.146 330 0.14705085 0.003058104
7 2014 7_4 12 0.259 427 0.26503620 0.002358491
8 2016 8_1 13 0.155 434 0.15625950 0.002320186
8 2016 8_2 14 0.170 220 0.17166666 0.004608295
9 2010 9_1 15 -0.266 215 -0.27255429 0.004716981
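As a quick illustrative check in base R: the analytical variance of Fisher’s z is 1/(n − 3), and a 95% confidence interval for a single study can be built on the z scale and then back-transformed to r with tanh():

```r
# first study in the dataset: r = 0.547, n = 171
n = 171
z = atanh(0.547)
vi = 1 / (n - 3)
round(vi, 9)  # 0.005952381, matching the table above

# 95% CI on the z scale, then back on the correlation scale
ci_z = z + c(-1.96, 1.96) * sqrt(vi)
tanh(ci_z)
```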
Actually, both steps can be done at once with the “escalc” function from the “metafor” package itself:
library(metafor)
escalc(measure="ZCOR", ri=corr, ni=n, data=df)
StudyID yearPub SampleID EffID corr n yi vi
1 1 2017 1_1 1 0.547 171 0.6141 0.0060
2 2 2024 2_1 2 0.180 452 0.1820 0.0022
3 2 2024 2_2 3 0.299 245 0.3084 0.0041
4 3 2017 3_1 4 0.183 399 0.1851 0.0025
5 4 2019 4_1 5 0.120 445 0.1206 0.0023
6 5 2019 5_1 6 0.100 230 0.1003 0.0044
7 5 2019 5_2 7 0.027 69 0.0270 0.0152
8 6 2017 6_1 8 -0.167 194 -0.1686 0.0052
9 7 2014 7_1 9 0.245 373 0.2501 0.0027
10 7 2014 7_2 10 0.321 195 0.3328 0.0052
11 7 2014 7_3 11 0.146 330 0.1471 0.0031
12 7 2014 7_4 12 0.259 427 0.2650 0.0024
13 8 2016 8_1 13 0.155 434 0.1563 0.0023
14 8 2016 8_2 14 0.170 220 0.1717 0.0046
15 9 2010 9_1 15 -0.266 215 -0.2726 0.0047
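To demystify what the model will do with yi and vi, here is a deliberately simplified sketch (fixed-effect pooling, ignoring heterogeneity and the multilevel structure): the pooled estimate is just an inverse-variance weighted mean of the effect sizes.

```r
# toy vectors standing in for df$eff and df$vi (first rows of the table above)
yi = c(0.6141, 0.1820, 0.3084)
vi = c(0.0060, 0.0022, 0.0041)

wi = 1 / vi                      # inverse-variance weights
pooled = sum(wi * yi) / sum(wi)  # precision-weighted mean
se = sqrt(1 / sum(wi))           # standard error of the pooled estimate
c(pooled = pooled, se = se)
```

The random-effects multilevel model fitted below generalizes this by adding between-study and within-study variance components to the weights.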
Fitting the meta-analytic model
Now, let’s proceed with a simple meta-analytic model… well, not so simple, since it’s random-effects and multilevel!
fit = rma.mv(eff, vi, random =~ 1|StudyID/SampleID, data = df)
summary(fit)
Multivariate Meta-Analysis Model (k = 15; method: REML)
logLik Deviance AIC BIC AICc
5.7280 -11.4560 -5.4560 -3.5389 -3.0560
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0557 0.2360 9 no StudyID
sigma^2.2 0.0011 0.0334 15 no StudyID/SampleID
Test for Heterogeneity:
Q(df = 14) = 116.4132, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1340 0.0814 1.6471 0.0995 -0.0255 0.2935 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Heterogeneity across effect sizes is quantified by the two “sigma” values: the first represents the estimated heterogeneity across studies, often reported as \(\tau\); the second represents the estimated heterogeneity across samples/effect sizes (within studies), often reported as \(\omega\).
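Two quick computations on the output above (base R, using the rounded values printed by summary(), so results are approximate): back-transforming the pooled Fisher’s z to the correlation scale, and the share of total heterogeneity located at the study level:

```r
# pooled estimate back on the correlation scale
tanh(0.1340)                   # about 0.133

# proportion of heterogeneity between studies (vs within studies)
sigma2_study  = 0.0557
sigma2_sample = 0.0011
sigma2_study / (sigma2_study + sigma2_sample)  # about 0.98
```

Here nearly all heterogeneity sits between studies, which is consistent with the tiny second sigma in the output.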
Forest plot
Now let’s have a look at the forest plot:
forest(fit,
slab=paste0("Study",StudyID," - Sample",SampleID),
header=T,
transf=psych::fisherz2r,
xlab="Pearson's r")

Funnel plot
Routinely, a funnel plot is reported to inspect the distribution of effect sizes while simultaneously considering their variances. This may be the basis for methods that check for publication bias, such as “trim-and-fill”… but note that trim-and-fill should NOT be used with multilevel models (as in this case), because it fails to account for the real (multilevel) structure of dependencies across effect sizes.
funnel(fit,
xlab="Fisher's z")

Here is a better version of the funnel plot with a meta-regression that takes into account the multilevel structure of the data:
# devtools::install_github("enricotoffalini/toffee")
library(ggplot2)
toffee::funnelT(fit, petpeese="petpeese")

[1] Based on the pet-peese procedure, the method used was: pet
[1] The p-value for the pet regression was p = 0.6611
[1] The bias-adjusted effect size was d = 0.08 [95% CI: -0.19, 0.36], p = 0.5495
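For intuition, the PET step behind output like the above can be sketched in base R. This is a simplified, non-multilevel version (so it ignores the dependency structure discussed above, and the toy yi/vi vectors are purely illustrative): a weighted regression of the effect sizes on their standard errors, whose intercept estimates the effect at zero standard error, i.e., adjusted for small-study bias.

```r
# toy values standing in for df$eff and df$vi
yi = c(0.6141, 0.1820, 0.3084, 0.1851, 0.1206)
vi = c(0.0060, 0.0022, 0.0041, 0.0025, 0.0023)

# PET: regress effects on their standard errors, weighting by precision;
# the intercept is the bias-adjusted estimate
pet = lm(yi ~ sqrt(vi), weights = 1 / vi)
coef(pet)["(Intercept)"]
```

In the standard PET-PEESE procedure, PEESE (the same regression with vi in place of sqrt(vi)) replaces PET when there is evidence of a nonzero underlying effect.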
Meta-regression, just for fun
Let’s add publication year (numerical) as a possible moderator:
fit1 = rma.mv(eff, vi, random =~ 1|StudyID/SampleID, mods =~ yearPub, data = df)
summary(fit1)
Multivariate Meta-Analysis Model (k = 15; method: REML)
logLik Deviance AIC BIC AICc
6.1301 -12.2601 -4.2601 -2.0003 0.7399
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0541 0.2325 9 no StudyID
sigma^2.2 0.0011 0.0336 15 no StudyID/SampleID
Test for Residual Heterogeneity:
QE(df = 13) = 111.1710, p-val < .0001
Test of Moderators (coefficient 2):
QM(df = 1) = 1.2922, p-val = 0.2556
Model Results:
estimate se zval pval ci.lb ci.ub
intrcpt -51.0046 44.9865 -1.1338 0.2569 -139.1765 37.1672
yearPub 0.0254 0.0223 1.1368 0.2556 -0.0184 0.0691
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
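One way to read these coefficients (using the rounded values printed above, so results are only approximate; note that the intercept alone corresponds to the implausible “year 0”): the model-implied Fisher’s z for a given publication year is intercept + slope × year.

```r
intercept = -51.0046
slope = 0.0254  # expected change in Fisher's z per additional year

# rounding of the printed coefficients can shift these by roughly +/- 0.1;
# predict(fit1, newmods = ...) on the fitted model gives exact values
predicted_z = intercept + slope * c(2010, 2017, 2024)
round(predicted_z, 3)
```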
Comments? What would you say about:
- Have the heterogeneity estimates (“sigma”) changed? (if moderators are useful, heterogeneity should be reduced)
- How do you establish the relevance of the moderators?
- How do you interpret the coefficients in “Model Results”?
Finally, here is the easiest way to plot the meta-regression:
regplot(fit1)