
Structuring a dataset for a meta-analysis

Author

Enrico Toffalini

Dataset for a meta-analysis of a correlation coefficient

Here is an example of a small meta-analysis dataset coding effect sizes (correlations) from 9 studies encompassing a total of 15 samples.

 StudyID yearPub SampleID EffID   corr   n
       1    2017      1_1     1  0.547 171
       2    2024      2_1     2  0.180 452
       2    2024      2_2     3  0.299 245
       3    2017      3_1     4  0.183 399
       4    2019      4_1     5  0.120 445
       5    2019      5_1     6  0.100 230
       5    2019      5_2     7  0.027  69
       6    2017      6_1     8 -0.167 194
       7    2014      7_1     9  0.245 373
       7    2014      7_2    10  0.321 195
       7    2014      7_3    11  0.146 330
       7    2014      7_4    12  0.259 427
       8    2016      8_1    13  0.155 434
       8    2016      8_2    14  0.170 220
       9    2010      9_1    15 -0.266 215

A small but extremely common complication: some studies include more than one sample. Luckily, there is a single effect size for each sample here (so SampleID and EffID are practically redundant). If the same sample contributed multiple effect sizes, you might consider aggregating them preliminarily, unless the to-be-aggregated effects are associated with different levels of a moderator of interest; in that case, you should keep them separate and add a further level to a multilevel model. Note that this complicates everything a bit, because multiple effect sizes from the same group of subjects have correlated sampling errors, which must be accounted for.
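
If you did need to aggregate, the key point is that sampling errors from the same sample are correlated. Here is a minimal base-R sketch with two hypothetical Fisher's z effects and an assumed sampling-error correlation of rho = 0.5 (the metafor package also provides an aggregate() method for escalc objects that does this for you):

```r
# Two hypothetical Fisher's z effects from the SAME sample (made-up values)
z   <- c(0.25, 0.31)      # effect sizes
vi  <- c(0.004, 0.004)    # sampling variances, 1/(n - 3)
rho <- 0.5                # ASSUMED correlation between sampling errors

# Simple average of the two effects
z_agg <- mean(z)

# Variance of the average, accounting for correlated sampling errors:
# Var((z1 + z2)/2) = (v1 + v2 + 2*rho*sqrt(v1*v2)) / 4
vi_agg <- (vi[1] + vi[2] + 2 * rho * sqrt(vi[1] * vi[2])) / 4

c(z_agg, vi_agg)  # 0.280, 0.003
```

Treating the two effects as independent (rho = 0) would give a variance of 0.002 instead, understating the uncertainty of the aggregate.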

  • Step 1: Converting all correlations to Fisher’s z (good for normalizing the distribution, making the variances independent of the effect size, and simplifying subsequent computations):
library(psych)
df$eff = fisherz(df$corr)
 StudyID yearPub SampleID EffID   corr   n         eff
       1    2017      1_1     1  0.547 171  0.61409036
       2    2024      2_1     2  0.180 452  0.18198269
       2    2024      2_2     3  0.299 245  0.30842106
       3    2017      3_1     4  0.183 399  0.18508488
       4    2019      4_1     5  0.120 445  0.12058103
       5    2019      5_1     6  0.100 230  0.10033535
       5    2019      5_2     7  0.027  69  0.02700656
       6    2017      6_1     8 -0.167 194 -0.16857900
       7    2014      7_1     9  0.245 373  0.25008653
       7    2014      7_2    10  0.321 195  0.33276159
       7    2014      7_3    11  0.146 330  0.14705085
       7    2014      7_4    12  0.259 427  0.26503620
       8    2016      8_1    13  0.155 434  0.15625950
       8    2016      8_2    14  0.170 220  0.17166666
       9    2010      9_1    15 -0.266 215 -0.27255429
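
As a side note, Fisher’s z is simply the inverse hyperbolic tangent of r, so base R’s atanh() reproduces psych::fisherz() exactly (and tanh() back-transforms z to r):

```r
r <- 0.547         # correlation from Study 1 above
atanh(r)           # 0.6140904, matching the 'eff' column
tanh(atanh(r))     # back-transform recovers r = 0.547
```
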
  • Step 2: Compute the effect size variances using the analytical formula for Fisher’s z, 1/(n − 3):
df$vi = 1 / (df$n - 3)
 StudyID yearPub SampleID EffID   corr   n         eff          vi
       1    2017      1_1     1  0.547 171  0.61409036 0.005952381
       2    2024      2_1     2  0.180 452  0.18198269 0.002227171
       2    2024      2_2     3  0.299 245  0.30842106 0.004132231
       3    2017      3_1     4  0.183 399  0.18508488 0.002525253
       4    2019      4_1     5  0.120 445  0.12058103 0.002262443
       5    2019      5_1     6  0.100 230  0.10033535 0.004405286
       5    2019      5_2     7  0.027  69  0.02700656 0.015151515
       6    2017      6_1     8 -0.167 194 -0.16857900 0.005235602
       7    2014      7_1     9  0.245 373  0.25008653 0.002702703
       7    2014      7_2    10  0.321 195  0.33276159 0.005208333
       7    2014      7_3    11  0.146 330  0.14705085 0.003058104
       7    2014      7_4    12  0.259 427  0.26503620 0.002358491
       8    2016      8_1    13  0.155 434  0.15625950 0.002320186
       8    2016      8_2    14  0.170 220  0.17166666 0.004608295
       9    2010      9_1    15 -0.266 215 -0.27255429 0.004716981

Actually, you could obtain both columns at once using the built-in function named “escalc” from the “metafor” package itself:

library(metafor)
escalc(measure="ZCOR", ri=corr, ni=n, data=df)

   StudyID yearPub SampleID EffID   corr   n      yi     vi 
1        1    2017      1_1     1  0.547 171  0.6141 0.0060 
2        2    2024      2_1     2  0.180 452  0.1820 0.0022 
3        2    2024      2_2     3  0.299 245  0.3084 0.0041 
4        3    2017      3_1     4  0.183 399  0.1851 0.0025 
5        4    2019      4_1     5  0.120 445  0.1206 0.0023 
6        5    2019      5_1     6  0.100 230  0.1003 0.0044 
7        5    2019      5_2     7  0.027  69  0.0270 0.0152 
8        6    2017      6_1     8 -0.167 194 -0.1686 0.0052 
9        7    2014      7_1     9  0.245 373  0.2501 0.0027 
10       7    2014      7_2    10  0.321 195  0.3328 0.0052 
11       7    2014      7_3    11  0.146 330  0.1471 0.0031 
12       7    2014      7_4    12  0.259 427  0.2650 0.0024 
13       8    2016      8_1    13  0.155 434  0.1563 0.0023 
14       8    2016      8_2    14  0.170 220  0.1717 0.0046 
15       9    2010      9_1    15 -0.266 215 -0.2726 0.0047 
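
With measure="ZCOR", escalc() applies exactly the two formulas above, as a quick base-R check on the first row confirms:

```r
# Manual computation for Study 1 (r = 0.547, n = 171)
yi_manual <- atanh(0.547)    # Fisher's z
vi_manual <- 1 / (171 - 3)   # sampling variance
round(c(yi_manual, vi_manual), 4)  # 0.6141 0.0060
```
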

Fitting the meta-analytic model

Now, let’s proceed with a simple meta-analytic model… well, not so simple, since it’s random-effects and multilevel!

fit = rma.mv(eff, vi, random =~ 1|StudyID/SampleID, data = df)
summary(fit)

Multivariate Meta-Analysis Model (k = 15; method: REML)

  logLik  Deviance       AIC       BIC      AICc   
  5.7280  -11.4560   -5.4560   -3.5389   -3.0560   

Variance Components:

            estim    sqrt  nlvls  fixed            factor 
sigma^2.1  0.0557  0.2360      9     no           StudyID 
sigma^2.2  0.0011  0.0334     15     no  StudyID/SampleID 

Test for Heterogeneity:
Q(df = 14) = 116.4132, p-val < .0001

Model Results:

estimate      se    zval    pval    ci.lb   ci.ub    
  0.1340  0.0814  1.6471  0.0995  -0.0255  0.2935  . 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
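
Since the model is fitted on the Fisher’s z scale, the pooled estimate and its confidence interval are usually back-transformed to Pearson’s r for reporting. Using the rounded values from the output above:

```r
# Back-transform estimate, ci.lb, and ci.ub from Fisher's z to Pearson's r
tanh(c(0.1340, -0.0255, 0.2935))   # approx. 0.133 -0.025 0.285
```
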

Heterogeneity across effect sizes is quantified by the two “sigma” estimates. The first “sigma” represents the estimated heterogeneity across studies (often reported as \(\tau\)); the second represents the estimated heterogeneity across samples/effect sizes within studies (often reported as \(\omega\)).
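
Using the rounded variance components from the output, you can also gauge how the total heterogeneity splits across the two levels:

```r
# Rounded sigma^2 estimates from the output above
sigma2 <- c(StudyID = 0.0557, SampleID = 0.0011)
round(sigma2 / sum(sigma2), 3)   # ~0.981 between studies, ~0.019 within
```

Here nearly all of the heterogeneity sits at the study level.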

Forest plot

Now let’s have a look at the forest plot:

forest(fit, 
       slab=paste0("Study", df$StudyID, " - Sample", df$SampleID), 
       header=TRUE,
       transf=psych::fisherz2r,
       xlab="Pearson's r")

Funnel plot

Routinely, a funnel plot is reported to inspect the distribution of effect sizes while simultaneously considering their variances. It may be the basis for methods that check for publication bias, such as “trim-and-fill”… but note that trim-and-fill should NOT be used with multilevel models (as in this case), because it fails to account for the real (multilevel) structure of dependencies across effect sizes.

funnel(fit,
       xlab="Fisher's z")

Here is a better version of the funnel plot, with a meta-regression that takes the multilevel structure of the data into account:

# devtools::install_github("enricotoffalini/toffee")
library(ggplot2)
toffee::funnelT(fit, petpeese="petpeese")
[1] Based on the pet-peese procedure, the method used was: pet
[1] The p-value for the pet regression was p = 0.6611
[1] The bias-adjusted effect size was d = 0.08 [95% CI: -0.19, 0.36], p = 0.5495

Meta-regression, just for fun

Let’s add publication year (numerical) as a possible moderator:

fit1 = rma.mv(eff, vi, random =~ 1|StudyID/SampleID, mods =~ yearPub, data = df)
summary(fit1)

Multivariate Meta-Analysis Model (k = 15; method: REML)

  logLik  Deviance       AIC       BIC      AICc   
  6.1301  -12.2601   -4.2601   -2.0003    0.7399   

Variance Components:

            estim    sqrt  nlvls  fixed            factor 
sigma^2.1  0.0541  0.2325      9     no           StudyID 
sigma^2.2  0.0011  0.0336     15     no  StudyID/SampleID 

Test for Residual Heterogeneity:
QE(df = 13) = 111.1710, p-val < .0001

Test of Moderators (coefficient 2):
QM(df = 1) = 1.2922, p-val = 0.2556

Model Results:

         estimate       se     zval    pval      ci.lb    ci.ub    
intrcpt  -51.0046  44.9865  -1.1338  0.2569  -139.1765  37.1672    
yearPub    0.0254   0.0223   1.1368  0.2556    -0.0184   0.0691    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comments? What would you say about:

  • Have the heterogeneity estimates (“sigma”) changed? (if moderators are useful, heterogeneity should be reduced)

  • How do you establish the relevance of the moderators?

  • How do you interpret the coefficients in “Model Results”?
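
On the last question, a back-of-the-envelope computation with the rounded coefficients above (so only approximate) shows why the intercept looks so extreme: it is the predicted Fisher’s z at yearPub = 0. Centering yearPub before fitting (e.g., yearPub - mean(yearPub)) would make the intercept directly interpretable as the predicted effect in an average publication year.

```r
# Predicted Fisher's z for a study published in 2017,
# using the ROUNDED coefficients from the output above
-51.0046 + 0.0254 * 2017   # 0.2272 (imprecise: rounding error is amplified
                           # when multiplying a rounded slope by ~2000)
```
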

Finally, here is the easiest way to plot the meta-regression:

regplot(fit1)