Meta-analysis of Cohen’s d - From structuring the dataset to (not-so-)basic analyses
Dataset for a meta-analysis of Cohen’s d / Standardized Mean Difference
Here is an example of a small dataset for meta-analyzing a Standardized Mean Difference (Cohen’s d). Data from 20 studies are reported. These could be data on the efficacy of a treatment (for simplicity, only the post-test scores are considered here) or from any other scenario involving the comparison of two group means. “yearPub” and “weeks” might be moderators of interest.
StudyID yearPub N_TG M_TG SD_TG N_CG M_CG SD_CG weeks
1 2011 88 104.41 8.65 88 97.08 7.71 13
2 2022 15 87.54 9.92 15 76.06 9.08 21
3 2008 87 6.14 0.53 85 5.59 0.52 18
4 2012 49 72.96 6.44 48 63.06 7.44 21
5 2009 43 43.49 3.88 43 40.74 3.85 16
6 2016 100 10.45 0.97 100 9.64 1.09 16
7 2008 113 92.64 7.95 113 87.87 7.45 21
8 2013 42 55.36 5.06 43 54.63 5.91 5
9 2019 34 3.27 0.28 33 2.98 0.24 15
10 2004 75 37.50 2.93 73 33.93 3.19 20
11 2021 101 36.83 3.08 101 33.64 2.96 19
12 2009 21 3.44 0.28 22 3.38 0.36 15
13 2020 110 95.21 9.86 109 84.69 9.93 22
14 2010 113 43.38 4.22 113 40.99 4.38 14
15 2009 104 25.81 2.45 104 24.10 2.97 10
16 2012 114 0.92 0.08 115 0.90 0.08 6
17 2022 104 37.72 3.32 103 34.64 3.27 7
18 2021 65 89.38 7.13 62 83.99 8.14 11
19 2011 113 12.10 1.01 114 11.29 1.18 15
20 2019 65 7.79 0.63 63 7.37 0.66 18
1. Compute the effect sizes
Typically, adequate estimates of Standardized Mean Differences are not reported in the articles themselves, so you have to compute them yourself. Luckily, they can be computed from descriptive statistics that are generally reported (N, M, SD).
Since it is very clear that different studies used completely different instruments (note the variability of the descriptive statistics across studies), the effect sizes must be standardized. Also, since some studies have very small samples, Hedges’ correction of Cohen’s d (i.e., Hedges’ g) is preferred. Luckily, the “measure="SMD"” argument in the “escalc” function of the “metafor” package already applies Hedges’ correction for small samples.
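For instance, assuming the dataset above is stored in a data frame named “df” (the same name used in the models below; this call is a minimal sketch), the effect sizes can be computed as follows:
library(metafor)
df = escalc(measure="SMD", n1i=N_TG, m1i=M_TG, sd1i=SD_TG, n2i=N_CG, m2i=M_CG, sd2i=SD_CG, data=df)
This appends the “yi” (Hedges’ g) and “vi” (sampling variance) columns shown below: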
StudyID yearPub N_TG M_TG SD_TG N_CG M_CG SD_CG weeks yi vi
1 2011 88 104.41 8.65 88 97.08 7.71 13 0.8907 0.0250
2 2022 15 87.54 9.92 15 76.06 9.08 21 1.1746 0.1563
3 2008 87 6.14 0.53 85 5.59 0.52 18 1.0428 0.0264
4 2012 49 72.96 6.44 48 63.06 7.44 21 1.4126 0.0515
5 2009 43 43.49 3.88 43 40.74 3.85 16 0.7051 0.0494
6 2016 100 10.45 0.97 100 9.64 1.09 16 0.7821 0.0215
7 2008 113 92.64 7.95 113 87.87 7.45 21 0.6171 0.0185
8 2013 42 55.36 5.06 43 54.63 5.91 5 0.1314 0.0472
9 2019 34 3.27 0.28 33 2.98 0.24 15 1.0979 0.0687
10 2004 75 37.50 2.93 73 33.93 3.19 20 1.1603 0.0316
11 2021 101 36.83 3.08 101 33.64 2.96 19 1.0521 0.0225
12 2009 21 3.44 0.28 22 3.38 0.36 15 0.1821 0.0935
13 2020 110 95.21 9.86 109 84.69 9.93 22 1.0595 0.0208
14 2010 113 43.38 4.22 113 40.99 4.38 14 0.5539 0.0184
15 2009 104 25.81 2.45 104 24.10 2.97 10 0.6258 0.0202
16 2012 114 0.92 0.08 115 0.90 0.08 6 0.2492 0.0176
17 2022 104 37.72 3.32 103 34.64 3.27 7 0.9313 0.0214
18 2021 65 89.38 7.13 62 83.99 8.14 11 0.7013 0.0334
19 2011 113 12.10 1.01 114 11.29 1.18 15 0.7348 0.0188
20 2019 65 7.79 0.63 63 7.37 0.66 18 0.6473 0.0329
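As a sanity check, “yi” and “vi” for Study 1 can be reproduced by hand (a sketch, using the common approximation of the small-sample correction factor and the large-sample formula for the sampling variance):
sp = sqrt(((88-1)*8.65^2 + (88-1)*7.71^2) / (88+88-2)) # pooled SD
d = (104.41-97.08)/sp                                  # Cohen's d, about 0.895
g = (1 - 3/(4*(88+88-2)-1)) * d                        # Hedges' g, about 0.8907 (= yi)
v = 1/88 + 1/88 + g^2/(2*(88+88))                      # sampling variance, about 0.0250 (= vi)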
2. Intercept-only model
Traditionally, a good first step is to fit the “intercept-only model”. This computes nothing but a meta-analytic “weighted average”, plus the estimated heterogeneity (unless you set “method="EE"” for a fixed-effects [Equal-Effects] model, in which case heterogeneity is not estimated; such a model is generally suboptimal).
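For reference, that Equal-Effects variant would be fitted like this (a sketch; again, generally not recommended here):
fitEE = rma(yi, vi, data=df, method="EE")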
Fitting the random-effects intercept-only model, instead, is very simple:
fit0 = rma(yi, vi, data=df)
summary(fit0)
Random-Effects Model (k = 20; tau^2 estimator: REML)
logLik deviance AIC BIC AICc
-5.8247 11.6493 15.6493 17.5382 16.3993
tau^2 (estimated amount of total heterogeneity): 0.0655 (SE = 0.0315)
tau (square root of estimated tau^2 value): 0.2558
I^2 (total heterogeneity / total variability): 69.87%
H^2 (total variability / sampling variability): 3.32
Test for Heterogeneity:
Q(df = 19) = 60.9459, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.7799 0.0703 11.0920 <.0001 0.6421 0.9177 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
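If you also want confidence intervals around the heterogeneity estimates (tau^2, I^2, H^2), a useful complement here (a sketch):
confint(fit0)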
Any comment about…
… the overall mean effect?
… the heterogeneity?
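A related quantity worth reporting is the 95% prediction interval for the true effects, which conveys this heterogeneity on the scale of the effect size (a sketch):
predict(fit0)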
Lastly, let’s have a look at the forest plot:
par(mar=c(5,0,0,0))
forest(fit0, header=T)
3. Testing moderator(s)
Would you proceed to test moderators in this case? Why?
Let’s fit a few alternative models with moderators. Practically, “moderators” are just (linear) predictors of the effect size. They are generally called “moderators” rather than “predictors” because the dependent variable is an effect size, not a score: if you “predict” an SMD, you are effectively saying that the predictor “moderates” the effect of Group on Score.
These may be some alternative models including moderators:
fit1 = rma(yi, vi, mods=~weeks+yearPub, data=df)
fit2 = rma(yi, vi, mods=~yearPub, data=df)
fit3 = rma(yi, vi, mods=~weeks, data=df)
First, let’s see whether adding both moderators “in block” improves the model (compared to the original intercept-only model). We need to set “refit=TRUE” because likelihood-ratio tests require models fitted with full Maximum Likelihood: models fitted with Restricted Maximum Likelihood (which is the default) cannot be directly compared when their fixed effects differ.
anova(fit0, fit1, refit=T)
df AIC BIC AICc logLik LRT pval QE tau^2 R^2
Full 4 6.4718 10.4547 9.1385 0.7641 33.4510 0.0185
Reduced 2 15.1431 17.1345 15.8490 -5.5715 12.6713 0.0018 60.9459 0.0599 69.1291%
“Full” model means the one with more parameters (i.e., the one including the predictors/moderators). “Reduced” model means the one with fewer parameters (i.e., the one including no predictors/moderators, or fewer of them).
After having established that entering the moderators contributes meaningfully to explaining heterogeneity (BTW, how much?), we can see whether each of them is “significant” when taken alone. So, we remove them from the “full” model one at a time. IMPORTANT: this procedure is just one among many possible alternatives! You may test moderators in exactly the same way as you would test predictors in any linear regression model!
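For example, a Wald-type test of a single coefficient within the full model is one such alternative (a sketch, using the “btt” argument of “anova” to select the second coefficient of “fit1”, i.e., “weeks”):
anova(fit1, btt=2)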
anova(fit1, fit2, refit=T)
df AIC BIC AICc logLik LRT pval QE tau^2 R^2
Full 4 6.4718 10.4547 9.1385 0.7641 33.4510 0.0185
Reduced 3 16.2224 19.2096 17.7224 -5.1112 11.7506 0.0006 57.4867 0.0561 66.9883%
In the above comparison, we tested “fit1” (with both “weeks” and “yearPub”) against “fit2” (with only “yearPub”). Any comment?
anova(fit1, fit3, refit=T)
df AIC BIC AICc logLik LRT pval QE tau^2 R^2
Full 4 6.4718 10.4547 9.1385 0.7641 33.4510 0.0185
Reduced 3 6.8112 9.7984 8.3112 -0.4056 2.3394 0.1261 38.3251 0.0252 26.5672%
In the above comparison, we tested “fit1” (with both “weeks” and “yearPub”) against “fit3” (with only “weeks”). Any comment?
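Alternatively, the information criteria of all candidate models can be inspected side by side (a sketch; all models are refitted with “ML” so that the criteria are comparable):
fit0.ml = rma(yi, vi, data=df, method="ML")
fit1.ml = rma(yi, vi, mods=~weeks+yearPub, data=df, method="ML")
fit2.ml = rma(yi, vi, mods=~yearPub, data=df, method="ML")
fit3.ml = rma(yi, vi, mods=~weeks, data=df, method="ML")
fitstats(fit0.ml, fit1.ml, fit2.ml, fit3.ml)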
Now, let’s see the summary of the final “best-fitting” model (note that some parameters may differ from those shown in the “anova” comparisons above, because those models were refitted via “ML”, while the original model is fitted via “REML”):
summary(fit3)
Mixed-Effects Model (k = 20; tau^2 estimator: REML)
logLik deviance AIC BIC AICc
-1.3988 2.7975 8.7975 11.4686 10.5118
tau^2 (estimated amount of residual heterogeneity): 0.0318 (SE = 0.0206)
tau (square root of estimated tau^2 value): 0.1783
I^2 (residual heterogeneity / unaccounted variability): 52.62%
H^2 (unaccounted variability / sampling variability): 2.11
R^2 (amount of heterogeneity accounted for): 51.45%
Test for Residual Heterogeneity:
QE(df = 18) = 38.3251, p-val = 0.0035
Test of Moderators (coefficient 2):
QM(df = 1) = 11.6179, p-val = 0.0007
Model Results:
estimate se zval pval ci.lb ci.ub
intrcpt 0.2166 0.1737 1.2465 0.2126 -0.1240 0.5571
weeks 0.0376 0.0110 3.4085 0.0007 0.0160 0.0593 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Any comment about…
… the overall mean effect? (is it still there?!)
… the meaning of the “weeks” coefficient?
… the (residual) heterogeneity?
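To aid interpretation, the model-implied effect at specific treatment durations can be computed (a sketch, e.g., at 10 and 20 weeks):
predict(fit3, newmods=c(10, 20))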
If your model is not too complex, and especially if your moderators are quantitative, showing the regression plot might be a good idea:
regplot(fit3)
And what about the forest plot now?
par(mar=c(5,0,0,0))
forest(fit3, header=T)