Understanding effect size variance
What is the “variance” of an effect size?
In meta-analysis, we “combine” (or “average”) many effect sizes from different studies. The effect sizes alone, however, are not sufficient for a proper meta-analysis: each effect size must be accompanied by its own variance. But what is this variance?
The variance of an effect size quantifies the amount of uncertainty associated with that particular estimate. For example, a study with a very large sample will provide a very precise effect size (i.e., an effect size with a small variance), while a study with a small sample will provide an effect size associated with a lot of uncertainty (low precision), and thus with a larger variance.
Knowing the variance of each effect size is crucial for the “averaging” process that meta-analysis implies: we want effect sizes coming from larger samples to be weighted more heavily than effect sizes coming from smaller samples.
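To make this weighting concrete, here is a minimal sketch of the inverse-variance weighted average (the weighting scheme used in fixed-effect meta-analysis). The effect sizes and variances below are made up for illustration, not taken from real studies:
# three hypothetical studies: effect sizes and their variances
d_i = c(0.30, 0.55, 0.48)   # observed effect sizes
v_i = c(0.10, 0.02, 0.05)   # their variances (smaller variance = larger study)
w_i = 1 / v_i               # inverse-variance weights
sum(w_i * d_i) / sum(w_i)   # weighted average, about 0.50: precise studies dominate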
Generally, computing the variance of an effect size requires knowing the “n” (sample size) of the study. The effect size itself, or the metric of the variables involved, may also affect the variance, but for standardized effect sizes this plays only a limited role.
Here are two formulas for the variance of common effect sizes (a short R illustration follows):
Variance of a Cohen’s d: \(Var(d) = \frac{n_1+n_2}{n_1 n_2} + \frac{d^2}{2(n_1+n_2)}\)
Variance of a Fisher’s z (i.e., transformed correlation coefficient): \(Var(z) = \frac{1}{n-3}\)
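As a quick illustration with hypothetical numbers (a two-group study with n1 = n2 = 40 and d = 0.50, and a correlational study with n = 40), these formulas translate directly into R:
# variance of Cohen's d for a hypothetical study with n1 = n2 = 40, d = 0.50
n1 = 40; n2 = 40; d = 0.50
(n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))   # 0.0515625

# variance of Fisher's z for a hypothetical study with n = 40
n = 40
1 / (n - 3)                                     # about 0.027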
Intuitively, a larger sample is associated with a smaller variance
Let’s do a simple Monte Carlo simulation. We set a true Cohen’s d = 0.50. In this first example, n1 = 40 and n2 = 40. The study is re-simulated 20,000 times:
set.seed(0)
library(effsize)

# simulation parameters
n1 = 40
n2 = 40
true_d = 0.50
niter = 20000

# simulate the study niter times, storing the observed Cohen's d from each iteration
observed_d = rep(NA, niter)
for(i in 1:niter){
  x1 = rnorm(n=n1, mean=0, sd=1)
  x2 = rnorm(n=n2, mean=0, sd=1) + true_d
  observed_d[i] = cohen.d(x2, x1)$estimate
}
- Let’s look at the first 20 results:
observed_d[1:20]
 [1] 0.3447162 0.2665318 0.5238096 0.5073771 0.4243554 0.3873372 0.1961296
 [8] 0.6751543 0.6704280 0.7806759 0.5339568 0.3513478 0.2288129 0.3949171
[15] 0.5179228 0.5158888 0.6785557 0.4505741 0.0721020 0.7408895
- Let’s look at the whole distribution:
library(ggplot2)
ggplot()+
  geom_vline(xintercept=c(0,0.5), linewidth=1, linetype=2, color="darkgray")+
  geom_histogram(aes(x=observed_d, y=after_stat(density)), fill="blue", alpha=.5, binwidth=.05)+
  theme(text=element_text(size=20))+
  scale_x_continuous(limits=c(-0.5,1.5), breaks=seq(-0.5,1.5,0.25))
- Unsurprisingly, the mean observed Cohen’s d (over 20,000 iterations) is extremely close to the “ground truth” Cohen’s d (the more iterations, the closer it gets):
mean(observed_d)
[1] 0.503879
- And now the crucial point: the standard deviation of the distribution of Cohen’s d is its standard error, and the squared standard deviation is its variance (which is what we are interested in):
sd(observed_d)
[1] 0.2288063
sd(observed_d)^2
[1] 0.05235233
var(observed_d)
[1] 0.05235233
Remember that \(Var(d) = \frac{n_1+n_2}{n_1 n_2} + \frac{d^2}{2(n_1+n_2)}\), so let’s apply the formula. Indeed, the result is not exactly the same, but it is very close:
(n1 + n2) / (n1 * n2) + true_d^2 / (2 * (n1 + n2))
[1] 0.0515625
Now let’s do a second Monte Carlo simulation. Everything is the same, but now n1 = 400, n2 = 400:
set.seed(0)
library(effsize)
n1 = 400
n2 = 400
true_d = 0.50
niter = 20000
observed_d = rep(NA, niter)
for(i in 1:niter){
  x1 = rnorm(n=n1, mean=0, sd=1)
  x2 = rnorm(n=n2, mean=0, sd=1) + true_d
  observed_d[i] = cohen.d(x2, x1)$estimate
}
mean(observed_d)
[1] 0.5016131
ggplot()+
  geom_vline(xintercept=c(0,0.5), linewidth=1, linetype=2, color="darkgray")+
  geom_histogram(aes(x=observed_d, y=after_stat(density)), fill="blue", alpha=.5, binwidth=.05)+
  theme(text=element_text(size=20))+
  scale_x_continuous(limits=c(-0.5,1.5), breaks=seq(-0.5,1.5,0.25))
sd(observed_d)
[1] 0.07144863
var(observed_d)
[1] 0.005104907
Again, applying the formula gives a value very close to the simulated variance:
(n1 + n2) / (n1 * n2) + true_d^2 / (2 * (n1 + n2))
[1] 0.00515625
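The same exercise can be repeated for Fisher’s z. Here is a minimal sketch, assuming a true correlation of 0.30 and n = 40 (these values, and the use of mvrnorm() from the MASS package to generate correlated data, are illustrative choices, not part of the simulations above):
library(MASS)
set.seed(0)
n = 40
rho = 0.30
niter = 20000
# simulate a correlational study niter times, storing Fisher's z each time
observed_z = replicate(niter, {
  xy = mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2, 2))
  atanh(cor(xy[, 1], xy[, 2]))   # Fisher's z = atanh(r)
})
var(observed_z)   # should be close to the analytic value below
1 / (n - 3)       # 0.02702703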
In conclusion, remember that the “effect size” provided by a study is just an estimate. The larger the sample, the more precise the estimate. The “variance” of an effect size tells you how precise that estimate is (the smaller the variance, the better).
Knowing the variance of an effect size, you also know its standard error, and thus you can compute its 95% confidence interval (this is exactly how forest plots are built!).
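For example, here is a minimal sketch with hypothetical numbers (an observed d of 0.50 from a study with n1 = n2 = 40, using the variance formula above):
# hypothetical study: observed d = 0.50, n1 = n2 = 40
d = 0.50
v = (40 + 40) / (40 * 40) + d^2 / (2 * (40 + 40))   # variance of d
se = sqrt(v)                                        # standard error
c(lower = d - 1.96 * se, upper = d + 1.96 * se)     # 95% CI, roughly [0.05, 0.95]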