
Exercises – Monte Carlo Simulations for Statistical Power

Basics of R for Data Science

Scenario

Performing Monte Carlo simulations for statistical power is much more than just computing power. It is about building a complete research scenario, clarifying all sources of variability, defining the key parameters, and writing explicit code for the intended analysis. This gives you a level of insight that G*Power and other similar point-and-click software will never provide.

Simulating helps you:

  • Understand the data-generating process (super important! especially when fitting models),
  • Understand the general logic behind power analysis,
  • Write pre-registered, transparent analyses for fully reproducible research.

We will start from the simplest scenario: a t-test (not something for which you actually need a simulation, and not something you would normally choose for a pre-registration, but it works well for learning), and then move toward correlation and regression models.

We will use for loops as the primary method because they are explicit and easy to read.
However, we will also show a compact alternative using the replicate() function, a member of the apply family, which performs the same task more concisely once you're comfortable with functional programming.


Part 1 – Power of a t-Test via Monte Carlo Simulation

In this exercise, we will simulate the statistical power of a two-sample t-test for a given effect size (d) and sample size (N). Although power can be computed analytically using pwr.t.test() from the pwr package, we will replicate the process using simulation to better understand what happens “under the hood”.

Step 1: Setup

# Define parameters
d  = 0.5        # standardized mean difference (Cohen's d)
N  = 50         # sample size per group
nsim = 5000     # number of simulations (reduce for speed if needed)
alpha = 0.05    # significance threshold

# Initialize a vector to store p-values
p_values = rep(NA, nsim)

Step 2: Run the Simulation (for loop)

Each iteration:

  1. simulates two groups;
  2. performs a t-test;
  3. stores the resulting p-value.

set.seed(0) # set a seed for reproducibility

for (i in 1:nsim) {
  group1 = rnorm(N, mean = 0, sd = 1)
  group2 = rnorm(N, mean = d, sd = 1)
  
  test = t.test(group1, group2)
  
  p_values[i] = test$p.value
}

Step 3: Estimate Power

Power is simply the proportion of simulations where the null hypothesis is rejected (p < α).

simulated_power = mean(p_values < alpha)
simulated_power
[1] 0.6974

Step 4: Compare with Analytical Solution

# install.packages("pwr")
library(pwr)

analytic_power = pwr.t.test(d = d, n = N, sig.level = alpha, type = "two.sample")$power
analytic_power
[1] 0.6968934

Question: Are the two power estimates close?
Try increasing nsim to improve precision.
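
Before increasing nsim, you can quantify the precision you already have. A minimal sketch, using the standard binomial-proportion formula for the Monte Carlo standard error of an estimated power:

# Monte Carlo standard error of the simulated power (binomial proportion)
mc_se = sqrt(simulated_power * (1 - simulated_power) / nsim)
mc_se

Quadrupling nsim halves this standard error.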


Step 5 (Optional): Rewriting with replicate()

The same simulation can be written more compactly using replicate(), which repeatedly evaluates the same expression a given number of times (here, nsim times).

set.seed(0) # set a seed for reproducibility

simulate_t_p = function(N, d) {
  g1 = rnorm(N, 0, 1)
  g2 = rnorm(N, d, 1)
  
  return(t.test(g1, g2)$p.value)
}

p_values_replicate = replicate(nsim, simulate_t_p(N, d))
mean(p_values_replicate < alpha)
[1] 0.6974

replicate() performs the same job as the for loop but is designed specifically for repeated simulations: it is actually just a wrapper around sapply() for repeating the exact same expression (usually with just different random numbers, like in our case).
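
To make that connection concrete, here is the same simulation written with sapply() directly (a sketch; the index i is ignored inside the anonymous function, each call simply redraws new data):

set.seed(0) # set a seed for reproducibility

# Equivalent sapply() version: the iteration index is unused
p_values_sapply = sapply(1:nsim, function(i) simulate_t_p(N, d))
mean(p_values_sapply < alpha)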


Part 2 – Power Curve for Different Effect Sizes and Sample Sizes

Next, we will compute power for multiple combinations of d (effect size) and N (sample size).

Step 1: Define Parameter Grid

# Define parameters
d_values = c(0.1, 0.3, 0.5)
N_values = c(50, 100, 200, 300, 500)
nsim = 1000
alpha = 0.05

# Prepare all combinations of parameters (d and N)
param_grid = expand.grid(d = d_values, N = N_values)
param_grid
     d   N
1  0.1  50
2  0.3  50
3  0.5  50
4  0.1 100
5  0.3 100
6  0.5 100
7  0.1 200
8  0.3 200
9  0.5 200
10 0.1 300
11 0.3 300
12 0.5 300
13 0.1 500
14 0.3 500
15 0.5 500
# Initialize dataframe of results (best practice)
results = cbind(param_grid, power=NA)

Step 2: Compute Power for Each Combination (nested for)

For computational simplicity, we will use fewer simulations per condition (e.g., 1000).

set.seed(0) # set a seed for reproducibility

for (i in 1:nrow(results)) {
  d  = results$d[i]
  N  = results$N[i]
  p_values = rep(NA, nsim)
  for (j in 1:nsim) {
    group1 = rnorm(N, mean = 0, sd = 1)
    group2 = rnorm(N, mean = d, sd = 1)
    p_values[j] = t.test(group1, group2)$p.value
  }
  power_est = mean(p_values < alpha)
  results$power[i] = power_est
}
results
     d   N power
1  0.1  50 0.065
2  0.3  50 0.303
3  0.5  50 0.694
4  0.1 100 0.116
5  0.3 100 0.573
6  0.5 100 0.949
7  0.1 200 0.185
8  0.3 200 0.841
9  0.5 200 0.997
10 0.1 300 0.235
11 0.3 300 0.961
12 0.5 300 1.000
13 0.1 500 0.342
14 0.3 500 0.998
15 0.5 500 1.000

Step 3: Plot the Power Curves

library(ggplot2)

ggplot(results, aes(x = N, y = power, color = factor(d))) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(
    title = "Power Curves for Two-Sample t-Test",
    x = "Sample Size per Group (N)",
    y = "Estimated Power",
    color = "Effect Size (d)"
  ) +
  theme_minimal(base_size = 14)

Question: For what combination of N and d does power exceed 0.80?
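
One quick way to check your answer directly from the results data frame:

# Combinations with estimated power above 0.80
results[results$power > 0.80, ]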


Step 4 (Optional): Using replicate() (and mapply()) Instead of Loops

You can achieve the same results using functions from the *apply() family together with a custom function. This might look less readable if you are not used to it, but an approach like this becomes more convenient and efficient if you need to run computationally heavy simulations, and especially if you want to parallelize (see the sketch at the end of these exercises).

power_once = function(N, d, nsim = 1000, alpha = 0.05) {
  pvals = replicate(nsim, {
    g1 = rnorm(N, 0, 1)
    g2 = rnorm(N, d, 1)
    t.test(g1, g2)$p.value
  })
  return(mean(pvals < alpha))
}

results = data.frame(
  d = param_grid$d,
  N = param_grid$N,
  power = mapply(power_once, param_grid$N, param_grid$d,
                 nsim = nsim, alpha = alpha)
)

Part 3 – Type M Error for Correlation Tests

When power is low, statistically significant correlations tend to overestimate the true effect size.
This phenomenon is called the Type M (Magnitude) Error.

We will simulate correlations and assess how much the significant results overestimate the true correlation r.

Step 1: Setup

# Recommended package
# install.packages("MASS")

library(MASS)

# Set parameters
N = 100
true_r = 0.15
nsim = 3000
alpha = 0.05

# Initialize results vectors (best practice)
p_values = rep(NA, nsim)
r_observed = rep(NA, nsim)

Step 2: Simulation Loop (Primary: for)

set.seed(0) # set a seed for reproducibility

for (i in 1:nsim) {
  Sigma = matrix(c(1, true_r, 
                   true_r, 1), nrow = 2)
  data = MASS::mvrnorm(N, mu = c(0, 0), Sigma = Sigma)
  
  test = cor.test(data[,1], data[,2])
  p_values[i] = test$p.value
  r_observed[i] = test$estimate
}

# Compute power and Type M error
power_est = mean(p_values < alpha)
typeM = mean(r_observed[p_values < alpha]) / true_r

power_est
[1] 0.3126667
typeM
[1] 1.71148

Interpretation:
If Type M = 1.5, significant results overestimate the true correlation by 50% on average. Here Type M ≈ 1.71, so the statistically significant estimates are on average about 71% larger than the true r = 0.15.
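
A minimal sketch to visualize this exaggeration with base R graphics (the plot title and reference-line color are arbitrary choices):

# Distribution of observed correlations among significant results only
hist(r_observed[p_values < alpha],
     main = "Observed r (significant results only)",
     xlab = "Observed correlation")
abline(v = true_r, col = "red", lwd = 2) # true correlation, for reference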


Step 3 (Optional): Using replicate() for the Same Simulation

simulate_cor_once = function(N, true_r) {
  Sigma = matrix(c(1, true_r, true_r, 1), nrow = 2)
  XY = MASS::mvrnorm(N, mu = c(0, 0), Sigma = Sigma)
  test = cor.test(XY[,1], XY[,2])
  return(c(p = test$p.value, r_obs = unname(test$estimate)))
}

# Apply the simulation nsim times using replicate()
out = replicate(nsim, simulate_cor_once(N, true_r))

# Extract results (rows: p-values, observed r)
power_est = mean(out["p", ] < alpha)
typeM = mean(out["r_obs", out["p", ] < alpha]) / true_r

power_est
typeM

Part 4 – Power for a Simple Linear Regression

Now we extend the simulation idea to a linear regression model.

We will simulate data following:

\[ y = b_0 + b_1 \cdot x + \epsilon \]

where \(\epsilon \sim \text{Normal}(0, \sigma)\), with \(\sigma = 1\).

Step 1: Setup

N = 80

b0 = 0    # intercept, not of interest now
b1 = 0.25 # regression coefficient (true effect)
sigma = 1 # std.dev. of residuals/epsilon

nsim = 3000
alpha = 0.05

p_values = rep(NA, nsim)

Step 2: Simulation (Primary: for)

set.seed(0) # set a seed for reproducibility

for (i in 1:nsim) {
  x = rnorm(N)
  y = b0 + b1 * x + rnorm(N, 0, sigma)
  fit = lm(y ~ x)
  p_values[i] = summary(fit)$coefficients["x", "Pr(>|t|)"]
}

power_est = mean(p_values < alpha)
power_est
[1] 0.5876667
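
As a rough analytical cross-check (a sketch, assuming x is standardized with unit variance): for a simple regression, \(R^2 = b_1^2 / (b_1^2 + \sigma^2)\), so Cohen's \(f^2 = R^2 / (1 - R^2) = b_1^2 / \sigma^2\).

library(pwr)

# Cohen's f2 for a single standardized predictor: f2 = b1^2 / sigma^2
f2 = b1^2 / sigma^2
pwr.f2.test(u = 1, v = N - 2, f2 = f2, sig.level = alpha)$power

The result should be close to the simulated power above.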

Step 3 (Optional): Same with replicate()

simulate_lm = function(N, b1, b0 = 0, sigma = 1) {
  # b0 and sigma are passed explicitly (with defaults) rather than
  # taken from the global environment
  x = rnorm(N)
  y = b0 + b1 * x + rnorm(N, 0, sigma)
  fit = lm(y ~ x)
  pval = summary(fit)$coefficients["x", "Pr(>|t|)"]
  return(pval)
}

p_values = replicate(nsim, simulate_lm(N, b1))
mean(p_values < alpha)

Questions:

  • How does increasing b1 affect power? (A sketch follows after these questions.)
  • What if you double the sample size?
  • What if you increase or reduce sigma?
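
A sketch to get you started on the first question, reusing simulate_lm() across a few candidate values of b1 (the values below are arbitrary):

# Estimate power for several effect sizes, 1000 replicates each
b1_values = c(0.10, 0.25, 0.40)
power_by_b1 = sapply(b1_values, function(b) {
  mean(replicate(1000, simulate_lm(N, b)) < alpha)
})
power_by_b1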

What Could You Do Next?

  • Extend the above t.test simulations to paired t-tests or one-sample t-tests.
  • Explore how power changes with different alpha levels (e.g., 0.01, 0.10) or with various p-value adjustments (p.adjust()) for multiple comparisons.
  • Visualize the distribution of p-values or observed effect sizes across all iterations.
  • Understand how increasing the number of iterations nsim makes your power simulation more precise.
  • Try power simulation for logistic regression (e.g., plogis() for generating probabilities from a linear term, rbinom() for generating binomial data, and glm(y ~ x, data=df, family=binomial(link="logit")) for fitting models); a sketch follows below.
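
A minimal sketch of such a logistic-regression power simulation (b0 and b1 below are assumed values on the log-odds scale):

set.seed(0) # set a seed for reproducibility

N = 200; b0 = 0; b1 = 0.5; nsim = 1000; alpha = 0.05
p_values = rep(NA, nsim)

for (i in 1:nsim) {
  x = rnorm(N)
  prob = plogis(b0 + b1 * x)           # probabilities from the linear predictor
  y = rbinom(N, size = 1, prob = prob) # binary outcomes
  fit = glm(y ~ x, family = binomial(link = "logit"))
  p_values[i] = summary(fit)$coefficients["x", "Pr(>|z|)"]
}
mean(p_values < alpha)
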
Simulations can be computationally heavy

Simulations with many iterations or complex models (like generalized mixed-effects models, SEM, or computational models) can take a long time to run. If the runtime is very long, start small (e.g., 5-10 iterations) to test your code, and then increase the number of iterations to improve precision. It is also worth learning how to parallelize computation, for example with the parallel package and its mclapply() function.
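
For example, here is one way the Part 2 grid could be parallelized (a sketch: mc.cores = 2 is an arbitrary choice, and mclapply() relies on forking, which works on Linux/macOS but not on Windows, where parLapply() is the usual alternative):

library(parallel)

# For reproducible parallel random numbers, use the "L'Ecuyer-CMRG" generator
RNGkind("L'Ecuyer-CMRG")
set.seed(0)

# Each worker handles one (d, N) combination, reusing power_once() from Part 2
power_list = mclapply(1:nrow(param_grid), function(i) {
  power_once(param_grid$N[i], param_grid$d[i], nsim = 1000, alpha = 0.05)
}, mc.cores = 2)

results_parallel = cbind(param_grid, power = unlist(power_list))
results_parallel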