Fitting a model with two main (non-interacting) effects but an incorrect link function increases the risk of detecting a false interaction between them.
Is it a relevant problem?
The issue is fundamentally about power: the probability of detecting a false interaction is about equal to the power for that interaction if the link function were correctly specified.
Example 1: log vs identity
Let’s consider this scenario (\(\Delta \approx 50\ \text{ms}\)):
(correct) glmmTMB(rt ~ group * cond + (1|id), family=Gamma(link="log"), data=df)
(wrong) glmmTMB(rt ~ group * cond + (1|id), family=Gamma(link="identity"), data=df)
The interaction is NOT in the data-generating process, which is only group + cond + (1|id); the difference between differences of \(\Delta \approx 50\ \text{ms}\) is entirely due to the exponential transformation of the linear predictor \(X\).
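A short derivation shows where the spurious difference of differences comes from. It assumes, as in the simulation code of this example, that the mean response is \(s\,e^{X}\) with \(X = b_0 + b_1 g + b_2 c + u\). The symbols \(s\) (the scale constant), \(u\) (the participant's random intercept), \(g, c \in \{0, 1\}\) (group and condition) and \(\mu_{gc}\) (the cell mean) are notation introduced here for illustration.

\[
\mu_{gc} = s\,e^{\,b_0 + b_1 g + b_2 c + u}
\]

\[
(\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00}) = s\,e^{\,b_0 + u}\,(e^{b_1} - 1)(e^{b_2} - 1)
\]

\[
(\log\mu_{11} - \log\mu_{10}) - (\log\mu_{01} - \log\mu_{00}) = 0
\]

On the response (identity) scale the difference of differences is nonzero whenever both \(b_1\) and \(b_2\) are nonzero, while on the log scale it is exactly zero.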
N = 150 participants divided into 2 groups (between); all participants undergo 2 conditions (within), each with k = 20 trials (2k = 40 trials per participant in total); ICC = 0.72
Example 1: Gamma(link="log")
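set.seed(10)
library(glmmTMB)
# niter, N, k, tau, b0, b1, b2, shape, scaleConstant, id, group and cond
# are assumed to be defined earlier in the presentation.
pvals = rep(NA, niter)
for (i in 1:niter) {
  # participant-level random intercepts, repeated over the 2*k trials
  rInt = rep(rnorm(N, 0, tau), each = k*2)
  # linear predictor: main effects only, no interaction
  X = b0 + b1*group + b2*cond + rInt
  # Gamma response with mean scaleConstant*exp(X), i.e. a log link
  rt = rgamma(N*k*2, shape = shape, scale = scaleConstant*exp(X)/shape)
  df = data.frame(id, group, cond, rt)
  df$group = as.factor(df$group)
  df$cond = as.factor(df$cond)
  # fit with the correct link and store the interaction p-value
  fitGLog = glmmTMB(rt ~ group * cond + (1|id), family = Gamma(link = "log"), data = df)
  pvals[i] = summary(fitGLog)$coefficients$cond["group1:cond1", "Pr(>|z|)"]
}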
correct link="log" → false positive interactions are 3.5% (about ok)
Example 1: Gamma(link="identity")
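set.seed(10)
library(glmmTMB)
# niter, N, k, tau, b0, b1, b2, shape, scaleConstant, id, group and cond
# are assumed to be defined earlier in the presentation.
pvals = rep(NA, niter)
for (i in 1:niter) {
  # same data-generating process as before: additive effects on the log scale
  rInt = rep(rnorm(N, 0, tau), each = k*2)
  X = b0 + b1*group + b2*cond + rInt
  rt = rgamma(N*k*2, shape = shape, scale = scaleConstant*exp(X)/shape)
  df = data.frame(id, group, cond, rt)
  df$group = as.factor(df$group)
  df$cond = as.factor(df$cond)
  # fit with the incorrect identity link and store the interaction p-value
  fitGId = glmmTMB(rt ~ group * cond + (1|id), family = Gamma(link = "identity"), data = df)
  pvals[i] = summary(fitGId)$coefficients$cond["group1:cond1", "Pr(>|z|)"]
}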
incorrect link="identity" → false positive interactions are 76.9% (bad!)
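The percentages reported for Example 1 are presumably the share of simulated datasets in which the interaction p-value falls below the significance threshold. Below is a minimal sketch of that summary step. It assumes a threshold of alpha = 0.05 (the slides do not state the threshold) and uses a made-up vector of p-values in place of the pvals produced by the simulation; rejection_rate is a hypothetical helper name.

```r
# Hypothetical helper: share of p-values below alpha, with its Monte Carlo SE.
rejection_rate <- function(pvals, alpha = 0.05) {
  pvals <- pvals[!is.na(pvals)]            # drop iterations without a p-value
  rate  <- mean(pvals < alpha)             # proportion of "significant" results
  se    <- sqrt(rate * (1 - rate) / length(pvals))
  c(rate = rate, se = se)
}

# Illustrative input only: uniform p-values mimic a test with no true effect.
set.seed(1)
fake_pvals <- runif(1000)
res <- rejection_rate(fake_pvals)
print(round(res, 3))
```

Reporting the Monte Carlo standard error next to the rate helps judge whether a value such as 3.5% differs meaningfully from the chosen threshold, given the number of iterations.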
Example 2: Probit vs Logit
Example 2: binomial(link="probit")
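set.seed(10)
library(glmmTMB)
trueLF = "probit"   # link used to generate the data
usedLF = "probit"   # link used to fit the model
N = 150
k = 20
b0 = 1.5
b1 = -0.7
b2 = -0.8
# niter (number of simulated datasets) is assumed to be defined earlier.
pvals = rep(NA, niter)
for (i in 1:niter) {
  id = rep(1:N, each = k*2)
  # NOTE: rInt is generated here but is not added to yLin in the code as shown
  rInt = rep(rnorm(N, 0, 1), each = k*2)
  group = rep(0:1, each = k*N)
  condition = rep(0:1, each = k, times = N)
  # linear predictor: main effects only, no interaction
  yLin = b0 + b1*group + b2*condition
  if (trueLF == "logit") yProb = plogis(yLin)
  if (trueLF == "probit") yProb = pnorm(yLin)
  y = rbinom(length(yLin), 1, yProb)
  df = data.frame(y, id, group, condition)
  fit = glmmTMB(y ~ group*condition + (1|id), data = df, family = binomial(link = usedLF))
  pvals[i] = summary(fit)$coefficients$cond["group:condition", "Pr(>|z|)"]
}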
N = 150 participants divided into 2 groups (between); all participants undergo 2 conditions (within), each with k = 20 trials (2k = 40 trials per participant in total); ICC = 0.50
correct link="probit" → false positive interactions are 4.4% (about ok)
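The ICC = 0.50 quoted for this example is consistent with a latent-scale intraclass correlation, in which the residual variance of a probit model is fixed at 1. The sketch below works under that assumption, since the slides do not say how the ICC was computed. Here tau is the random-intercept SD, set to 1 as in the rnorm(N,0,1) call of the simulation code, and latent_icc is a hypothetical helper name.

```r
# Hypothetical helper: latent-scale ICC for a random-intercept binary model.
# Residual variance on the latent scale: 1 for probit, pi^2/3 for logit.
latent_icc <- function(tau, link = c("probit", "logit")) {
  link <- match.arg(link)
  resid_var <- if (link == "probit") 1 else pi^2 / 3
  tau^2 / (tau^2 + resid_var)
}

icc_probit <- latent_icc(1, "probit")
icc_logit  <- latent_icc(1, "logit")
print(c(probit = icc_probit, logit = icc_logit))
```

The same random-intercept SD gives a smaller latent ICC under a logit link, because the logistic residual variance is larger than the probit one.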
Example 2: binomial(link="logit")
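set.seed(10)
library(glmmTMB)
trueLF = "probit"   # link used to generate the data
usedLF = "logit"    # link used to fit the model (incorrect)
N = 150
k = 20
b0 = 1.5
b1 = -0.7
b2 = -0.8
# niter (number of simulated datasets) is assumed to be defined earlier.
pvals = rep(NA, niter)
for (i in 1:niter) {
  id = rep(1:N, each = k*2)
  # NOTE: rInt is generated here but is not added to yLin in the code as shown
  rInt = rep(rnorm(N, 0, 1), each = k*2)
  group = rep(0:1, each = k*N)
  condition = rep(0:1, each = k, times = N)
  # linear predictor: main effects only, no interaction
  yLin = b0 + b1*group + b2*condition
  if (trueLF == "logit") yProb = plogis(yLin)
  if (trueLF == "probit") yProb = pnorm(yLin)
  y = rbinom(length(yLin), 1, yProb)
  df = data.frame(y, id, group, condition)
  fit = glmmTMB(y ~ group*condition + (1|id), data = df, family = binomial(link = usedLF))
  pvals[i] = summary(fit)$coefficients$cond["group:condition", "Pr(>|z|)"]
}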
incorrect link="logit" → false positive interactions are 22.6% (bad!)
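Why the logit fit picks up an interaction can be checked directly on the four cell probabilities. The sketch below uses the coefficients from the simulation code (b0 = 1.5, b1 = -0.7, b2 = -0.8) and, as a simplifying assumption, ignores the random intercept (equivalently, a participant whose random intercept is zero). did is a hypothetical helper that computes the difference of differences.

```r
# Coefficients taken from the simulation code of this example.
b0 <- 1.5; b1 <- -0.7; b2 <- -0.8

# The four design cells, additive on the probit scale (no interaction).
cells <- expand.grid(group = 0:1, condition = 0:1)
cells$eta_probit <- b0 + b1 * cells$group + b2 * cells$condition
cells$p          <- pnorm(cells$eta_probit)   # true cell probabilities
cells$eta_logit  <- qlogis(cells$p)           # same probabilities on the logit scale

# Hypothetical helper: difference of differences across the 2 x 2 cells.
did <- function(eta, group, condition) {
  (eta[group == 1 & condition == 1] - eta[group == 1 & condition == 0]) -
    (eta[group == 0 & condition == 1] - eta[group == 0 & condition == 0])
}

did_probit <- did(cells$eta_probit, cells$group, cells$condition)
did_logit  <- did(cells$eta_logit,  cells$group, cells$condition)
print(c(probit = did_probit, logit = did_logit))
```

The difference of differences is zero on the probit scale by construction, but it is positive (roughly 0.18) on the logit scale, which is the quantity the interaction term estimates when the logit link is used. The size of the effect in the fitted mixed model may differ, since this sketch ignores between-participant variability.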
Paradox - Probit vs Logit?
Paradox - FORCED CHOICE!
When (inference) may NOT be a problem (with caution)