library(lme4)  # provides lmer()
fit <- lmer(y ~ group + cond + (1 | id), data = df)  # random intercept for each level of id
Generally, random effects and residuals are assumed to be normally distributed:
random intercepts ~ \(N(0,\tau)\)
residuals ~ \(N(0,\sigma)\)
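One way to make these two assumptions concrete is to simulate data from the model fitted above. The sketch below is only illustrative: all numeric values are hypothetical, and it treats \(\tau\) and \(\sigma\) as standard deviations, which is an assumption about the notation. The lmer() fit runs only if lme4 is installed.

```r
# Hypothetical simulation of the model fitted with lmer() above
set.seed(1)
n_id  <- 50    # number of participants (hypothetical)
n_obs <- 20    # observations per participant (hypothetical)
tau   <- 0.8   # SD of the random intercepts (assumed)
sigma <- 1.5   # SD of the residuals (assumed)

df <- expand.grid(trial = 1:n_obs, id = factor(1:n_id))
df$group <- ifelse(as.integer(df$id) <= n_id / 2, "A", "B")  # between-participant factor
df$cond  <- rep(c("x", "y"), length.out = nrow(df))          # within-participant factor

u <- rnorm(n_id, mean = 0, sd = tau)        # random intercepts ~ N(0, tau)
e <- rnorm(nrow(df), mean = 0, sd = sigma)  # residuals ~ N(0, sigma)
df$y <- 10 + 1 * (df$group == "B") + 0.5 * (df$cond == "y") + u[as.integer(df$id)] + e

# If lme4 is installed, the fitted model should recover tau and sigma approximately
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(y ~ group + cond + (1 | id), data = df)
  print(lme4::VarCorr(fit))  # estimated SDs of the random intercepts and residuals
}
```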
But why normally distributed?
the underlying normality of responses (residuals) and/or individual differences (random intercepts) arises from the sum of a great number of independent factors/causes (e.g., environmental, genetic, contextual), which makes a lot of sense in psychology and beyond!
this follows from the Central Limit Theorem (CLT): the (standardized) sum of many independent and identically distributed variables tends toward a normal distribution (Lindeberg–Lévy CLT), and under mild conditions the same holds for independent but non-identically distributed variables (Lyapunov CLT)
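A quick base-R check of this argument: a single exponential "cause" is strongly right-skewed, but the sum of many independent, non-identically distributed exponential causes is already close to symmetric. The number of people, the number of causes, and the exponential distribution itself are hypothetical choices made only for illustration.

```r
set.seed(2)
n_people <- 10000   # hypothetical number of individuals
k_causes <- 100     # hypothetical number of independent causes per individual

# sample skewness (0 for a symmetric distribution such as the normal)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# a single cause: exponential, strongly right-skewed
one_cause <- rexp(n_people, rate = 1)

# many independent but NOT identically distributed causes (different rates)
rates  <- seq(0.5, 2, length.out = k_causes)
causes <- sapply(rates, function(r) rexp(n_people, rate = r))  # n_people x k_causes matrix
trait  <- rowSums(causes)                                      # sum of all causes

skew(one_cause)  # theoretical value: 2
skew(trait)      # theoretical value: about 0.25, i.e. much closer to normal
```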
in many (most) cases, observed scores reflect the underlying dimension of interest; however, we do not observe that dimension directly, only scores that generally:
are aggregates of non-normally distributed responses that may only approximate normality (e.g., sum scores from binomial/ordinal responses to a limited number of items),
have a lower bound (e.g., zero for times and errors), or
have both a lower and an upper bound (e.g., accuracies, sum scores)
while the underlying ability of interest might be normally distributed, observed times cannot be, because they have a lower bound at zero
IMPORTANT: equal intervals on the right panel do NOT correspond to equal intervals on the left panel
link = "log" (typical for times)
the case of errors is very similar: again there is a lower bound at zero, with the difference that the observations are discrete (counts) rather than continuous
IMPORTANT: equal intervals on the right panel do NOT correspond to equal intervals on the left panel
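The log link behind both cases can be sketched in base R. Equally spaced latent values become increasingly spaced times once exponentiated, and error counts generated through the same link are discrete and never negative. The latent values are hypothetical, and the Poisson distribution for error counts is an assumption: the text only names the link.

```r
# Log link: equal steps on the latent (log) scale, unequal steps on the observed scale
eta <- c(5.5, 6.0, 6.5, 7.0)  # equally spaced latent values (hypothetical, log-ms)
rt  <- exp(eta)               # inverse log link: expected times in ms, always > 0
diff(eta)                     # 0.5 0.5 0.5
round(diff(rt))               # 159 262 431: the steps grow with the level

# Errors: same log link, but discrete counts (Poisson family assumed here)
set.seed(4)
eta_err <- rnorm(1000, mean = 0, sd = 0.5)     # latent error proneness (hypothetical)
errors  <- rpois(1000, lambda = exp(eta_err))  # observed error counts: 0, 1, 2, ...
range(errors)                                  # never below zero
```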
link = "probit"
this is very typical of distributions arising from binomial processes (e.g., accuracies), but also from ordinal processes (e.g., sum scores of scales and questionnaires)
IMPORTANT: once again, note that equal intervals on the right panel do NOT correspond to equal intervals on the left panel
binomial(link = "probit")
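A minimal base-R sketch of the probit link: equally spaced latent values are compressed near the bounds of accuracy, and glm() with binomial(link = "probit") recovers a latent value from simulated correct/error responses. The latent value 0.8 and the number of trials are hypothetical; no random effects are simulated here.

```r
# Probit link: equal latent steps are compressed near the bounds of accuracy (0 and 1)
eta <- seq(-3, 3, by = 1)   # equally spaced latent ability values (probit scale)
p   <- pnorm(eta)           # inverse probit link: expected accuracy in (0, 1)
round(diff(p), 3)           # 0.021 0.136 0.341 0.341 0.136 0.021

# A binomial model with probit link recovers a latent value from 0/1 responses
set.seed(5)
true_eta <- 0.8                                        # hypothetical latent value
y   <- rbinom(2000, size = 1, prob = pnorm(true_eta))  # simulated correct (1) / error (0)
fit <- glm(y ~ 1, family = binomial(link = "probit"))
coef(fit)                                              # should be close to 0.8
```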
In binomial and ordinal processes, even if the underlying individual differences (random intercepts) are normally distributed, the observed sum scores may not be “normal”.
When accuracies or sum scores are computed on 5, 10, or 20 items, we must take into account that the underlying data-generating process is binomial.
Only with very many items/trials is the error term (residuals) approximately normally distributed (family = gaussian(link = "probit")). This is the main reason why we (should) use family = binomial instead of family = gaussian when dealing with accuracies…
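To see why the number of items matters, consider the within-person (residual) distribution of correct responses for one hypothetical person with a 0.9 probability of answering correctly. With 5 items the binomial distribution is clearly skewed and mostly piled up at the ceiling; with 500 items it is close to symmetric. The commented glmer() call at the end is a hypothetical sketch: the column names correct, n_items, group, cond and id are illustrative only.

```r
# Within-person (residual) distribution for one hypothetical person with P(correct) = 0.9
p <- 0.9
binom_skew <- function(n, p) (1 - 2 * p) / sqrt(n * p * (1 - p))  # theoretical skewness

binom_skew(5, p)                   # about -1.19: clearly asymmetric
binom_skew(500, p)                 # about -0.12: close to symmetric (normal-like)
dbinom(5, size = 5, prob = p)      # 0.59049: most of the mass sits at the ceiling
dbinom(500, size = 500, prob = p)  # practically 0

# Hypothetical mixed-model call for such data (column names are illustrative only):
# lme4::glmer(cbind(correct, n_items - correct) ~ group + cond + (1 | id),
#             family = binomial(link = "probit"), data = df)
```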
In all previous cases, we noted that equal differences on the observed scores do NOT reflect equal differences on the underlying ability/trait.
➜ This may have devastating consequences when testing interactions, because interactions can be seen as tests of whether there are differences between differences (i.e., whether a difference is equal to another difference)
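A small worked example of the problem: with purely additive (non-interacting) effects of group and condition on the latent probit scale, the expected accuracies still show a difference between differences. The effect sizes and the labels A/B and x/y are hypothetical.

```r
# Purely additive effects on the latent (probit) scale: no interaction
b_group <- 1; b_cond <- 1                  # hypothetical effects
latent <- c(A_x = 0, A_y = b_cond, B_x = b_group, B_y = b_group + b_cond)
observed <- pnorm(latent)                  # expected accuracies via the inverse probit link

# interaction = difference between the condition effects in group B and group A
diff_of_diffs <- function(m) unname((m["B_y"] - m["B_x"]) - (m["A_y"] - m["A_x"]))
diff_of_diffs(latent)    # 0: no interaction on the latent scale
diff_of_diffs(observed)  # about -0.205: a spurious interaction on the accuracy scale
```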
Note that the (inverse) link function transforms equal intervals on the latent scale into unequal intervals on the observed scale
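In general terms, with \(\eta\) the linear predictor, \(g\) the link function and \(\Delta\) a fixed latent step (symbols introduced here for illustration), the observed change produced by the same latent step depends on where on the latent scale it is taken:

\[
\Delta\mu(\eta) = g^{-1}(\eta + \Delta) - g^{-1}(\eta)
\]
\[
\text{log link: } \Delta\mu(\eta) = e^{\eta + \Delta} - e^{\eta} = e^{\eta}\left(e^{\Delta} - 1\right)
\]
\[
\text{probit link: } \Delta\mu(\eta) = \Phi(\eta + \Delta) - \Phi(\eta) \approx \phi(\eta)\,\Delta \quad \text{for small } \Delta
\]

Only with the identity link does \(\Delta\mu(\eta)\) equal \(\Delta\) at every \(\eta\).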