(Non)Normal distributions

toffa et al. x psicostat

a typical (mixed-effects) linear model

library(lme4)  # provides lmer()
fit <- lmer(y ~ group + cond + (1 | id), data = df)

Generally, random effects and residuals are taken as normally-distributed:

random intercepts ~ \(N(0,\tau)\)

residuals ~ \(N(0,\sigma)\)
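To make the two normality assumptions concrete, here is a minimal sketch that simulates data with the structure the model above expects. All numeric values (sample sizes, effect sizes, `tau`, `sigma`) are hypothetical, and \(\tau\) and \(\sigma\) are treated as standard deviations, which is an assumption about the notation.

```r
set.seed(1)

n_id  <- 50    # number of participants (hypothetical)
n_obs <- 4     # observations per participant (hypothetical)
tau   <- 1.0   # SD of random intercepts (assumed value)
sigma <- 0.5   # SD of residuals (assumed value)

id    <- rep(seq_len(n_id), each = n_obs)
group <- rep(rep(c(0, 1), length.out = n_id), each = n_obs)  # between-participant factor
cond  <- rep(c(0, 1), times = n_id * n_obs / 2)              # within-participant factor

u <- rnorm(n_id, mean = 0, sd = tau)            # random intercepts ~ N(0, tau)
e <- rnorm(n_id * n_obs, mean = 0, sd = sigma)  # residuals ~ N(0, sigma)

# Fixed effects 0.3 and 0.5 are arbitrary illustration values
y  <- 0.3 * group + 0.5 * cond + u[id] + e
df <- data.frame(id = factor(id), group = group, cond = cond, y = y)
```

A data frame built this way can be passed to the lmer() call shown earlier; the estimated variance components should then be roughly in line with the values of `tau` and `sigma` chosen here.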


But why normally-distributed?

tendency towards “normality”?

the underlying normality of responses (residuals) and/or individual differences (random intercepts) arises from the sum of a great number of independent factors/causes (e.g., environmental, genetic, contextual), which makes a lot of sense in psychology and beyond!

Example Figure

this follows from the Central Limit Theorem (CLT), which holds for the sum of (many) independent and identically distributed variables (Lindeberg–Lévy CLT) and, under additional conditions, even for independent but non-identically distributed variables (Lyapunov CLT)
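The CLT argument can be checked with a small simulation. The sketch below assumes, purely for illustration, that each person's trait is the sum of 100 independent, strongly skewed (exponential) factors; the `skewness` helper is defined here and is not from any package.

```r
set.seed(123)

n_people  <- 10000
n_factors <- 100

# Each column is one skewed "cause"; each row is one person
factors <- matrix(rexp(n_people * n_factors, rate = 1), nrow = n_people)
trait   <- rowSums(factors)  # trait = sum of many independent causes

# Simple moment-based skewness (helper defined for this example)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_single <- skewness(factors[, 1])  # a single factor: strongly skewed
skew_sum    <- skewness(trait)         # the sum: much closer to symmetric
```

Plotting `hist(factors[, 1])` next to `hist(trait)` shows the same contrast visually: the single factor is clearly asymmetric, while the sum looks close to a bell curve.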

but then, why are so many variables skewed?

in many (most) cases, observed scores are reflective of the underlying dimension of interest; but we do not directly observe the dimension of interest, we only observe scores that generally:

  • are aggregates of non-normally distributed responses that may only approximate normality (e.g., sum scores from binomial/ordinal responses of a limited number of items)

  • have a lower bound (e.g., zero for times, errors), or

  • have both a lower and an upper bound (e.g., accuracies, sum scores)
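As an illustration of the bounded case, the sketch below generates sum scores on a 10-item test from a normally distributed latent ability. The logistic link (`plogis`) and the item easiness of 2 are assumptions chosen only to produce a ceiling effect.

```r
set.seed(42)

n_people <- 5000
n_items  <- 10

ability   <- rnorm(n_people)       # latent ability: normal by construction
p_correct <- plogis(ability + 2)   # assumed logistic link; items are easy
score     <- rbinom(n_people, size = n_items, prob = p_correct)  # sum score in 0..10

# Third central moment: negative means a longer tail towards low scores
third_moment <- mean((score - mean(score))^3)
```

Even though `ability` is symmetric, `score` piles up near the upper bound, so `hist(score)` is visibly left-skewed.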

TIME

while the underlying ability of interest might be normally distributed, observed times cannot be, because they are bounded below at zero

IMPORTANT: equal intervals on the right panel do NOT reflect equal intervals on the left panel

TIME: mean vs variance

  • true underlying (maybe normally-distributed) scores: experimental condition (or group) purple is more difficult (less able) than experimental condition (or group) orange; there is only a shift in mean value;
  • observed scores: as mean increases, variance also increases
  • the LINK FUNCTION links the observed scores to the true underlying scores; in the case above it is the logarithm (link="log"; typical for times)
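The mean-variance relationship can be reproduced with a short simulation. This sketch assumes the underlying scores are normal with the same SD in both conditions and that observed times are the exponential of the underlying scores (the inverse of the log link); the means 0 and 0.5 and the SD 0.4 are hypothetical.

```r
set.seed(7)

n <- 10000

# Underlying scores: same SD, only the mean is shifted
latent_orange <- rnorm(n, mean = 0.0, sd = 0.4)
latent_purple <- rnorm(n, mean = 0.5, sd = 0.4)

# Observed times: inverse of the log link
time_orange <- exp(latent_orange)
time_purple <- exp(latent_purple)

sd_latent   <- c(orange = sd(latent_orange), purple = sd(latent_purple))
sd_observed <- c(orange = sd(time_orange),   purple = sd(time_purple))
```

Here `sd_latent` is nearly identical across conditions, while `sd_observed` is clearly larger for purple, the condition with the larger mean time.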

ERRORS

the case of errors is very similar: a lower bound at zero again exists, with the difference that the observations are discrete (not continuous)

IMPORTANT: equal intervals on the right panel do NOT reflect equal intervals on the left panel
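A comparable sketch for error counts, assuming a Poisson distribution with a log link. The Poisson choice is an assumption made for this example (the slides only state that errors are discrete and bounded at zero), and the two expected counts are hypothetical.

```r
set.seed(11)

n <- 10000

# Linear predictor on the log scale for two hypothetical conditions
eta_low  <- log(1)  # expected count of 1 error
eta_high <- log(4)  # expected count of 4 errors

# Observed error counts: discrete, never below zero
errors_low  <- rpois(n, lambda = exp(eta_low))
errors_high <- rpois(n, lambda = exp(eta_high))

var_low  <- var(errors_low)
var_high <- var(errors_high)
```

As with times, the condition with the larger mean also shows the larger variance, and the low-count condition is strongly right-skewed because it sits against the zero bound.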

ACCURACIES, BOUNDED SUM SCORES

Differences between differences

In all previous cases, we noted that equal differences on the observed scores do NOT reflect equal differences on the underlying ability/trait.

➜ This may have devastating consequences when testing interactions, because interactions can be seen as tests of whether there are differences between differences (i.e., whether a difference is equal to another difference)

Note that the link function transforms equal intervals into unequal intervals
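A small worked example of how this affects interactions, assuming a log link and hypothetical cell means. On the underlying scale the effects are purely additive (no interaction), yet the observed scale shows unequal differences.

```r
# Underlying (log-scale) cell means: group B is shifted by 1,
# the hard condition by 0.5, with no interaction (values are hypothetical)
latent <- matrix(c(0.0, 0.5,
                   1.0, 1.5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 cond  = c("easy", "hard")))

observed <- exp(latent)  # inverse of the log link

# Difference hard - easy within each group
diff_latent   <- latent[, "hard"]   - latent[, "easy"]
diff_observed <- observed[, "hard"] - observed[, "easy"]

# Difference between differences (the interaction contrast)
interaction_latent   <- unname(diff(diff_latent))
interaction_observed <- unname(diff(diff_observed))
```

`interaction_latent` is zero, while `interaction_observed` is clearly positive: a model fitted to the raw observed means would suggest an interaction that does not exist on the underlying scale.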