What Are the Causes of Non-Normal Residuals?

In statistical analysis, it is standard practice for researchers to examine the residuals, the differences between the observed values and the values their model predicts, before reporting results. If the residuals are non-normal, that is, if they do not follow the bell-shaped curve of a normal distribution, then conclusions that rely on the normality assumption (such as p-values and confidence intervals from a linear regression) are often unreliable or inappropriate. Thus, when a researcher notices that the residuals in her model are non-normal, she naturally asks why. There are several possible causes of non-normal residuals, and a researcher should consider all of them to understand the full picture.
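To make the definition of a residual concrete, here is a minimal Python sketch (standard library only) that fits a straight line by ordinary least squares and computes each residual as the observed value minus the predicted value. The helper names `fit_line` and `residuals` and the four data points are hypothetical, chosen for illustration; in practice a statistics package would normally do the fitting.

```python
# Hypothetical helpers for illustration; not from any particular library.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys):
    """Residual = observed value minus the value the fitted line predicts."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
print(fit_line(xs, ys))  # intercept about 0.0, slope about 1.9
print([round(r, 10) for r in residuals(xs, ys)])  # [0.1, 0.2, -0.7, 0.4]
```

Because the line includes an intercept, least squares forces these residuals to sum to zero, so normality checks focus on their shape (skew, tails, patterns) rather than on their average.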
  1. The Distribution

    • If the original data do not come from a normal (Gaussian) distribution, it is very likely that the residuals will not be normal either. The researcher can check whether this is the cause by examining the distribution of the original population or sample, keeping in mind that what matters for a regression model is the distribution of the outcome around the model's predictions rather than the raw distribution of the outcome alone. If the data do not appear normal, the researcher may have been wrong to assume, before performing the statistical analyses, that the data came from a normal distribution. In that case, the researcher must rebuild the model so that it accounts for the true distribution of the population, for example by transforming the response or by choosing a model designed for that distribution.
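A small simulation can illustrate how the error distribution shows up in the residuals. This Python sketch assumes a hypothetical true model y = 2 + 3x, fits a line to it twice, once with normal errors and once with shifted exponential errors (mean zero but right-skewed), and compares the sample skewness of the residuals. The model, the seed, and the helper names are illustrative assumptions, not part of any standard procedure.

```python
import random


# Hypothetical helpers for illustration; not from any particular library.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def skewness(values):
    """Sample skewness: about 0 for a symmetric (e.g. normal) sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def residual_skewness(error_draw, n=10_000, seed=1):
    """Simulate y = 2 + 3x + error, fit a line, return the residual skewness."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 + 3 * x + error_draw(rng) for x in xs]
    a, b = fit_line(xs, ys)
    return skewness([y - (a + b * x) for x, y in zip(xs, ys)])


normal_skew = residual_skewness(lambda rng: rng.gauss(0, 1))
# Exponential(1) errors shifted by -1 have mean 0 but a long right tail.
skewed_skew = residual_skewness(lambda rng: rng.expovariate(1.0) - 1.0)
print(round(normal_skew, 2), round(skewed_skew, 2))  # first near 0, second clearly positive
```

The fitted line is the same kind of model in both runs; only the distribution of the errors differs, and that difference passes straight through to the residuals.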

  2. Incorrect Model Choice

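    • To have residuals, you must first have a model. If the researcher chooses a model that does not match the real relationship in the data, for example fitting a straight line to a curved relationship, she may find that the residuals sit systematically far from zero over parts of the data instead of scattering randomly around zero, even though their overall average may still be zero. This systematic pattern pushes the distribution of the residuals away from the normal distribution centered at zero that the model assumes.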
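The following Python sketch illustrates a wrong model choice, assuming a hypothetical curved relationship y = x² with no noise at all. A straight line is fitted anyway, and the residuals are averaged over the lower, middle, and upper thirds of x. The overall mean of the residuals is zero, yet each third is systematically off zero. The data and helper names are assumptions made for illustration.

```python
# Hypothetical helpers for illustration; not from any particular library.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def mean(values):
    return sum(values) / len(values)


xs = list(range(-10, 11))        # 21 evenly spaced points, no noise
ys = [x ** 2 for x in xs]        # the true relationship is curved...
a, b = fit_line(xs, ys)          # ...but a straight line is fitted
res = [y - (a + b * x) for x, y in zip(xs, ys)]
# The overall mean of res is zero up to floating-point rounding.

low, middle, high = res[:7], res[7:14], res[14:]  # thirds of the x range
print(round(mean(low), 2), round(mean(middle), 2), round(mean(high), 2))
# → 16.33 -32.67 16.33
```

Since the data contain no noise, the pattern comes entirely from the wrong functional form; plotting these residuals against x would show a U shape.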

  3. Interdependence

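    • Many common models, such as ordinary least squares regression, assume that the observations they predict are independent of one another. If the model is fit to dependent data, such as repeated measurements on the same subjects or observations collected over time, this assumption does not hold. The dependence carries over into the residuals, making them interdependent. Interdependent residuals are no longer independent draws from a single normal distribution, as the model assumes, which can explain why they fail to look normal.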
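To see dependence carry over into the residuals, this Python sketch assumes a hypothetical time-ordered process whose errors follow a first-order autoregressive pattern, e[t] = phi * e[t-1] + noise, and measures the lag-1 autocorrelation of the residuals after fitting a straight line. With phi = 0 the errors are independent; with phi = 0.8 each error depends on the one before it. The model, the value of phi, and the helper names are assumptions made for illustration.

```python
import random


# Hypothetical helpers for illustration; not from any particular library.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def lag1_autocorrelation(values):
    """Correlation between each value and the next one in the sequence."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


def simulate_residual_autocorr(phi, n=5_000, seed=2):
    """Fit y = a + b*x to data whose errors follow e[t] = phi*e[t-1] + noise,
    then return the lag-1 autocorrelation of the residuals."""
    rng = random.Random(seed)
    xs = [10 * t / n for t in range(n)]  # observations taken in time order
    errors, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0, 1)  # phi = 0 gives independent errors
        errors.append(prev)
    ys = [2 + 3 * x + e for x, e in zip(xs, errors)]
    a, b = fit_line(xs, ys)
    return lag1_autocorrelation([y - (a + b * x) for x, y in zip(xs, ys)])


print(round(simulate_residual_autocorr(0.0), 2))  # near 0
print(round(simulate_residual_autocorr(0.8), 2))  # roughly 0.8
```

A lag-1 autocorrelation far from zero is a sign that the independence assumption fails; the Durbin-Watson statistic, which is approximately 2(1 - r) for a lag-1 residual autocorrelation r, is built on the same idea.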

  4. Non-constant Variance

    • The residuals of a model should all have the same variance, a property known as homoscedasticity. In other words, every residual should vary around zero by the same amount: if the third residual has a variance of 4, then the fifth, sixth, and one-millionth residuals should also have a variance of 4. If you find that the variance changes as the predicted values change, this is a likely cause of the non-normality of the residuals, because pooling normal values that have different variances produces a distribution with heavier tails than a single normal distribution.
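The next Python sketch assumes a hypothetical model y = 2 + 3x in which every error is normal but its standard deviation equals x, so the variance grows with the predicted value. It compares the size of the residuals in the lower and upper halves of the fitted values and computes their excess kurtosis, which is about 0 for normally distributed values. The model, seed, and helper names are illustrative assumptions.

```python
import random


# Hypothetical helpers for illustration; not from any particular library.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def excess_kurtosis(values):
    """About 0 for a normal sample; positive means heavier tails."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


def spread(values):
    """Root-mean-square size of the residuals in a group."""
    return (sum(v * v for v in values) / len(values)) ** 0.5


rng = random.Random(3)
n = 20_000
xs = [rng.uniform(1, 10) for _ in range(n)]
# Each error is normal, but its standard deviation grows with x.
ys = [2 + 3 * x + rng.gauss(0, x) for x in xs]
a, b = fit_line(xs, ys)

# Pair each fitted value with its residual, then sort by fitted value.
pairs = sorted((a + b * x, y - (a + b * x)) for x, y in zip(xs, ys))
res_sorted = [r for _, r in pairs]
ratio = spread(res_sorted[n // 2:]) / spread(res_sorted[: n // 2])
kurt = excess_kurtosis(res_sorted)
print(round(ratio, 2), round(kurt, 2))  # ratio well above 1, kurtosis above 0
```

Even though every individual error is normal, pooling errors with different variances gives a heavy-tailed residual distribution, so the excess kurtosis comes out positive.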

Learnify Hub © www.0685.com All Rights Reserved