How to Fix Normality in Multiple Regression With Time Series Data

In many statistical techniques, normality of the errors is one of the assumptions underlying statistical hypothesis tests. It is therefore important for a researcher to check her data for normality; if the assumption fails, some statistical methods applied to the dataset will be unfounded. This holds for multiple regression on time series data. Time series data (measurements of a phenomenon taken at specific points in time) can be sensitive to the normality assumption, because errors in measurement may change over time. Running multiple regression on such data complicates matters further because of the number of variables involved. A careful statistician should nonetheless attempt to correct any deviations from normality in the data.

Instructions

    • 1

      Remove outliers. Often the data itself conforms to normality, but plots and tests that probe for normality fail simply because of outliers: points that deviate wildly from the main patterns in the data. Plot your time series data and look for such points. Remove them and re-run the multiple regression. If the normality assumption now holds, the problem is fixed, and you can simply state in your data analysis that you removed outliers prior to fitting the model. If the normality assumption still fails after removing outliers, move on to other techniques.
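This step can be sketched in Python with numpy and scipy. The data here is synthetic, and the 3-standard-deviation cutoff on residuals is one common but illustrative choice, not a fixed rule:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic time series: linear trend plus normal noise, with three
# injected outliers at known positions.
t = np.arange(100, dtype=float)
y = 2.0 + 0.5 * t + rng.normal(0.0, 1.0, size=100)
y[[10, 50, 80]] += 15.0  # outliers

# Fit a regression of y on t and inspect the residuals.
X = np.column_stack([np.ones_like(t), t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Flag points whose standardized residual lies beyond 3 standard deviations.
z = (resid - resid.mean()) / resid.std(ddof=1)
keep = np.abs(z) < 3.0

# Re-fit on the cleaned data and re-test normality of the residuals.
beta2, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
resid2 = y[keep] - X[keep] @ beta2
p_before = stats.shapiro(resid).pvalue
p_after = stats.shapiro(resid2).pvalue
print(f"flagged {np.count_nonzero(~keep)} outliers")
print(f"Shapiro-Wilk p-value before: {p_before:.4f}, after: {p_after:.4f}")
```

A small Shapiro-Wilk p-value is evidence against normality, so a p-value that rises after cleaning suggests the outliers were the cause of the failed normality check.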

    • 2

      Test data transformations. In many cases, transforming the dependent variable yields an approximately normal distribution. Three common transformations that leave the data easily interpretable are the logarithm, the square root and the inverse; note that the logarithm and the inverse require strictly positive values. Apply these functions to your dependent variable one at a time, re-fit the model and check normality after each. It is quite possible that such a transformation will leave you with a normal dataset, which you can use in your analysis directly, merely stating that you performed the transformation prior to analysis.
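A minimal sketch of this step, assuming numpy and scipy: a positively skewed dependent variable is generated synthetically, each candidate transformation is applied, and each result is checked with a Shapiro-Wilk test. (In a full analysis you would re-fit the regression and test the residuals; testing the transformed variable directly keeps the sketch short.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic, positively skewed dependent variable (log-normal),
# so all values are strictly positive.
y = rng.lognormal(mean=2.0, sigma=0.5, size=200)

# Candidate transformations; log and inverse require y > 0.
transforms = {
    "identity": y,
    "log": np.log(y),
    "sqrt": np.sqrt(y),
    "inverse": 1.0 / y,
}

# Shapiro-Wilk test on each version; a larger p-value means less
# evidence against normality.
pvals = {name: stats.shapiro(vals).pvalue for name, vals in transforms.items()}
for name, p in pvals.items():
    print(f"{name:8s} p = {p:.4f}")
```

Because the synthetic variable is log-normal, the log transform should score best here; with real data, whichever transformation gives roughly normal residuals while staying interpretable is the one to report.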

    • 3

      Increase your sample size. The assumption of normality is especially important, and most easily violated, in datasets that contain few data points. Avoid this problem by sampling more points. For time series data, this means measuring at shorter intervals. If your data come from a continuous record of a phenomenon, you can re-sample the record at a finer spacing, yielding a larger dataset that may be closer to normal. Even if your data still violate the normality assumption, a large dataset makes the assumption less important, because of the central limit theorem.
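The central limit theorem point can be illustrated by simulation, assuming numpy and scipy. The sketch below draws regression errors from a deliberately non-normal (exponential) distribution and shows that the sampling distribution of the fitted slope nonetheless looks more normal as the sample size grows; the sample sizes and repetition count are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def slope_estimates(n, reps=2000):
    """Simulate least-squares slope estimates with non-normal
    (centered exponential) errors, for a series of length n."""
    t = np.arange(n, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    est = np.empty(reps)
    for i in range(reps):
        y = 1.0 + 0.2 * t + (rng.exponential(1.0, size=n) - 1.0)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        est[i] = beta[1]
    return est

# Even with non-normal errors, the distribution of the slope estimate
# moves toward normality as n grows (central limit theorem).
pvals = {}
for n in (10, 200):
    pvals[n] = stats.shapiro(slope_estimates(n)).pvalue
    print(f"n = {n:4d}  Shapiro-Wilk p on slope estimates = {pvals[n]:.4f}")
```

With the small sample the normality test on the simulated estimates tends to reject; with the larger sample it typically does not, which is why a large dataset softens the consequences of non-normal errors.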
