How to Calculate a Regression Incorporating Data-Point Errors

In statistical analysis, the regression function describes an idealized relationship. When working with real data, statisticians use a regression function to model the data of interest, but the sample data are gathered with data-point error, so it is often important to account for that error in the regression model itself. While the true values of the data-point errors cannot be calculated, they can be estimated and incorporated into the regression model.

Instructions

    • 1

      Organize the data into a data matrix prepared for regression. The data matrix should be of size n by p, where n is the number of data points and p is the number of independent variables in the model. For example, if your data include 336 subjects, each measured on 12 independent variables, you will have a data matrix with 336 rows and 12 columns. Call this matrix X.
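
      As an illustrative sketch, the R code below builds such a data matrix from simulated data with 336 subjects and 12 independent variables. The names X and Y and the random values are assumptions made for this walkthrough only, not part of any real data set.

          set.seed(1)                                    # reproducible simulated data
          n <- 336                                       # number of data points
          p <- 12                                        # number of independent variables
          X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # n-by-p data matrix
          colnames(X) <- paste0("V", 1:p)
          Y <- drop(X %*% rnorm(p) + rnorm(n))           # simulated dependent variable (length-n vector)
          dim(X)                                         # 336 12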

    • 2

      Run a regression model as usual, using statistical software. For example, in the statistical software package R, the command lm(Y ~ X) regresses the dependent variable Y on the independent variables in X. The result includes a set of estimated coefficients; collect them in vector form and call the vector B.
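
      Continuing the sketch from step 1: lm(Y ~ X) adds an intercept by default, so the example below drops it with "- 1" to keep the coefficient vector B lined up with the columns of X used in the matrix algebra of the later steps; if you prefer to keep the intercept, prepend a column of ones to X first.

          fit <- lm(Y ~ X - 1)   # regress Y on the columns of X, no intercept term
          B <- coef(fit)         # estimated coefficients, one per column of X
          B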

    • 3

      Transpose the matrix X. In transposing the matrix, its rows become columns and its columns become rows, so you will end up with a p by n matrix. Call this matrix X'. For large matrices, it is advisable to find X' through statistical software. For example, in the software package R, the command t(X) yields the transpose of X.
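
      Continuing the sketch, t() produces the transpose; Xt is just an illustrative name for the result.

          Xt <- t(X)   # p-by-n transpose of the n-by-p matrix X
          dim(Xt)      # 12 336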

    • 4

      Multiply the transpose of X by X itself. In this calculation, the order matters: computing X'X is correct, whereas XX' is not. For large matrices you should perform this calculation in statistical software. If using R, for instance, use %*% to multiply two matrices; the command is t(X) %*% X.
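
      Continuing the sketch, the cross-product X'X is a p-by-p matrix; crossprod(X) computes the same quantity and is a common alternative in R.

          XtX <- t(X) %*% X   # equivalently: crossprod(X)
          dim(XtX)            # 12 12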

    • 5

      Find the inverse of X'X. This calculation is impractical by hand, so use statistical software. In R, the function for matrix inversion is solve(), so this step is performed by solve(t(X) %*% X). Call the resulting matrix (X'X)^-1.
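
      Continuing the sketch, using base R's solve() for the matrix inverse.

          XtX_inv <- solve(XtX)   # (X'X)^-1, a p-by-p matrix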

    • 6

      Compute the hat matrix. The hat matrix is given by the formula X(X'X)^-1X'. That is, multiply X by (X'X)^-1 and then multiply the result by X'. Again, large matrices preclude hand calculations, so use statistical software. Call this matrix H.
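
      Continuing the sketch, the hat matrix is built from the pieces computed above; note that with 336 observations H is a 336-by-336 matrix.

          H <- X %*% XtX_inv %*% t(X)   # hat matrix, n-by-n
          dim(H)                        # 336 336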

    • 7

      Subtract the hat matrix from the identity matrix, I. The identity matrix is a matrix that has the value 1 for all diagonal entries and 0 for all off-diagonal entries. Call the resulting matrix (I-H).
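
      Continuing the sketch, diag(n) creates the n-by-n identity matrix in R.

          I_n <- diag(nrow(X))   # n-by-n identity matrix
          IminusH <- I_n - H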

    • 8

      Calculate the estimated data-point errors. Multiply (I-H) by Y, the vector of dependent-variable values from your data set. The result is a vector of data-point error estimates; call this vector e.
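
      Continuing the sketch, multiplying (I-H) by Y yields the estimated data-point errors; these are the regression residuals, so residuals(fit) from step 2 should agree with e up to rounding.

          e <- IminusH %*% Y             # vector of data-point error estimates
          max(abs(e - residuals(fit)))   # effectively zero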

    • 9

      Add the data-point error estimates to the regression model. The resulting model is then Y = XB + e.
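
      Continuing the sketch, a quick numerical check confirms that the data are reproduced exactly by XB + e.

          Y_hat <- X %*% B            # fitted values XB
          max(abs(Y - (Y_hat + e)))   # effectively zero, confirming Y = XB + e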
