What Is a Regression Line?

Linear regression produces what is often called a line of best fit. Fitting a straight line that best represents the plotted points of observed data on a graph helps you understand the correlation between the variables, if there is any. Once the equation of a regression line has been derived, it can be used to predict values that have not yet been observed.
    Linear Regression

    • There are several steps to linear regression modeling, that is, to finding a line of best fit. The first is to collect data on two presumably related traits, such as the heights and weights of several randomly chosen people. In more advanced regression problems the relationship between the variables is less obvious, but for the sake of comprehension we'll keep this simple. The heights and weights are recorded in a table; these numbers are your data, and they supply the variables in the regression equation that will describe the line of best fit. You then plot the data on a graph, with one variable on each axis (a brief sketch of this setup appears below). If the points appear to move in the same direction, the variables are correlated: the taller a person is, the more he or she tends to weigh.
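
    As a concrete illustration, here is a minimal Python sketch, with made-up height and weight figures, of putting the paired data into lists and plotting them with one variable per axis:

```python
import matplotlib.pyplot as plt

# Hypothetical sample: heights in inches (x) and weights in pounds (y)
# for several people -- these numbers are invented for illustration.
heights = [63, 65, 66, 68, 70, 71, 73, 75]
weights = [127, 140, 143, 155, 162, 170, 181, 190]

# One variable per axis; each (height, weight) pair is one plotted point.
plt.scatter(heights, weights)
plt.xlabel("Height (inches)")
plt.ylabel("Weight (pounds)")
plt.title("Height vs. weight")
plt.show()
```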

    Least-Squares Regression

    • While a simple example can, and for beginners should, be worked out by hand, graphing calculators carry out linear regression automatically. Least-squares regression is the process by which the line of best fit is derived and its equation, in the form y = mx + b, is obtained. The calculations combine sums of the observed x and y values, sums of their squares, and sums of their products, which are then subtracted and divided according to a fixed formula (a sketch is given below). Once you know what each value in the formula represents, it is a simple matter of plugging the numbers into a calculator and recording the results. The outcome is the equation of your regression line: a straight line that can be drawn through your data points and that best represents their correlation mathematically.
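
    The following minimal Python sketch, reusing the made-up figures from above, shows the standard least-squares calculations built from those sums: the slope is m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and the intercept is b = (Σy − mΣx) / n.

```python
def least_squares(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Standard least-squares formulas built from the sums above.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Hypothetical height/weight data from the earlier sketch.
heights = [63, 65, 66, 68, 70, 71, 73, 75]
weights = [127, 140, 143, 155, 162, 170, 181, 190]

m, b = least_squares(heights, weights)
print(f"best-fit line: weight = {m:.2f} * height + {b:.2f}")
```

    Plugging a new height into the printed equation then gives a predicted weight, which is how the fitted line is used to forecast values that have not yet been observed.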

    Residuals

    • A residual is the vertical distance between a data point and the line of best fit, that is, the difference between the observed value and the value the line predicts. Examining the residuals lets the observer judge how valid the assumption of a correlation really is. In most regression problems, unlike with height and weight, part of the solution is discovering whether there is a correlation at all. On a graphing calculator, a separate plot of the residuals against the independent variable can be made, and such a plot can give away the presence of lurking variables (a sketch of this calculation follows below).
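
    A minimal Python sketch of computing residuals and plotting them against the independent variable, again using the made-up figures from above, might look like this:

```python
import matplotlib.pyplot as plt

# Hypothetical height/weight data from the earlier sketches.
heights = [63, 65, 66, 68, 70, 71, 73, 75]
weights = [127, 140, 143, 155, 162, 170, 181, 190]

# Slope and intercept of the fitted line (same least-squares formulas as above).
n = len(heights)
sx, sy = sum(heights), sum(weights)
sxy = sum(x * y for x, y in zip(heights, weights))
sx2 = sum(x * x for x in heights)
m = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
b = (sy - m * sx) / n

# Residual = observed weight minus the weight the line predicts.
residuals = [y - (m * x + b) for x, y in zip(heights, weights)]

# Plot residuals against the independent variable (height); a pattern here
# can hint at a lurking variable or a poor fit.
plt.scatter(heights, residuals)
plt.axhline(0, color="gray")
plt.xlabel("Height (inches)")
plt.ylabel("Residual (pounds)")
plt.show()
```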

    Lurking Variables

    • A lurking variable may be present when the data points yield a poor regression line or contain outliers. A lurking variable is simply a factor that affects the data but was not taken into consideration initially. Height and weight are correlated, but to explain a graph in which no correlation between them appears, we might imagine that food intake and/or exercise is a lurking variable that makes a short person heavy or a tall person thin. A point representing either of those scenarios would lie far off the regression line and have a large residual.

    Outliers

    • The data point farthest from the regression line on your graph has the largest residual value. Points that lie far from the regression line are called outliers. Outliers matter because they may represent erroneous data or a poor fit, and they can change the slope of your line significantly (see the sketch below).
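
    The Python sketch below, again with invented numbers and one deliberately exaggerated point, flags the point with the largest absolute residual and refits the line without it to show how much a single outlier can change the slope:

```python
def least_squares(xs, ys):
    """Slope and intercept of the least-squares line through (xs, ys)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return m, (sy - m * sx) / n

# Hypothetical data with one deliberately odd point (the last weight).
heights = [63, 65, 66, 68, 70, 71, 73, 75]
weights = [127, 140, 143, 155, 162, 170, 181, 260]

m, b = least_squares(heights, weights)
residuals = [y - (m * x + b) for x, y in zip(heights, weights)]

# The outlier is the point with the largest absolute residual.
worst = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print("largest residual at height", heights[worst], "->", round(residuals[worst], 1))

# Refit without that point to see how much the outlier changed the slope.
m2, b2 = least_squares(
    [x for i, x in enumerate(heights) if i != worst],
    [y for i, y in enumerate(weights) if i != worst],
)
print("slope with outlier:", round(m, 2), " without:", round(m2, 2))
```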
