Assessing the fit of a regression model means asking how closely the actual data match the relationships or predictions specified in a regression equation. For most regression equations, R-square provides an adequate measure of fit. However, because the dependent variable in logistic regression can take only the values of zero or 1, the observed values can differ greatly from the predicted probabilities over some range of an independent variable, even when the model is appropriate. Thus, R-square may be an unreliable measure of fit for a logistic equation.
Because R-square is unreliable as a measure of fit for logistic regression, some statisticians and scholars propose graphical methods as an alternative. There are several graphical methods for assessing the fit of a regression model, but in general the adequacy of the model depends on how closely the regression line, formed from the model's predicted scores, summarizes the actual observed scores. The better the fit, the more appropriate the model or equation is for the study in question.
Logistic regression is a specialized procedure, one that is typically carried out in statistical software such as SPSS, SAS or Stata. Spreadsheet programs such as Excel are not designed for logistic regression.
A common graphical method for assessing the fit of a regression model is a plot of the residuals, the differences between observed and predicted values. The residuals appear as individual points on the graph, with the vertical axis representing the value of each residual and the horizontal axis representing the value of an independent variable. If the residuals appear independent of the predictors, scattering without pattern around zero, then the model is adequate. Any systematic pattern in the plot suggests that the residuals depend on the independent variables, which indicates that the model is inadequate.
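The idea of a systematic pattern can be illustrated without drawing a graph. The following is a minimal sketch, not a standard diagnostic: it fits a straight line to made-up data that actually follow a curve, then averages the residuals in the low, middle and high thirds of the independent variable. The data and the function names (fit_line, residuals, mean_by_thirds) are assumptions chosen for illustration.

```python
# Sketch: residuals from a straight-line fit, summarized by thirds of x.
# The data are made up; y is deliberately curved in x, so a straight
# line is the wrong model and the residuals should show a pattern.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals(xs, ys, a, b):
    """Observed value minus predicted value for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def mean_by_thirds(xs, res):
    """Average residual in the low, middle and high third of x."""
    pairs = sorted(zip(xs, res))
    k = len(pairs) // 3
    groups = [pairs[:k], pairs[k:2 * k], pairs[2 * k:]]
    return [sum(r for _, r in g) / len(g) for g in groups]

xs = [float(i) for i in range(30)]
ys_curved = [0.1 * x ** 2 for x in xs]   # hypothetical curved data

a, b = fit_line(xs, ys_curved)
res = residuals(xs, ys_curved, a, b)
low, mid, high = mean_by_thirds(xs, res)

print("mean residual, low third of x:   ", round(low, 3))
print("mean residual, middle third of x:", round(mid, 3))
print("mean residual, high third of x:  ", round(high, 3))
```

With these data the average residual is positive at both ends of the range and negative in the middle. That U-shaped pattern is the kind of dependence on the independent variable that marks a model as inadequate.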
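Residual plots, however, have limitations in logistic regression because of the dichotomous nature of the dependent variable. Since each observed value is either zero or 1, the residual at any given predictor value can take only one of two possible values, so the points fall into two bands no matter how well the model fits. This makes the residual plot difficult to interpret.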
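A short sketch makes the difficulty concrete. It computes raw residuals (observed value minus predicted probability) for a handful of observations. The coefficients and the outcomes are hypothetical values chosen only for illustration, not results from a fitted model.

```python
import math

def logistic(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, assumed for illustration only.
intercept, slope = -3.0, 0.6

xs = [float(i) for i in range(11)]
ys = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # hypothetical observed outcomes

probs = [logistic(intercept + slope * x) for x in xs]
raw_residuals = [y - p for y, p in zip(ys, probs)]

for x, y, p, r in zip(xs, ys, probs, raw_residuals):
    print(f"x={x:4.1f}  y={y}  p={p:.3f}  residual={r:+.3f}")

# Every observation with y = 1 has residual 1 - p (above zero);
# every observation with y = 0 has residual -p (below zero).
upper_band = [r for y, r in zip(ys, raw_residuals) if y == 1]
lower_band = [r for y, r in zip(ys, raw_residuals) if y == 0]
```

Every residual lands in either the upper band (1 minus the predicted probability) or the lower band (minus the predicted probability). The two bands come from the zero-or-1 coding of the outcome rather than from any flaw in the model, which is why the usual search for patterns is hard to apply.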
A more reliable alternative to the residual plot is the marginal model plot, or MMP. The plot shows two curves, one a smoothed summary of the values predicted by the model and the other a smoothed summary of the actual observed values. The vertical axis represents the dependent variable (1 or zero in logistic regression), while the horizontal axis represents the values of a predictor. If the two curves match closely, then the logistic regression model in question is deemed adequate. Unlike linear regression, in which the regression line is straight, the logistic regression line is an S-shaped curve.
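The comparison behind a marginal model plot can be sketched in a few lines. This is a rough illustration under stated assumptions, not how statistical packages build the plot: real MMPs draw smoothed curves, while this sketch substitutes the observed proportion of 1s at each predictor value. The data (20 hypothetical observations at each of 10 predictor values) and the function fit_logistic, a small Newton-Raphson routine for a single predictor, are made up for the example.

```python
import math

def logistic(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iterations=25, tol=1e-10):
    """Newton-Raphson fit of P(y=1) = logistic(a + b*x); returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        ps = [logistic(a + b * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the 2x2 information matrix.
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system for the Newton step.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b

# Hypothetical data: number of 1s out of 20 trials at x = 0, 1, ..., 9.
successes = [1, 2, 3, 6, 9, 12, 15, 17, 19, 19]
trials = 20
xs, ys = [], []
for x, k in enumerate(successes):
    xs += [float(x)] * trials
    ys += [1] * k + [0] * (trials - k)

a, b = fit_logistic(xs, ys)

# "Observed" curve: proportion of 1s at each x (stand-in for a smoother).
observed_curve = [k / trials for k in successes]
# "Predicted" curve: fitted probability at each x.
fitted_curve = [logistic(a + b * x) for x in range(len(successes))]
max_gap = max(abs(o - f) for o, f in zip(observed_curve, fitted_curve))

for x, (o, f) in enumerate(zip(observed_curve, fitted_curve)):
    print(f"x={x}  observed={o:.2f}  fitted={f:.2f}")
print("largest gap between the two curves:", round(max_gap, 3))
```

If the gap between the two columns stays small across the whole range of the predictor, the curves would sit nearly on top of each other in a plot, which is the sign of an adequate model. A gap that widens over part of the range would point to a misspecified model in that region.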