How to Use Discriminant Analysis

Discriminant analysis is a classification method that uses statistical measures such as covariance and geometrical measurements such as Euclidean distance to determine which group an unknown data point belongs to. It involves two main steps: estimating the discriminant coefficients from a set of data whose group memberships are already known, and applying those coefficients to unknown data to yield group classifications. The steps below cover the two-group case.

Things You'll Need

  • Statistical Software (R, SPSS, SAS or S+)

Instructions

    • 1

      Decide on the variables you wish to include in the study. These variables should be characteristics that you believe will help classify data points into specific, mutually exclusive groups. For example, if your groups are "men" and "women," possible variables include number of children, years of schooling and yearly income.

    • 2

      Collect a set of data in which every data point already belongs to one of the mutually exclusive groups (e.g., men and women, buyers and sellers, or Chinese and Taiwanese). For each data point, record its group and its values on the variables you chose in Step 1.

    • 3

      Calculate the centroid of each group. A centroid is the vector of the group's mean values, one mean per variable, so its dimension equals the number of variables you have chosen to include in the analysis. For example, if you have decided to investigate only two variables, then your centroids will exist in Euclidean 2-space.

    • 4

      Calculate the difference between the two centroids by subtracting one centroid from the other, and call the resulting vector "d." The vector has one component per variable of interest, so if you are investigating two variables, "d" will be two-dimensional.

    • 5

      Compute the within-group sum of squares and cross-products matrix for each group, using each data point's deviations from its own group's centroid. Call these matrices "W1" and "W2."

    • 6

      Pool the within-group sum of squares matrices to yield a within-group covariance matrix: add "W1" and "W2," then divide by the total number of data points minus two. Call this matrix "Cw."

    • 7

      Compute the inverse of "Cw." Call this inverse matrix "Cw-1" (that is, "Cw" raised to the power of -1).

    • 8

      Multiply "Cw-1" by "d," with "d" written as a column vector. Call the resulting vector "Cw-1d." Its dimension should be equal to the number of variables you have included in the analysis.

    • 9

      Calculate the discriminant function coefficients. These coefficients are proportional to "Cw-1d," so you can use the entries of "Cw-1d" directly, one coefficient per variable.

    • 10

      Collect the data of interest (the data you wish to classify into groups). Record only the variables of interest for these data points; if you already knew their classifications, there would be no need to perform discriminant analysis.

    • 11

      Write each data point as a vector. Each vector has one entry per variable, in the same order as in the original set of data.

    • 12

      Classify each data point. Multiply each data point by the discriminant function coefficients (that is, take the dot product of the two vectors) to get a single score, and compute the same score for each group centroid. For example, if you are using years of schooling and yearly income as variables to predict whether a person is male or female, the resulting score will be closer to either the "male" centroid score or the "female" centroid score. The group whose score the point is closer to is the group it is classified as.
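The steps above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not a replacement for the statistical packages listed earlier: the function names "fit_two_group_lda" and "classify," the midpoint "cutoff," and the small data set are all made up for the example. Choosing the group whose centroid score is closer is the same as comparing a point's score with the midpoint between the two centroid scores, which is what the cutoff does.

```python
import numpy as np

def fit_two_group_lda(X1, X2):
    """Steps 3-9: return the discriminant coefficients and a midpoint cutoff.

    X1 and X2 hold one row per data point and one column per variable.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)

    # Step 3: centroids (vector of variable means for each group).
    m1 = X1.mean(axis=0)
    m2 = X2.mean(axis=0)

    # Step 4: difference between the two centroids.
    d = m1 - m2

    # Step 5: within-group sum of squares and cross-products matrices.
    W1 = (X1 - m1).T @ (X1 - m1)
    W2 = (X2 - m2).T @ (X2 - m2)

    # Step 6: pooled within-group covariance matrix.
    Cw = (W1 + W2) / (len(X1) + len(X2) - 2)

    # Steps 7-9: coefficients proportional to inverse(Cw) times d.
    # Solving the linear system gives the same vector without
    # forming the inverse explicitly.
    coef = np.linalg.solve(Cw, d)

    # Assumption for this sketch: the cutoff is the score halfway
    # between the two centroid scores.
    cutoff = coef @ (m1 + m2) / 2
    return coef, cutoff

def classify(points, coef, cutoff):
    """Steps 11-12: score each point and return 1 or 2 for its group."""
    scores = np.asarray(points, dtype=float) @ coef
    return np.where(scores > cutoff, 1, 2)

# Made-up illustrative data: columns are years of schooling and
# yearly income in thousands. These are not real measurements.
group1 = [[12, 30], [14, 35], [16, 50], [13, 32]]
group2 = [[10, 20], [11, 22], [9, 18], [12, 25]]

coef, cutoff = fit_two_group_lda(group1, group2)
new_points = [[15, 45], [10, 19]]  # Step 10: data to classify
print(classify(new_points, coef, cutoff))
```

A score above the cutoff places the point in the first group, because "d" was computed as the first centroid minus the second; swapping the order of subtraction flips the sign of every score.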
