How to Create Clusters With Equal Number of Members & Equal Average Attributes

Cluster analysis examines data by assigning items to groups that are not defined in advance. Membership in each group is based on similarity of characteristics: each member of a cluster should have more in common with the other members of that cluster than with members of other clusters. The variables used to group the data may be numeric, binary, categorical, or of any other type, and the grouping procedure differs slightly depending on the type of variable.

Things You'll Need

  • Calculator

Instructions

  1. Numerical Data

    • 1

      Organize the data. Use a histogram if only one variable is involved, or plot the points on a coordinate plane if two variables are involved. If the data contain more than two variables, organize them into tables or matrices.

    • 2

      Divide the number of data items by the number of desired clusters to get the average number of members per cluster.

    • 3

      Group the data into clusters containing the average number of members. If a remainder exists, distribute each remaining point of data to a different cluster, so that no cluster exceeds any other in size by more than one.

    • 4

      Find the centroid of each cluster by adding the values of its members and dividing by the number of members in the cluster. If there is more than one variable, do this separately for each variable. The result is the average value for the cluster.

    • 5

      Find the distance of each member of each cluster from its centroid. If any data point is closer to the centroid of another cluster, move it to that cluster.

    • 6

      Count the number of points in each cluster. If any cluster contains more than its share of members (the average from step two, plus one where a remainder was distributed), move the excess members farthest from its centroid to the neighboring cluster closest to them.

    • 7

      Repeat steps four through six until no further redistribution is needed.
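      The numerical steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names (target_sizes, centroid, balanced_clusters) are hypothetical, each data point is assumed to be a tuple of numbers, the initial grouping in step three is assumed to be a simple slice of the sorted data, and there are assumed to be at least as many points as clusters. Steps five and six are folded into one greedy pass that gives each point the nearest centroid that still has room, which is a substitution for moving points and then pushing the excess back.

```python
import math

def target_sizes(n_items, n_clusters):
    """Steps 2-3: cluster sizes that differ by at most one."""
    base, extra = divmod(n_items, n_clusters)
    return [base + 1 if i < extra else base for i in range(n_clusters)]

def centroid(points):
    """Step 4: average each variable separately."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def balanced_clusters(points, k, max_rounds=100):
    """Group points into k clusters whose sizes differ by at most one."""
    sizes = target_sizes(len(points), k)
    ordered = sorted(points)

    # Step 3 (assumed starting point): slice the sorted data into groups.
    labels = []
    for cluster_id, size in enumerate(sizes):
        labels.extend([cluster_id] * size)

    for _ in range(max_rounds):
        # Step 4: recompute the centroid of every cluster.
        centers = [
            centroid([p for p, lab in zip(ordered, labels) if lab == c])
            for c in range(k)
        ]
        # Steps 5-6 (greedy substitute): visit point/cluster pairs from
        # nearest to farthest and assign while the cluster has room.
        pairs = sorted(
            (math.dist(p, centers[c]), i, c)
            for i, p in enumerate(ordered)
            for c in range(k)
        )
        new_labels = [None] * len(ordered)
        room = list(sizes)
        for _, i, c in pairs:
            if new_labels[i] is None and room[c] > 0:
                new_labels[i] = c
                room[c] -= 1
        # Step 7: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels

    return [[p for p, lab in zip(ordered, labels) if lab == c] for c in range(k)]
```

      Because every cluster has a fixed capacity, the sizes stay balanced after every round. The result depends on the starting groups and is not guaranteed to be the best possible grouping. For a single variable, pass one-element tuples such as (3.5,).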

  2. Binary or Categorical Data

    • 8

      Find the number of desired members per cluster by dividing the total number of data items by the desired number of groups.

    • 9

      Assign that number of data items to each cluster, grouping items with similar characteristics together.

    • 10

      Find the most common (modal) value for each variable for the data within each cluster. The centroid of the cluster will have the most common value for each variable.

    • 11

      For each item, divide the number of variables on which it matches the centroid by the total number of variables. This ratio shows the degree to which the data point resembles the rest of the cluster.

    • 12

      Move any data point with a ratio below 0.5 to another cluster that it resembles more closely. Redistribute the points as needed to keep the clusters equal in size. Repeat steps 10 through 12 until no further moves are needed.
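      The categorical steps can be sketched the same way. This is a rough illustration under assumptions rather than a complete procedure: the function names (modal_centroid, match_ratio, weak_members) are hypothetical, each item is assumed to be a tuple of category values, and ties for the most common value go to whichever value appears first. The sketch covers steps 10 and 11 and the first half of step 12 (finding the points to move). It leaves the rebalancing of cluster sizes to the reader.

```python
from collections import Counter

def modal_centroid(items):
    """Step 10: the most common value of each variable in a cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*items))

def match_ratio(item, center):
    """Step 11: share of variables on which the item matches the centroid."""
    return sum(a == b for a, b in zip(item, center)) / len(center)

def weak_members(clusters, threshold=0.5):
    """Step 12 (first half): list items that fit their cluster poorly.

    Returns (item, current_cluster, best_matching_cluster) tuples.
    """
    centers = [modal_centroid(members) for members in clusters]
    flagged = []
    for cluster_id, members in enumerate(clusters):
        for item in members:
            if match_ratio(item, centers[cluster_id]) < threshold:
                best = max(
                    range(len(clusters)),
                    key=lambda c: match_ratio(item, centers[c]),
                )
                flagged.append((item, cluster_id, best))
    return flagged

# Two clusters of three items, each item described by three variables.
cluster_a = [("red", "yes", "small"), ("red", "yes", "large"), ("blue", "no", "large")]
cluster_b = [("blue", "no", "large"), ("blue", "no", "small"), ("blue", "yes", "large")]
flagged = weak_members([cluster_a, cluster_b])
```

      In this example the third item of cluster_a matches its centroid on only one of three variables, so it is flagged as a candidate to move to cluster_b. To keep sizes equal, a member of cluster_b would then have to move the other way.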

Learnify Hub © www.0685.com All Rights Reserved