Cluster analysis divides data
into groups that are meaningful or serve a purpose. The basic idea is to
identify characteristics that are shared within groups in the data
and that also differentiate the groups from one another. The outcome of a cluster
analysis is a set of groups whose members are as similar as possible on key
characteristics within each group but dissimilar across groups. For
example, Toyota Camry buyers may share common demographics or psychographics
that differ from those of Lexus buyers.
Clustering is a critical
component of a segmented marketing strategy, because it identifies
similar groups that can be targeted separately in advertising and promotions.
These groups are called clusters, and individuals are assigned to a cluster
based on how close their values on different profiling variables are to
the cluster average or ‘center’.
The graph below
demonstrates how stores, outlets, dealerships or hotel locations can be clustered based on demographic structure and
performance.

A common
application is assigning similar customers to groups based on their profiles
on variables such as age, income, geographic location, gender, family
size and education.
Cluster Analysis Approaches
Partitional: Partitional cluster analysis
directly divides data into disjoint groups; it doesn't consider subsets of, or
groups within, clusters. The "K-Means" algorithm is a common partitional
clustering approach that attempts to divide n datapoints into K
clusters. It is an iterative process that starts with two assumptions: the
number of clusters and their initial means for each clustering factor (the
variables on the basis of which clustering will be done). The means may be
picked at random or provided explicitly based on prior beliefs. The algorithm
then assigns each datapoint to the nearest mean, i.e., the mean with the smallest
Euclidean distance (equivalently, the smallest sum of squared differences across all clustering
factors). Once all datapoints are assigned, cluster means are recalculated
from the datapoints they now contain, and the assignment and recalculation steps are
repeated until the means do not change between
iterations. At this point the algorithm is considered to have converged.
Partitional clusters are also typically exclusive, or "hard": clusters
are mutually exclusive, so one point cannot be in two clusters at the same
time.
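The K-Means loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names (`k_means`, `euclidean`, `mean`), the choice of k random datapoints as starting means when none are supplied, and the toy data are all hypothetical, chosen only to show the assign/recalculate/converge cycle.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, initial_means=None, max_iter=100, seed=0):
    """Plain K-means; returns (means, labels) where labels[i] is point i's cluster."""
    if initial_means is None:
        # Assumption: with no prior beliefs, start from k randomly chosen datapoints.
        initial_means = random.Random(seed).sample(points, k)
    means = [tuple(m) for m in initial_means]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the mean with the smallest Euclidean distance.
        labels = [min(range(k), key=lambda j: euclidean(p, means[j])) for p in points]
        # Update step: recompute each mean from the points it now contains.
        new_means = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # If a cluster ends up empty, keep its previous mean.
            new_means.append(mean(members) if members else means[j])
        if new_means == means:
            break  # Converged: the means did not change between iterations.
        means = new_means
    return means, labels


data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
means, labels = k_means(data, k=2, initial_means=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(means)
```

Supplying `initial_means` corresponds to the "prior beliefs" case; omitting it corresponds to random starting means, and different random starts can lead to different final clusters.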
Hierarchical:
Hierarchical clustering does not treat clusters as disjoint but rather as
forming a hierarchy, like a tree (drawn as a dendrogram). At the lowest
level, every point can form its own cluster; at the highest
level, all points fall into one cluster. Hierarchical cluster analysis falls
into two broad classes:
Agglomerative:
Starts with each point as its own cluster and progressively merges the two
closest clusters together, forming a hierarchy, until you reach the "top of
the tree" where all points fall into a single cluster.
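The text does not specify how the distance between two clusters is measured, so the sketch below assumes single linkage (the distance between the two closest members of the clusters), one of several common choices alongside complete and average linkage. It records the merge history from the bottom of the tree to the top; the hypothetical `cut` helper replays that history to read off the clusters at a chosen level. All function names and the toy data are illustrative.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(c1, c2, points):
    """Assumed cluster distance: distance between the two closest members."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerative(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    # Bottom of the tree: every point starts as its own cluster.
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, distance) in the order they were merged
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(pair[0], pair[1], points))
        merges.append((a, b, single_linkage(a, b, points)))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return merges


def cut(merges, n_points, k):
    """Replay merges until k clusters remain: one level of the tree, a hard partition."""
    clusters = {frozenset([i]) for i in range(n_points)}
    for a, b, _ in merges[: n_points - k]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


data = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
history = agglomerative(data)
print(cut(history, len(data), 3))  # [[0, 1], [2, 3], [4]]
print(cut(history, len(data), 2))  # [[0, 1, 2, 3], [4]]
```

Each level returned by `cut` is a hard partition, which reflects the point that hierarchical clusters are exclusive within any one level of the tree.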
Divisive:
You start at the "top of the tree", with all points in one cluster, and
progressively divide datapoints into clusters. The first split
separates out the datapoint that is farthest from the mean of the remaining
points; then the other points that are closer to this point than to the "centroid"
of the remaining points are assigned to this new cluster. Each of the two clusters
thus created is split in two using the same process, and
so on until every single point is its own cluster; these single-point clusters are the
"leaves" of the tree.
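A minimal sketch of the divisive procedure described above, assuming a single reassignment pass per split and Euclidean distances to centroids. Established divisive methods such as DIANA refine this step (for example, by moving points one at a time based on average dissimilarities), so treat this as an illustration of the idea rather than a reference implementation. The helper names `split`, `divisive` and `centroid` and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(coords) / n for coords in zip(*pts))


def split(group, points):
    """Split a group of point indices (size >= 2) into (new_cluster, old_cluster)."""
    def dist_to_rest(i):
        # Distance from point i to the mean of the other points in the group.
        return euclidean(points[i], centroid([points[j] for j in group if j != i]))

    seed = max(group, key=dist_to_rest)  # the farthest-out point starts the new cluster
    remaining = [i for i in group if i != seed]
    rest_center = centroid([points[i] for i in remaining])
    # Single pass: points closer to the seed than to the remaining centroid move over.
    movers = [i for i in remaining
              if euclidean(points[i], points[seed]) < euclidean(points[i], rest_center)]
    if len(movers) == len(remaining):
        movers = []  # guard: never leave the old cluster empty
    old = [i for i in remaining if i not in movers]
    return [seed] + movers, old


def divisive(points):
    """Split top-down until every point is its own cluster; return the list of splits."""
    splits = []
    stack = [list(range(len(points)))]  # top of the tree: all points in one cluster
    while stack:
        group = stack.pop()
        if len(group) < 2:
            continue  # a single-point cluster is a leaf; nothing left to split
        new, old = split(group, points)
        splits.append((new, old))
        stack.extend([new, old])
    return splits


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
for new, old in divisive(data):
    print(sorted(new), sorted(old))
```

Every split produces two non-empty groups, so a dataset of n points always ends after n - 1 splits with each point as its own leaf.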
Within any single level of the dendrogram or "tree", hierarchical
clusters are "exclusive": each point belongs to exactly one cluster at that level.
Fuzzy Clusters: Not all clustering
approaches are exclusive or hard; in fuzzy clustering a data point can belong
to more than one cluster at the same time. A common fuzzy clustering technique is
the Fuzzy C-Means algorithm, where each point has a degree of membership in every cluster
(between 0 and 1 and summing to 1 across clusters, so it can be read much like a probability), and the centroid of each cluster is the membership-weighted
average of all the points. Fuzzy clusters are not uncommon in business. For
instance, segmenting consumers into disjoint clusters for marketing purposes
may not be optimal, because different marketing programs may require different cluster
structures. A heavily income-based cluster structure may be best
for price promotion programs, whereas a clustering structure weighted toward ethnographics
or psychographics may be optimal for advertising programs. Fuzzy
clustering can provide an efficient way to serve these different
purposes at once.
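A minimal sketch of the Fuzzy C-Means update loop in plain Python. It assumes the standard formulation with a "fuzzifier" exponent `m` (here 2): each point's memberships are computed from its relative distances to the centers, and each center is the average of all points weighted by membership raised to the power `m`, a common refinement of the plain membership-weighted average. The function name, parameters and toy data are hypothetical.

```python
import math


def fuzzy_c_means(points, initial_centers, m=2.0, max_iter=100, tol=1e-9):
    """Fuzzy C-Means; returns (centers, memberships), memberships[i][j] in [0, 1].

    m > 1 is the fuzzifier: larger values give softer (more shared) memberships.
    """
    centers = [tuple(c) for c in initial_centers]
    c, dims = len(centers), len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Membership step: nearer centers get larger memberships; each row sums to 1.
        memberships = []
        for p in points:
            d = [math.dist(p, ctr) for ctr in centers]
            if 0.0 in d:
                # The point sits exactly on a center: give it full membership there.
                j0 = d.index(0.0)
                memberships.append([1.0 if j == j0 else 0.0 for j in range(c)])
            else:
                memberships.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                                    for j in range(c)])
        # Center step: average of all points weighted by membership ** m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in memberships]
            total = sum(w)
            if total == 0:
                new_centers.append(centers[j])  # no weight at all: keep the old center
                continue
            new_centers.append(tuple(sum(wi * p[k] for wi, p in zip(w, points)) / total
                                     for k in range(dims)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break  # centers have stopped moving
    return centers, memberships


data = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5.5)]
centers, u = fuzzy_c_means(data, initial_centers=[(0, 0), (10, 10)])
for point, row in zip(data, u):
    print(point, [round(x, 2) for x in row])
```

The point midway between the two groups ends up with roughly equal membership in both clusters, which is the kind of shared assignment a hard method such as K-Means cannot express.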