Comparative Analysis of Advanced Clustering Algorithms for Market Segmentation - A Case Study on Mall Customer Data
Abstract
This study conducts a comparative analysis of advanced clustering algorithms for market segmentation using Mall Customer Data. The algorithms evaluated include K-Means, Hierarchical Clustering, DBSCAN, Gaussian Mixture Models (GMM), Mean Shift, Agglomerative Clustering, BIRCH, Spectral Clustering, OPTICS, and Affinity Propagation. Evaluation metrics such as the Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score are employed to assess clustering performance and determine the most suitable algorithm for segmenting mall customers based on their spending habits.
Methodology
The methodology involves several key steps:
1. Data Collection: Mall Customer Data is obtained, comprising various demographic and spending attributes.
2. Data Preprocessing: Data is cleaned, normalized, and prepared for the clustering algorithms.
3. Clustering Algorithms: Ten clustering algorithms are applied to the preprocessed data.
4. Evaluation: Each algorithm's performance is evaluated using the Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score.
5. Comparison: Results are compared to identify the algorithm that best segments mall customers based on the evaluation metrics.
Introduction
Market segmentation plays a crucial role in understanding customer behavior and tailoring marketing strategies. Clustering algorithms provide a powerful means to group similar customers based on shared characteristics, enabling businesses to target specific customer segments effectively. This study explores various clustering techniques to uncover distinct customer groups within mall customer data, aiming to assist businesses in optimizing their marketing efforts and enhancing customer satisfaction.
Algorithms
1) K-Means: A centroid-based clustering method that partitions data into K clusters.
2) Hierarchical Clustering: Builds a hierarchy of clusters by either agglomerative (bottom-up) or divisive (top-down) methods.
3) DBSCAN: Density-based clustering that identifies clusters of varying shapes and sizes based on density.
4) GMM (Gaussian Mixture Models): Assumes data points are generated from a mixture of several Gaussian distributions.
5) Mean Shift: Non-parametric clustering that identifies cluster centroids by shifting towards higher-density regions.
6) Agglomerative Clustering: Hierarchical clustering that recursively merges clusters based on distance.
7) BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): Builds a tree structure to quickly summarize the data and cluster it hierarchically.
8) Spectral Clustering: Uses eigenvalues of a similarity matrix to perform dimensionality reduction before clustering.
9) OPTICS (Ordering Points to Identify the Clustering Structure): Density-based algorithm that detects clusters of varying densities.
10) Affinity Propagation: Identifies exemplars among data points and forms clusters through message passing, without requiring the number of clusters in advance.
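For concreteness, the sketch below shows one way these ten algorithms could be instantiated with scikit-learn. The parameter values (five clusters, default density settings, and so on) are illustrative assumptions, not the tuned settings behind the results reported later.

```python
# A minimal sketch of the ten estimators in scikit-learn.
# All parameter values here are illustrative placeholders.
from sklearn.cluster import (KMeans, AgglomerativeClustering, DBSCAN,
                             MeanShift, Birch, SpectralClustering, OPTICS,
                             AffinityPropagation)
from sklearn.mixture import GaussianMixture

models = {
    "K-Means": KMeans(n_clusters=5, n_init=10, random_state=42),
    "Hierarchical": AgglomerativeClustering(n_clusters=5, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "GMM": GaussianMixture(n_components=5, random_state=42),
    "Mean Shift": MeanShift(),
    "Agglomerative": AgglomerativeClustering(n_clusters=5),
    "BIRCH": Birch(n_clusters=5),
    "Spectral": SpectralClustering(n_clusters=5,
                                   affinity="nearest_neighbors",
                                   random_state=42),
    "OPTICS": OPTICS(min_samples=5),
    "Affinity Propagation": AffinityPropagation(random_state=42),
}
```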
Dataset
The dataset used in this study is Mall Customer Data, which includes:
· CustomerID: Unique identifier for each customer.
· Gender: Gender of the customer.
· Age: Age of the customer.
· Annual Income (k$): Annual income of the customer.
· Spending Score (1-100): Score assigned based on customer behavior and spending nature.
Data Preprocessing
Before applying the clustering algorithms, the following preprocessing steps were performed:
· Normalization: Scale numerical features to a standard range.
· Encoding: Convert categorical variables (e.g., Gender) into numerical format where necessary.
· Feature Selection: Select relevant features (e.g., Annual Income and Spending Score) for clustering.
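A minimal preprocessing sketch along these lines is shown below. The file name Mall_Customers.csv and the exact column labels follow the common public release of this dataset and are assumptions; adjust them to your copy.

```python
# A hedged preprocessing sketch; file name and column labels assumed.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("Mall_Customers.csv")

# Encoding: map the categorical Gender column to 0/1 (only needed if
# Gender is included among the clustering features).
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Feature selection: the two attributes used for segmentation here.
X = df[["Annual Income (k$)", "Spending Score (1-100)"]]

# Normalization: standardize so both features contribute equally.
X_scaled = StandardScaler().fit_transform(X)
```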
Results
Clustering Algorithm | Silhouette Score | Davies-Bouldin Score | Calinski-Harabasz Score
K-Means | 0.358 | 1.033 | 101.530
Hierarchical | 0.321 | 1.128 | 88.102
DBSCAN | 0.185 | 1.757 | 34.071
GMM | 0.335 | 1.019 | 90.864
Mean Shift | Produced a single label; evaluation skipped | n/a | n/a
Agglomerative | 0.321 | 1.128 | 88.102
BIRCH | 0.266 | 1.061 | 63.583
Spectral | 0.353 | 0.993 | 99.602
OPTICS | -0.063 | 1.399 | 12.523
Affinity Propagation | 0.369 | 0.949 | 128.602
Table: Clustering Algorithm Performance Metrics for Mall Customers
This table summarizes the performance metrics for each clustering
algorithm applied to the mall customers dataset, including the Silhouette
Score, Davies-Bouldin Score, and Calinski-Harabasz Score. The Mean Shift
algorithm produced only one label, so its evaluation was skipped.
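The evaluation itself can be expressed as a short loop, sketched below under the assumption that a models dictionary and a scaled feature matrix X_scaled exist as in the earlier sketches. The guard mirrors the Mean Shift case: all three metrics are undefined when an algorithm returns fewer than two clusters.

```python
# Evaluation loop sketch; `models` and `X_scaled` are assumed to come
# from the earlier sketches.
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

for name, model in models.items():
    labels = model.fit_predict(X_scaled)
    if len(set(labels)) < 2:
        # e.g., Mean Shift collapsing everything into one cluster
        print(f"{name}: produced a single label, skipping evaluation")
        continue
    print(f"{name}: "
          f"silhouette={silhouette_score(X_scaled, labels):.3f}, "
          f"davies_bouldin={davies_bouldin_score(X_scaled, labels):.3f}, "
          f"calinski_harabasz={calinski_harabasz_score(X_scaled, labels):.3f}")
```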
K-Means clustering was applied to the Mall Customer Data to segment customers based on their spending behavior. The algorithm produced a Silhouette Score of 0.358, indicating reasonably well-defined clusters but with some overlap observed between clusters. The Davies-Bouldin Score of 1.033 suggests moderate clustering quality, where lower values would indicate better clustering. The Calinski-Harabasz Score of 101.530 indicates good cluster separation and compactness. K-Means' performance is notable for its efficiency and scalability, although its reliance on the initial random centroids can affect results, as seen in the slight overlap between clusters.
Hierarchical Clustering was employed to segment the mall customers into distinct groups based on their spending patterns. The algorithm achieved a Silhouette Score of 0.321, indicating reasonable cluster separations but with clusters that are less well-defined compared to other methods. The Davies-Bouldin Score of 1.128 suggests moderate clustering quality, with some overlap or ambiguity in cluster boundaries. The Calinski-Harabasz Score of 88.102 reflects good within-cluster similarity and between-cluster differences, although slightly lower than optimal for highly distinct clusters. Hierarchical Clustering's advantage lies in its ability to reveal hierarchical relationships but may require parameter tuning to improve cluster quality.
DBSCAN was utilized to segment customers based on their spending behaviors, focusing on density-based clustering. The algorithm yielded a Silhouette Score of 0.185, indicating challenges in forming well-defined clusters due to sensitivity to its parameters such as epsilon and minimum samples. The Davies-Bouldin Score of 1.757 suggests significant overlap or inconsistency in cluster formations, impacting the algorithm's ability to distinguish between different customer segments effectively. The Calinski-Harabasz Score of 34.071 reflects weaker cluster separation compared to other methods, indicating potential difficulties in handling varying densities and noise in the data.
Gaussian Mixture Models were employed to segment mall customers based on their spending patterns, assuming data distributions are a mixture of Gaussian distributions. The algorithm achieved a Silhouette Score of 0.335, indicating reasonably well-separated clusters with moderate overlap. The Davies-Bouldin Score of 1.019 suggests decent cluster quality, though with some ambiguity in cluster boundaries. The Calinski-Harabasz Score of 90.864 reflects good cluster separation and compactness, suitable for data with Gaussian-like distributions. GMM's flexibility in capturing complex data distributions makes it robust but sensitive to the number of components and initialization.
Agglomerative Clustering was applied to the customer data to segment customers into meaningful groups based on their spending behavior. The algorithm achieved a Silhouette Score of 0.321, a Davies-Bouldin Score of 1.128, and a Calinski-Harabasz Score of 88.102, identical to the Hierarchical Clustering results; this is expected, since the hierarchical run was itself agglomerative. The scores indicate reasonable cluster separation with some ambiguity or overlap in cluster boundaries, though not as distinct as methods like Affinity Propagation. Agglomerative Clustering's hierarchical nature offers insights into cluster relationships but may require careful parameter tuning for optimal results.
BIRCH clustering was employed to segment mall customers based on their spending patterns, focusing on hierarchical clustering using a balanced clustering tree. The algorithm achieved a Silhouette Score of 0.266, indicating some challenges in forming well-separated clusters. The Davies-Bouldin Score of 1.061 suggests moderate clustering quality, with potential overlap or ambiguity in cluster boundaries. The Calinski-Harabasz Score of 63.583 reflects decent cluster separation but lower compared to other methods, indicating limitations in handling varying cluster shapes and densities effectively.
Spectral Clustering was utilized to segment customers based on their spending behaviors, focusing on graph-based clustering. The algorithm achieved a Silhouette Score of 0.353, indicating reasonably well-separated clusters with moderate overlap. The Davies-Bouldin Score of 0.993 suggests good clustering quality, with well-defined clusters and minimal overlap. The Calinski-Harabasz Score of 99.602 reflects strong cluster separation and compactness, suitable for datasets with complex structures. Spectral Clustering's ability to capture non-linear relationships makes it robust for various data distributions, though it may require parameter tuning for optimal performance.
OPTICS was applied to segment customers based on their spending patterns, focusing on density-based clustering. The algorithm produced a Silhouette Score of -0.063, indicating challenges in forming distinct and well-separated clusters, potentially due to noise or parameter sensitivity. The Davies-Bouldin Score of 1.399 suggests significant overlap or inconsistency in cluster formations, impacting the algorithm's ability to distinguish between different customer segments effectively. The Calinski-Harabasz Score of 12.523 reflects weaker cluster separation compared to other methods, highlighting limitations in handling varying densities and noise in the data.
Affinity
Propagation was utilized to segment mall customers based on their spending
behaviors, focusing on exemplar-based clustering. The algorithm achieved a
Silhouette Score of 0.369, indicating well-defined and distinct clusters in the
data. The Davies-Bouldin Score of 0.949 suggests good clustering quality, with
well-separated clusters and minimal overlap. The Calinski-Harabasz Score of
128.602 reflects strong cluster separation and compactness, suitable for
datasets with complex and varied cluster shapes. Affinity Propagation's ability
to automatically determine the number of clusters and capture diverse data
patterns makes it robust for market segmentation tasks, outperforming other
methods evaluated.
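As a hedged illustration, exemplar-based clustering with scikit-learn's AffinityPropagation looks roughly as follows. The damping and preference parameters are the main knobs controlling how many exemplars (and hence clusters) emerge; the values shown are illustrative, not the study's settings.

```python
# Minimal Affinity Propagation sketch; parameter values illustrative.
from sklearn.cluster import AffinityPropagation

ap = AffinityPropagation(damping=0.9, preference=None, random_state=42)
labels = ap.fit_predict(X_scaled)  # X_scaled from the preprocessing sketch
print("clusters found:", len(ap.cluster_centers_indices_))
```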
Comparison
From the evaluation results, Affinity Propagation demonstrates the highest Silhouette Score (0.369) and Calinski-Harabasz Score (128.602), indicating better-defined clusters and stronger cluster separation than the other algorithms. Mean Shift, by contrast, produced only a single label and could not be evaluated, while OPTICS yielded a negative Silhouette Score, suggesting that neither is well suited to this dataset.
Conclusion
This
study comprehensively analyzed various clustering algorithms for market
segmentation using Mall Customer Data. Affinity Propagation emerged as the most
effective algorithm based on Silhouette Score and Calinski-Harabasz Score,
indicating its capability to identify distinct customer segments with
significant differences in spending behavior. The findings provide valuable
insights for businesses aiming to optimize marketing strategies and enhance
customer targeting through effective market segmentation techniques. Future
research could explore ensemble clustering methods or incorporate additional
features for further refinement of customer segmentation models.
Comparative Analysis of Advanced Clustering Algorithms for Market Segmentation - A Case Study on Online Retail
Abstract
This
case study presents a comparative analysis of advanced clustering algorithms
for market segmentation using the Online Retail dataset. Various clustering
techniques, including K-Means, Hierarchical, DBSCAN, Gaussian Mixture Model
(GMM), Mean Shift, Agglomerative, BIRCH, Spectral, OPTICS, and Affinity
Propagation, were applied to the dataset to evaluate their performance based on
key metrics such as the Silhouette Score, Davies-Bouldin Score, and
Calinski-Harabasz Score. The results demonstrate significant differences in
clustering quality, highlighting the strengths and weaknesses of each algorithm
in the context of online retail market segmentation.
Introduction
Market
segmentation is crucial for businesses to effectively target and serve
different customer groups. Clustering algorithms provide a means to identify
distinct segments within a customer base. This study focuses on the application
of various clustering algorithms to the Online Retail dataset, aiming to
identify the most suitable technique for market segmentation. By comparing the
performance of each algorithm, we aim to provide insights into their
effectiveness and applicability in real-world scenarios.
Methodology
The methodology involves the following steps:
· Data Collection: The Online Retail dataset was obtained, containing transactions from a UK-based online retailer.
· Data Preprocessing: The dataset was cleaned by removing rows with missing values and outliers. Feature engineering was performed to create relevant attributes for clustering.
· Clustering Algorithms: Ten clustering algorithms were implemented and their performance was evaluated using three metrics: Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score.
· Comparison: The performance metrics of each algorithm were compared to determine their effectiveness in clustering the online retail data.
Algorithms
1) K-Means: A partitioning method that divides the dataset into K clusters based on feature similarity.
2) Hierarchical Clustering: A method that builds a hierarchy of clusters through either agglomerative or divisive approaches.
3) DBSCAN: A density-based clustering algorithm that identifies clusters based on the density of data points.
4) Gaussian Mixture Model (GMM): A probabilistic model that assumes the data is generated from a mixture of several Gaussian distributions.
5) Mean Shift: A non-parametric clustering technique that aims to find the modes of a density function.
6) Agglomerative Clustering: A type of hierarchical clustering that merges data points into clusters based on their similarity.
7) BIRCH: A clustering method that builds a tree structure from the data to find clusters efficiently.
8) Spectral Clustering: A technique that uses the eigenvalues of a similarity matrix to perform dimensionality reduction before clustering.
9) OPTICS: An algorithm similar to DBSCAN but able to identify clusters with varying densities.
10) Affinity Propagation: A clustering algorithm that identifies exemplars among data points and forms clusters based on message passing.
Dataset
The
Online Retail dataset contains transactions occurring between December 2010 and
December 2011 for a UK-based online retailer. The dataset includes attributes
such as InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice,
CustomerID, and Country.
Data Preprocessing
Data preprocessing involved:
· Removing Missing Values: Transactions with missing CustomerID were removed.
· Outlier Detection and Removal: Outliers in the Quantity and UnitPrice fields were identified and removed.
· Feature Engineering: A TotalPrice feature was created by multiplying Quantity by UnitPrice. RFM (Recency, Frequency, Monetary) features were computed for clustering, as sketched below.
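The RFM construction can be sketched as follows, using the column names listed in the Dataset section. The file name and the snapshot date are assumptions; any reference date after the last transaction works for Recency.

```python
# Hedged RFM feature-engineering sketch; file name assumed.
import pandas as pd

df = pd.read_excel("Online Retail.xlsx")      # InvoiceDate parsed as datetime
df = df.dropna(subset=["CustomerID"])          # drop anonymous transactions
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

# Snapshot date: one day after the last recorded transaction (assumption).
snapshot = df["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = df.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    Monetary=("TotalPrice", "sum"),
)
```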
Results
Algorithm | Silhouette Score | Davies-Bouldin Score | Calinski-Harabasz Score
K-Means | 0.601 | 0.729 | 3074.447
Hierarchical | 0.552 | 0.711 | 2631.877
DBSCAN | 0.660 | 1.389 | 433.904
GMM | 0.121 | 1.421 | 655.595
Mean Shift | 0.409 | 0.371 | 433.865
Agglomerative | 0.552 | 0.711 | 2631.877
BIRCH | 0.947 | 0.380 | 1484.277
Spectral | 0.506 | 0.534 | 855.436
OPTICS | -0.375 | 1.667 | 5.450
Affinity Propagation | 0.186 | 0.615 | 1283.033
Table: Clustering Algorithm Performance Metrics for Online Retail
This table provides the Silhouette Score, Davies-Bouldin Score, and
Calinski-Harabasz Score for each clustering algorithm applied to the online
retail dataset, facilitating a comparison of their clustering performance.
The
comparative analysis of clustering algorithms on the Online Retail dataset
yielded varied results across different metrics, providing insights into the
effectiveness of each method for market segmentation.
K-Means Clustering demonstrated robust
performance with a Silhouette Score of 0.601, a Davies-Bouldin Score of 0.729,
and a high Calinski-Harabasz Score of 3074.447. These results indicate that
K-Means produced well-defined and distinct clusters, making it a strong contender
for market segmentation.
Hierarchical Clustering achieved a Silhouette
Score of 0.552, a Davies-Bouldin Score of 0.711, and a Calinski-Harabasz Score
of 2631.877. While its performance was slightly below K-Means, it still
produced distinct clusters, showing its potential for market segmentation.
DBSCAN (Density-Based Spatial Clustering of
Applications with Noise) excelled in identifying high-density clusters with a
Silhouette Score of 0.660. However, its Davies-Bouldin Score of 1.389 and
Calinski-Harabasz Score of 433.904 were less favorable, indicating that while
it can identify dense regions effectively, the overall cluster compactness and
separation were not as strong as K-Means or Hierarchical Clustering.
Gaussian Mixture Model (GMM) underperformed with
a low Silhouette Score of 0.121, a high Davies-Bouldin Score of 1.421, and a
moderate Calinski-Harabasz Score of 655.595. These results suggest that GMM
struggled with overlapping clusters and did not provide clear separation
between clusters.
Mean Shift Clustering produced moderate results
with a Silhouette Score of 0.409, a Davies-Bouldin Score of 0.371, and a
Calinski-Harabasz Score of 433.865. While its Davies-Bouldin Score was
relatively low, indicating compact clusters, the overall performance was not as
high as K-Means or DBSCAN.
Agglomerative Clustering achieved a Silhouette Score of 0.552, a Davies-Bouldin Score of 0.711, and a Calinski-Harabasz Score of 2631.877, identical to the Hierarchical Clustering results. This is expected, since the hierarchical run was itself agglomerative; both rows reflect essentially the same clustering.
BIRCH (Balanced Iterative Reducing and Clustering using
Hierarchies) outperformed all other algorithms with the highest
Silhouette Score of 0.947 and a low Davies-Bouldin Score of 0.380. Its
Calinski-Harabasz Score of 1484.277 was also favorable, indicating
well-separated and compact clusters. BIRCH's exceptional performance makes it a
top choice for market segmentation in this context.
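As a rough sketch of how BIRCH might be applied here, the snippet below uses scikit-learn's Birch. The threshold and branching_factor parameters control how aggressively the CF-tree summarizes points before the final clustering step; the values shown are the library defaults, stated explicitly for reference rather than the study's tuned settings.

```python
# Minimal BIRCH sketch; parameter values are scikit-learn defaults.
from sklearn.cluster import Birch

birch = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = birch.fit_predict(X_scaled)  # X_scaled: preprocessed RFM features
```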
Spectral Clustering showed balanced performance
with a Silhouette Score of 0.506, a Davies-Bouldin Score of 0.534, and a
Calinski-Harabasz Score of 855.436. While not the best, Spectral Clustering
provided reasonably distinct clusters, making it a viable option.
OPTICS (Ordering Points to Identify the Clustering Structure)
performed poorly with a negative Silhouette Score of -0.375, indicating
incorrect clustering. Its high Davies-Bouldin Score of 1.667 and low
Calinski-Harabasz Score of 5.450 further confirmed its inadequacy for this
dataset.
Affinity Propagation delivered moderate results
with a Silhouette Score of 0.186, a Davies-Bouldin Score of 0.615, and a
Calinski-Harabasz Score of 1283.033. While its Silhouette Score was low,
indicating overlapping clusters, its other scores were decent, showing
potential in specific scenarios.
Comparison
In summary, BIRCH emerged as the best-performing algorithm, followed by K-Means and DBSCAN. Hierarchical, Agglomerative, and Spectral Clustering also provided good results. GMM, Mean Shift, OPTICS, and Affinity Propagation were less effective, with OPTICS showing particularly poor performance. These findings can guide businesses in selecting appropriate clustering algorithms for market segmentation and other analytical purposes in the online retail industry.
Conclusion
The
comparative analysis of clustering algorithms for the Online Retail dataset
revealed that BIRCH outperformed other methods with the highest Silhouette
Score and a low Davies-Bouldin Score, indicating well-defined and compact
clusters. K-Means and DBSCAN also showed good performance, making them suitable
for market segmentation tasks. In contrast, OPTICS and GMM exhibited poor
performance. These findings can guide businesses in selecting appropriate
clustering algorithms for market segmentation and other analytical purposes.
Comparative Analysis of Advanced Clustering Algorithms for Market Segmentation - A Case Study on Wholesale Customers Data
Abstract
Clustering
is a pivotal technique for market segmentation, enabling businesses to
categorize customers based on their purchasing behaviors. This study evaluates
the performance of various clustering algorithms on a Wholesale Customers
dataset using three key metrics: Silhouette Score, Davies-Bouldin Score, and
Calinski-Harabasz Score. The analysis reveals that DBSCAN and BIRCH algorithms
outperform others in defining compact and well-separated clusters, providing
valuable insights for businesses seeking effective customer segmentation
strategies.
Introduction
In
the domain of wholesale business, understanding customer behavior is crucial
for optimizing marketing strategies and enhancing customer satisfaction.
Clustering, an unsupervised machine learning technique, helps in segmenting
customers into distinct groups based on their purchasing patterns. This study
aims to evaluate the effectiveness of various clustering algorithms on the
Wholesale Customers dataset to identify the best method for customer
segmentation.
Methodology
The
methodology involves applying multiple clustering algorithms to the Wholesale
Customers dataset and evaluating their performance using standard clustering
metrics. This approach ensures a comprehensive analysis of each algorithm's
ability to form meaningful and well-defined clusters.
Algorithms
The following clustering algorithms were evaluated:
1) K-Means Clustering
2) Hierarchical Clustering
3) DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
4) Gaussian Mixture Model (GMM)
5) Mean Shift Clustering
6) Agglomerative Clustering
7) BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
8) Spectral Clustering
9) OPTICS (Ordering Points to Identify the Clustering Structure)
10) Affinity Propagation
Dataset
The dataset used for this analysis is the Wholesale Customers dataset, which includes the following attributes:
· Fresh
· Milk
· Grocery
· Frozen
· Detergents_Paper
· Delicassen
Data Preprocessing
Data preprocessing steps included:
· Handling missing values by imputing or removing them.
· Standardizing the dataset to ensure each attribute contributes equally to the clustering process.
Metrics Used
Three key metrics were used to evaluate the clustering performance:
1) Silhouette Score: Measures the cohesion within clusters and the separation between clusters.
2) Davies-Bouldin Score: Evaluates the average similarity ratio of each cluster with the cluster most similar to it.
3) Calinski-Harabasz Score: Assesses the ratio of between-cluster dispersion to within-cluster dispersion.
Terms Explanation
· Silhouette Score: Ranges from -1 to 1, where a higher value indicates better-defined clusters.
· Davies-Bouldin Score: Lower values indicate better clustering with less intra-cluster variance.
· Calinski-Harabasz Score: Higher values indicate better-defined clusters with greater separation.
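For reference, the standard definitions of the three metrics are given below, where a(i) is point i's mean intra-cluster distance, b(i) its mean distance to the nearest other cluster, s_i the mean scatter of cluster i, d_ij the distance between the centroids of clusters i and j, and B_k, W_k the between- and within-cluster dispersion matrices for n points in k clusters:

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
\qquad
DB = \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i}\frac{s_i + s_j}{d_{ij}}
\qquad
CH = \frac{\operatorname{tr}(B_k)/(k-1)}{\operatorname{tr}(W_k)/(n-k)}
```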
Results
Algorithm | Silhouette Score | Davies-Bouldin Score | Calinski-Harabasz Score
K-Means Clustering | 0.458 | 1.249 | 132.363
Hierarchical Clustering | 0.265 | 1.285 | 111.151
DBSCAN | 0.803 | 1.126 | 68.548
Gaussian Mixture Model | 0.316 | 1.471 | 91.252
Mean Shift Clustering | 0.354 | 0.503 | 62.346
Agglomerative Clustering | 0.265 | 1.285 | 111.151
BIRCH | 0.526 | 0.667 | 130.222
Spectral Clustering | 0.208 | 1.349 | 98.475
OPTICS | -0.407 | 1.561 | 3.371
Affinity Propagation | 0.179 | 0.852 | 149.436
Table: Clustering Algorithm Performance Metrics for Wholesale Customers Data
K-Means
Clustering resulted in a Silhouette Score of 0.458, indicating moderate
cohesion and separation within the clusters. The Davies-Bouldin Score was
1.249, suggesting average similarity between clusters. The Calinski-Harabasz
Score was 132.363, reflecting a reasonable ratio of between-cluster to
within-cluster dispersion.
Hierarchical
Clustering showed a lower performance with a Silhouette Score of 0.265,
indicating less defined clusters. The Davies-Bouldin Score was 1.285, slightly
higher than K-Means, indicating more similarity between clusters. The
Calinski-Harabasz Score was 111.151, lower than K-Means, indicating less
defined clusters.
DBSCAN outperformed many of the other algorithms with a Silhouette Score of 0.803, indicating well-defined clusters. However, the Davies-Bouldin Score was 1.126, showing some similarity between clusters, and the Calinski-Harabasz Score was 68.548, lower than expected. This pattern suggests that the clusters DBSCAN found are internally cohesive but that between-cluster dispersion is modest relative to within-cluster dispersion; as a density-based method, its results also depend heavily on the eps and min_samples parameters, and a common heuristic for choosing eps is sketched below.
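This is a minimal sketch of that heuristic, assuming a standardized feature matrix X_scaled: sort each point's distance to its k-th nearest neighbor and read eps off the elbow of the resulting curve.

```python
# k-distance heuristic for choosing DBSCAN's eps; X_scaled assumed.
import numpy as np
from sklearn.neighbors import NearestNeighbors

k = 5  # usually set to the intended min_samples
# +1 because each query point is returned as its own nearest neighbor
nn = NearestNeighbors(n_neighbors=k + 1).fit(X_scaled)
dists, _ = nn.kneighbors(X_scaled)
k_dist = np.sort(dists[:, -1])  # distance to the k-th true neighbor
# Plot k_dist (e.g., with matplotlib) and choose eps at the elbow.
```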
GMM
showed a moderate Silhouette Score of 0.316, indicating less cohesion within
clusters. The Davies-Bouldin Score was 1.471, higher than other algorithms,
indicating more similarity between clusters. The Calinski-Harabasz Score was
91.252, showing moderate separation between clusters.
Mean
Shift Clustering showed a Silhouette Score of 0.354, indicating moderate
cluster cohesion. The Davies-Bouldin Score was 0.503, the lowest among the
algorithms, indicating less similarity between clusters. The Calinski-Harabasz
Score was 62.346, suggesting moderate cluster separation.
Agglomerative
Clustering had the same performance as Hierarchical Clustering with a
Silhouette Score of 0.265, a Davies-Bouldin Score of 1.285, and a
Calinski-Harabasz Score of 111.151, indicating moderate cluster definition.
BIRCH
showed strong performance with a Silhouette Score of 0.526, indicating
well-defined clusters. The Davies-Bouldin Score was 0.667, suggesting low
similarity between clusters. The Calinski-Harabasz Score was 130.222,
indicating well-separated clusters.
Spectral
Clustering had a Silhouette Score of 0.208, indicating less cohesive clusters.
The Davies-Bouldin Score was 1.349, showing high similarity between clusters.
The Calinski-Harabasz Score was 98.475, indicating moderate separation between
clusters.
OPTICS
performed poorly with a Silhouette Score of -0.407, indicating incorrect
clustering. The Davies-Bouldin Score was 1.561, the highest among the
algorithms, indicating high similarity between clusters. The Calinski-Harabasz
Score was 3.371, suggesting poor cluster separation.
Affinity
Propagation showed a Silhouette Score of 0.179, indicating less defined
clusters. The Davies-Bouldin Score was 0.852, indicating moderate similarity
between clusters. The Calinski-Harabasz Score was 149.436, the highest among
the algorithms, indicating well-separated clusters.
Among the evaluated clustering algorithms, DBSCAN emerged as the best-performing method for the Wholesale Customers dataset, particularly in terms of the Silhouette Score. BIRCH also demonstrated strong performance across various metrics, followed by K-Means Clustering, which provided a good balance of compactness and separation.
Hierarchical,
Agglomerative, and Mean Shift Clustering provided moderate results, showing
that while they could identify distinct clusters, the overall cluster quality
was not as high as the top-performing algorithms. GMM, Spectral Clustering,
OPTICS, and Affinity Propagation were less effective, with OPTICS showing
particularly poor performance due to incorrect clustering and poor cluster
separation.
Conclusion
This
study highlights the effectiveness of different clustering algorithms for
segmenting wholesale customers. DBSCAN and BIRCH were found to be the most
effective, providing well-defined and meaningful clusters. These findings guide
businesses in selecting appropriate clustering algorithms for market
segmentation and other analytical purposes within the wholesale customer
segment, ensuring better-targeted marketing strategies and improved customer
satisfaction.