
Comparative Analysis of Advanced Clustering Algorithms for Market Segmentation

Comparative Analysis of Advanced Clustering Algorithms for Market Segmentation - A Case Study on Mall Customer Data

Abstract

This study conducts a comparative analysis of advanced clustering algorithms for market segmentation using Mall Customer Data. The algorithms evaluated are K-Means, Hierarchical Clustering, DBSCAN, Gaussian Mixture Models (GMM), Mean Shift, Agglomerative Clustering, BIRCH, Spectral Clustering, OPTICS, and Affinity Propagation. Evaluation metrics such as the Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score are employed to assess clustering performance and determine the most suitable algorithm for segmenting mall customers based on their spending habits.

Methodology

The methodology involves several key steps:

1.     Data Collection: Mall Customer Data is obtained, comprising various demographic and spending attributes.

2.     Data Preprocessing: Data is cleaned, normalized, and prepared for clustering algorithms.

3.     Clustering Algorithms: Ten clustering algorithms are applied to the preprocessed data.

4.     Evaluation: Each algorithm's performance is evaluated using Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score.

5.     Comparison: Results are compared to identify the algorithm that best segments mall customers based on the evaluation metrics.

Introduction

Market segmentation plays a crucial role in understanding customer behavior and tailoring marketing strategies. Clustering algorithms provide a powerful means to group similar customers based on shared characteristics, enabling businesses to target specific customer segments effectively. This study explores various clustering techniques to uncover distinct customer groups within mall customer data, aiming to assist businesses in optimizing their marketing efforts and enhancing customer satisfaction.

Algorithms

1)    K-Means: A centroid-based clustering method that partitions data into K clusters.

2)    Hierarchical Clustering: Builds a hierarchy of clusters by either agglomerative (bottom-up) or divisive (top-down) methods.

3)    DBSCAN: Density-based clustering that identifies clusters of varying shapes and sizes based on density.

4)    GMM (Gaussian Mixture Models): Assumes data points are generated from a mixture of several Gaussian distributions.

5)    Mean Shift: Non-parametric clustering that identifies centroids of clusters by shifting towards higher density regions.

6)    Agglomerative Clustering: Hierarchical clustering that recursively merges clusters based on distance.

7)    BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): Builds a tree structure to quickly summarize the data and cluster it hierarchically.

8)    Spectral Clustering: Uses eigenvalues of a similarity matrix to perform dimensionality reduction before clustering.

9)    OPTICS (Ordering Points to Identify the Clustering Structure): Density-based algorithm that detects clusters of varying densities.

10) Affinity Propagation: Exemplar-based clustering that identifies representative points (exemplars) via message passing and determines the number of clusters automatically.
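A minimal sketch of how this comparison can be run with scikit-learn is given below. The parameter values (five clusters, DBSCAN's eps, and so on) are illustrative assumptions rather than the study's exact settings, and X stands for the scaled feature matrix produced in the Data Preprocessing step described later.

```python
import numpy as np
from sklearn.cluster import (KMeans, AgglomerativeClustering, DBSCAN, MeanShift,
                             Birch, SpectralClustering, OPTICS, AffinityPropagation)
from sklearn.mixture import GaussianMixture
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

# Illustrative parameter choices; the study's exact settings may differ.
# In scikit-learn, hierarchical clustering is AgglomerativeClustering,
# which is why the "Hierarchical" and "Agglomerative" rows coincide.
models = {
    "K-Means": KMeans(n_clusters=5, n_init=10, random_state=42),
    "Agglomerative": AgglomerativeClustering(n_clusters=5),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "GMM": GaussianMixture(n_components=5, random_state=42),
    "Mean Shift": MeanShift(),
    "BIRCH": Birch(n_clusters=5),
    "Spectral": SpectralClustering(n_clusters=5, random_state=42),
    "OPTICS": OPTICS(min_samples=5),
    "Affinity Propagation": AffinityPropagation(random_state=42),
}

for name, model in models.items():
    labels = model.fit_predict(X)  # X: scaled feature matrix from preprocessing
    if len(np.unique(labels)) < 2:
        print(f"{name}: produced a single label, skipping evaluation")
        continue
    print(f"{name}: silhouette={silhouette_score(X, labels):.3f}, "
          f"davies_bouldin={davies_bouldin_score(X, labels):.3f}, "
          f"calinski_harabasz={calinski_harabasz_score(X, labels):.3f}")
```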

Dataset

The dataset used in this study is Mall Customer Data, which includes:

·       CustomerID: Unique identifier for each customer.

·       Gender: Gender of the customer.

·       Age: Age of the customer.

·       Annual Income (k$): Annual income of the customer.

·       Spending Score (1-100): Score assigned based on customer behavior and spending nature.

Data Preprocessing

Before applying clustering algorithms, the following preprocessing steps were performed:

·       Normalization: Scale numerical features to a standard range.

·       Encoding: Convert categorical variables (e.g., Gender) into numerical format if necessary.

·       Feature Selection: Select relevant features (e.g., Annual Income and Spending Score) for clustering.
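A hedged sketch of these preprocessing steps follows; the file name Mall_Customers.csv and the use of pandas/scikit-learn are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# File name assumed (the standard Kaggle "Mall Customers" CSV).
df = pd.read_csv("Mall_Customers.csv")

# Encode Gender numerically in case it is used as a feature.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Select the two clustering features and scale to zero mean, unit variance.
features = df[["Annual Income (k$)", "Spending Score (1-100)"]]
X = StandardScaler().fit_transform(features)
```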

Results

| Clustering Algorithm | Silhouette Score | Davies-Bouldin Score | Calinski-Harabasz Score |
| --- | --- | --- | --- |
| K-Means | 0.358 | 1.033 | 101.530 |
| Hierarchical | 0.321 | 1.128 | 88.102 |
| DBSCAN | 0.185 | 1.757 | 34.071 |
| GMM | 0.335 | 1.019 | 90.864 |
| Mean Shift | n/a (produced 1 label; evaluation skipped) | n/a | n/a |
| Agglomerative | 0.321 | 1.128 | 88.102 |
| BIRCH | 0.266 | 1.061 | 63.583 |
| Spectral | 0.353 | 0.993 | 99.602 |
| OPTICS | -0.063 | 1.399 | 12.523 |
| Affinity Propagation | 0.369 | 0.949 | 128.602 |

Table: Clustering Algorithm Performance Metrics for Mall Customers

This table summarizes the performance metrics for each clustering algorithm applied to the mall customers dataset, including the Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score. The Mean Shift algorithm produced only one label, so its evaluation was skipped.


K-Means clustering was applied to the Mall Customer Data to segment customers based on their spending behavior. The algorithm produced a Silhouette Score of 0.358, indicating reasonably well-defined clusters but with some overlap observed between clusters. The Davies-Bouldin Score of 1.033 suggests moderate clustering quality, where lower values would indicate better clustering. The Calinski-Harabasz Score of 101.530 indicates good cluster separation and compactness. K-Means' performance is notable for its efficiency and scalability, although its reliance on the initial random centroids can affect results, as seen in the slight overlap between clusters.
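One common way to reduce this initialization sensitivity is k-means++ seeding with multiple restarts, sketched below; n_clusters=5 is an illustrative value, not necessarily the setting used in this study.

```python
from sklearn.cluster import KMeans

# Mitigate sensitivity to random centroids: k-means++ seeding plus several
# restarts, keeping the lowest-inertia solution. n_clusters=5 is illustrative.
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=25, random_state=42)
labels = kmeans.fit_predict(X)
print(kmeans.inertia_)  # within-cluster sum of squares of the best run
```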

Hierarchical Clustering was employed to segment the mall customers into distinct groups based on their spending patterns. The algorithm achieved a Silhouette Score of 0.321, indicating reasonable cluster separations but with clusters that are less well-defined compared to other methods. The Davies-Bouldin Score of 1.128 suggests moderate clustering quality, with some overlap or ambiguity in cluster boundaries. The Calinski-Harabasz Score of 88.102 reflects good within-cluster similarity and between-cluster differences, although slightly lower than optimal for highly distinct clusters. Hierarchical Clustering's advantage lies in its ability to reveal hierarchical relationships but may require parameter tuning to improve cluster quality.

DBSCAN was utilized to segment customers based on their spending behaviors, focusing on density-based clustering. The algorithm yielded a Silhouette Score of 0.185, indicating challenges in forming well-defined clusters due to sensitivity to its parameters such as epsilon and minimum samples. The Davies-Bouldin Score of 1.757 suggests significant overlap or inconsistency in cluster formations, impacting the algorithm's ability to distinguish between different customer segments effectively. The Calinski-Harabasz Score of 34.071 reflects weaker cluster separation compared to other methods, indicating potential difficulties in handling varying densities and noise in the data.
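Because of this parameter sensitivity, a small grid search over eps and min_samples is a common diagnostic. The sketch below is illustrative and assumes the scaled matrix X from the preprocessing step.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Small grid over DBSCAN's two key parameters; ranges are illustrative.
best = None
for eps in np.arange(0.2, 1.01, 0.1):
    for min_samples in (3, 5, 10):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # ignore noise
        if n_clusters < 2:
            continue  # need at least two real clusters to score
        score = silhouette_score(X, labels)
        if best is None or score > best[0]:
            best = (score, eps, min_samples)
print("best (silhouette, eps, min_samples):", best)
```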

Gaussian Mixture Models were employed to segment mall customers based on their spending patterns, assuming data distributions are a mixture of Gaussian distributions. The algorithm achieved a Silhouette Score of 0.335, indicating reasonably well-separated clusters with moderate overlap. The Davies-Bouldin Score of 1.019 suggests decent cluster quality, though with some ambiguity in cluster boundaries. The Calinski-Harabasz Score of 90.864 reflects good cluster separation and compactness, suitable for data with Gaussian-like distributions. GMM's flexibility in capturing complex data distributions makes it robust but sensitive to the number of components and initialization.

Agglomerative Clustering was applied to the customer data to segment customers into meaningful groups based on their spending behavior. The algorithm achieved a Silhouette Score of 0.321, identical to Hierarchical Clustering (agglomerative clustering is the bottom-up form of hierarchical clustering, so the two runs coincide), indicating reasonable cluster separation but with clusters that may not be well-defined. The Davies-Bouldin Score of 1.128 suggests moderate clustering quality, reflecting some ambiguity or overlap in cluster boundaries. The Calinski-Harabasz Score of 88.102 indicates good cluster separation, though not as distinct as methods like Affinity Propagation. Agglomerative Clustering's hierarchical nature offers insight into cluster relationships but may require careful parameter tuning for optimal results.

BIRCH clustering was employed to segment mall customers based on their spending patterns, focusing on hierarchical clustering using a balanced clustering tree. The algorithm achieved a Silhouette Score of 0.266, indicating some challenges in forming well-separated clusters. The Davies-Bouldin Score of 1.061 suggests moderate clustering quality, with potential overlap or ambiguity in cluster boundaries. The Calinski-Harabasz Score of 63.583 reflects decent cluster separation but lower compared to other methods, indicating limitations in handling varying cluster shapes and densities effectively.

Spectral Clustering was utilized to segment customers based on their spending behaviors, focusing on graph-based clustering. The algorithm achieved a Silhouette Score of 0.353, indicating reasonably well-separated clusters with moderate overlap. The Davies-Bouldin Score of 0.993 suggests good clustering quality, with well-defined clusters and minimal overlap. The Calinski-Harabasz Score of 99.602 reflects strong cluster separation and compactness, suitable for datasets with complex structures. Spectral Clustering's ability to capture non-linear relationships makes it robust for various data distributions, though it may require parameter tuning for optimal performance.

OPTICS was applied to segment customers based on their spending patterns, focusing on density-based clustering. The algorithm produced a Silhouette Score of -0.063, indicating challenges in forming distinct and well-separated clusters, potentially due to noise or parameter sensitivity. The Davies-Bouldin Score of 1.399 suggests significant overlap or inconsistency in cluster formations, impacting the algorithm's ability to distinguish between different customer segments effectively. The Calinski-Harabasz Score of 12.523 reflects weaker cluster separation compared to other methods, highlighting limitations in handling varying densities and noise in the data.

Affinity Propagation was utilized to segment mall customers based on their spending behaviors, focusing on exemplar-based clustering. The algorithm achieved a Silhouette Score of 0.369, indicating well-defined and distinct clusters in the data. The Davies-Bouldin Score of 0.949 suggests good clustering quality, with well-separated clusters and minimal overlap. The Calinski-Harabasz Score of 128.602 reflects strong cluster separation and compactness, suitable for datasets with complex and varied cluster shapes. Affinity Propagation's ability to automatically determine the number of clusters and capture diverse data patterns makes it robust for market segmentation tasks, outperforming other methods evaluated. 
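A minimal illustration of running Affinity Propagation with scikit-learn follows; the damping value shown is the library default, not a setting confirmed by this study.

```python
from sklearn.cluster import AffinityPropagation

# Affinity Propagation selects exemplars by message passing and infers the
# number of clusters itself; `preference` (lower => fewer clusters) and
# `damping` are its main knobs. Defaults shown; not tuned for this study.
ap = AffinityPropagation(damping=0.5, random_state=42)
labels = ap.fit_predict(X)
print("clusters found:", len(ap.cluster_centers_indices_))
```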

Comparison

From the evaluation results, Affinity Propagation demonstrates the highest Silhouette Score (0.369) and Calinski-Harabasz Score (128.602), indicating better-defined clusters and stronger cluster separation than the other algorithms. Mean Shift, by contrast, collapsed the data into a single label and could not be evaluated, and OPTICS produced a negative Silhouette Score, suggesting that neither is well suited to this dataset.

Conclusion

This study comprehensively analyzed various clustering algorithms for market segmentation using Mall Customer Data. Affinity Propagation emerged as the most effective algorithm based on Silhouette Score and Calinski-Harabasz Score, indicating its capability to identify distinct customer segments with significant differences in spending behavior. The findings provide valuable insights for businesses aiming to optimize marketing strategies and enhance customer targeting through effective market segmentation techniques. Future research could explore ensemble clustering methods or incorporate additional features for further refinement of customer segmentation models.

To see code: Click Here



Comparative Analysis of Advanced Clustering Algorithms for Market Segmentation - A Case Study on Online Retail

Abstract

This case study presents a comparative analysis of advanced clustering algorithms for market segmentation using the Online Retail dataset. Various clustering techniques, including K-Means, Hierarchical, DBSCAN, Gaussian Mixture Model (GMM), Mean Shift, Agglomerative, BIRCH, Spectral, OPTICS, and Affinity Propagation, were applied to the dataset to evaluate their performance based on key metrics such as the Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score. The results demonstrate significant differences in clustering quality, highlighting the strengths and weaknesses of each algorithm in the context of online retail market segmentation.

Introduction

Market segmentation is crucial for businesses to effectively target and serve different customer groups. Clustering algorithms provide a means to identify distinct segments within a customer base. This study focuses on the application of various clustering algorithms to the Online Retail dataset, aiming to identify the most suitable technique for market segmentation. By comparing the performance of each algorithm, we aim to provide insights into their effectiveness and applicability in real-world scenarios.

Methodology

The methodology involves the following steps:

·       Data Collection: The Online Retail dataset was obtained, containing transactions from a UK-based online retailer.

·       Data Preprocessing: The dataset was cleaned by removing rows with missing values and outliers. Feature engineering was performed to create relevant attributes for clustering.

·       Clustering Algorithms: Ten clustering algorithms were implemented and their performance evaluated using three metrics: Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score.

·       Comparison: The performance metrics of each algorithm were compared to determine their effectiveness in clustering the online retail data.

Algorithms

1)    K-Means: A partitioning method that divides the dataset into K clusters based on feature similarity.

2)    Hierarchical Clustering: A method that builds a hierarchy of clusters through either agglomerative or divisive approaches.

3)    DBSCAN: A density-based clustering algorithm that identifies clusters based on the density of data points.

4)    Gaussian Mixture Model (GMM): A probabilistic model that assumes the data is generated from a mixture of several Gaussian distributions.

5)    Mean Shift: A non-parametric clustering technique that aims to find the modes of a density function.

6)    Agglomerative Clustering: A type of hierarchical clustering that merges data points into clusters based on their similarity.

7)    BIRCH: A clustering method that builds a tree structure from the data to find clusters efficiently.

8)    Spectral Clustering: A technique that uses the eigenvalues of a similarity matrix to perform dimensionality reduction before clustering.

9)    OPTICS: An algorithm similar to DBSCAN but can identify clusters with varying densities.

10) Affinity Propagation: A clustering algorithm that identifies exemplars among data points and forms clusters based on message passing.

Dataset

The Online Retail dataset contains transactions occurring between December 2010 and December 2011 for a UK-based online retailer. The dataset includes attributes such as InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, and Country.

Data Preprocessing

Data preprocessing involved:

·       Removing Missing Values: Transactions with missing CustomerID were removed.

·       Outlier Detection and Removal: Outliers in the Quantity and UnitPrice fields were identified and removed.

·       Feature Engineering: A TotalPrice feature was created by multiplying Quantity by UnitPrice. RFM (Recency, Frequency, Monetary) features were computed for clustering.
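A hedged sketch of the RFM feature engineering follows; the file name is an assumption, and the outlier-removal step is omitted for brevity.

```python
import pandas as pd

# File name assumed; the UCI distribution ships as an Excel file,
# in which case pd.read_excel("Online Retail.xlsx") would be used instead.
df = pd.read_csv("OnlineRetail.csv", parse_dates=["InvoiceDate"])
df = df.dropna(subset=["CustomerID"])                # drop anonymous transactions
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]  # engineered feature

# RFM per customer: days since last purchase, distinct invoices, total spend.
snapshot = df["InvoiceDate"].max() + pd.Timedelta(days=1)
rfm = df.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda s: (snapshot - s.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    Monetary=("TotalPrice", "sum"),
)
```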

Results

| Algorithm | Silhouette Score | Davies-Bouldin Score | Calinski-Harabasz Score |
| --- | --- | --- | --- |
| K-Means | 0.601 | 0.729 | 3074.447 |
| Hierarchical | 0.552 | 0.711 | 2631.877 |
| DBSCAN | 0.660 | 1.389 | 433.904 |
| GMM | 0.121 | 1.421 | 655.595 |
| Mean Shift | 0.409 | 0.371 | 433.865 |
| Agglomerative | 0.552 | 0.711 | 2631.877 |
| BIRCH | 0.947 | 0.380 | 1484.277 |
| Spectral | 0.506 | 0.534 | 855.436 |
| OPTICS | -0.375 | 1.667 | 5.450 |
| Affinity Propagation | 0.186 | 0.615 | 1283.033 |

Table: Clustering Algorithm Performance Metrics for Online Retail

This table provides the Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score for each clustering algorithm applied to the online retail dataset, facilitating a comparison of their clustering performance.

The comparative analysis of clustering algorithms on the Online Retail dataset yielded varied results across different metrics, providing insights into the effectiveness of each method for market segmentation.

K-Means Clustering demonstrated robust performance with a Silhouette Score of 0.601, a Davies-Bouldin Score of 0.729, and a high Calinski-Harabasz Score of 3074.447. These results indicate that K-Means produced well-defined and distinct clusters, making it a strong contender for market segmentation.

Hierarchical Clustering achieved a Silhouette Score of 0.552, a Davies-Bouldin Score of 0.711, and a Calinski-Harabasz Score of 2631.877. While its performance was slightly below K-Means, it still produced distinct clusters, showing its potential for market segmentation.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) excelled in identifying high-density clusters with a Silhouette Score of 0.660. However, its Davies-Bouldin Score of 1.389 and Calinski-Harabasz Score of 433.904 were less favorable, indicating that while it can identify dense regions effectively, the overall cluster compactness and separation were not as strong as K-Means or Hierarchical Clustering.

Gaussian Mixture Model (GMM) underperformed with a low Silhouette Score of 0.121, a high Davies-Bouldin Score of 1.421, and a moderate Calinski-Harabasz Score of 655.595. These results suggest that GMM struggled with overlapping clusters and did not provide clear separation between clusters.

Mean Shift Clustering produced moderate results with a Silhouette Score of 0.409, a Davies-Bouldin Score of 0.371, and a Calinski-Harabasz Score of 433.865. While its Davies-Bouldin Score was relatively low, indicating compact clusters, the overall performance was not as high as K-Means or DBSCAN.

Agglomerative Clustering achieved exactly the same scores as Hierarchical Clustering: a Silhouette Score of 0.552, a Davies-Bouldin Score of 0.711, and a Calinski-Harabasz Score of 2631.877. This is expected, since agglomerative clustering is the bottom-up form of hierarchical clustering; the two runs are effectively the same method.

BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) outperformed all other algorithms with the highest Silhouette Score of 0.947 and a low Davies-Bouldin Score of 0.380. Its Calinski-Harabasz Score of 1484.277 was also favorable, indicating well-separated and compact clusters. BIRCH's exceptional performance makes it a top choice for market segmentation in this context.
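For reference, a minimal sketch of fitting BIRCH with scikit-learn is shown below; the threshold, branching factor, and cluster count are illustrative rather than the study's tuned settings, and X_rfm stands for the scaled RFM matrix from the preprocessing step.

```python
from sklearn.cluster import Birch

# BIRCH incrementally builds a CF (clustering-feature) tree; `threshold`
# bounds the radius of each subcluster and `branching_factor` the tree
# fan-out. These values and n_clusters=3 are illustrative assumptions.
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = birch.fit_predict(X_rfm)  # X_rfm: scaled RFM feature matrix
```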

Spectral Clustering showed balanced performance with a Silhouette Score of 0.506, a Davies-Bouldin Score of 0.534, and a Calinski-Harabasz Score of 855.436. While not the best, Spectral Clustering provided reasonably distinct clusters, making it a viable option.

OPTICS (Ordering Points to Identify the Clustering Structure) performed poorly, with a negative Silhouette Score of -0.375, meaning that on average points lay closer to neighboring clusters than to their own. Its high Davies-Bouldin Score of 1.667 and low Calinski-Harabasz Score of 5.450 further confirmed its inadequacy for this dataset.

Affinity Propagation delivered moderate results with a Silhouette Score of 0.186, a Davies-Bouldin Score of 0.615, and a Calinski-Harabasz Score of 1283.033. While its Silhouette Score was low, indicating overlapping clusters, its other scores were decent, showing potential in specific scenarios.

 

Comparison

In summary, BIRCH emerged as the best-performing algorithm, followed by K-Means and DBSCAN. Hierarchical, Agglomerative, and Spectral Clustering also provided good results. GMM, Mean Shift, OPTICS, and Affinity Propagation were less effective, with OPTICS showing particularly poor performance. These findings can guide businesses in selecting appropriate clustering algorithms for market segmentation and other analytical purposes in the online retail industry.

Conclusion

The comparative analysis of clustering algorithms for the Online Retail dataset revealed that BIRCH outperformed other methods with the highest Silhouette Score and a low Davies-Bouldin Score, indicating well-defined and compact clusters. K-Means and DBSCAN also showed good performance, making them suitable for market segmentation tasks. In contrast, OPTICS and GMM exhibited poor performance. These findings can guide businesses in selecting appropriate clustering algorithms for market segmentation and other analytical purposes.


To see code: Click Here



Comparative Analysis of Advanced Clustering Algorithms for Market Segmentation - A Case Study on Wholesale Customers Data

Abstract

Clustering is a pivotal technique for market segmentation, enabling businesses to categorize customers based on their purchasing behaviors. This study evaluates the performance of various clustering algorithms on a Wholesale Customers dataset using three key metrics: Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score. The analysis reveals that DBSCAN and BIRCH algorithms outperform others in defining compact and well-separated clusters, providing valuable insights for businesses seeking effective customer segmentation strategies.

Introduction

In the domain of wholesale business, understanding customer behavior is crucial for optimizing marketing strategies and enhancing customer satisfaction. Clustering, an unsupervised machine learning technique, helps in segmenting customers into distinct groups based on their purchasing patterns. This study aims to evaluate the effectiveness of various clustering algorithms on the Wholesale Customers dataset to identify the best method for customer segmentation.

Methodology

The methodology involves applying multiple clustering algorithms to the Wholesale Customers dataset and evaluating their performance using standard clustering metrics. This approach ensures a comprehensive analysis of each algorithm's ability to form meaningful and well-defined clusters.

Algorithms

The following clustering algorithms were evaluated:

1)    K-Means Clustering

2)    Hierarchical Clustering

3)    DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

4)    Gaussian Mixture Model (GMM)

5)    Mean Shift Clustering

6)    Agglomerative Clustering

7)    BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)

8)    Spectral Clustering

9)    OPTICS (Ordering Points to Identify the Clustering Structure)

10) Affinity Propagation

Dataset

The dataset used for this analysis is the Wholesale Customers dataset, which includes the following attributes:

·       Fresh

·       Milk

·       Grocery

·       Frozen

·       Detergents_Paper

·       Delicassen

Data Preprocessing

Data preprocessing steps included:

·       Handling missing values by imputing or removing them.

·       Standardizing the dataset to ensure each attribute contributes equally to the clustering process.
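A brief sketch of these steps follows; the file name and median imputation are assumptions (rows with missing values could equally be dropped).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# File name assumed (UCI "Wholesale customers" dataset).
df = pd.read_csv("Wholesale customers data.csv")
cols = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]

# Impute missing values (one option among several), then standardize so
# each attribute contributes equally to the clustering.
df[cols] = df[cols].fillna(df[cols].median())
X = StandardScaler().fit_transform(df[cols])
```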

Metrics Used

Three key metrics were used to evaluate the clustering performance:

1)    Silhouette Score: Measures the cohesion within clusters and separation between clusters.

2)    Davies-Bouldin Score: Evaluates the average similarity ratio of each cluster with the cluster most similar to it.

3)    Calinski-Harabasz Score: Assesses the ratio of the sum of between-cluster dispersion to within-cluster dispersion.

Terms Explanation

·       Silhouette Score: Ranges from -1 to 1, where a higher value indicates better-defined clusters.

·       Davies-Bouldin Score: Lower values indicate better clustering with less intra-cluster variance.

·       Calinski-Harabasz Score: Higher values indicate better-defined clusters with greater separation.
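These three metrics are all available in scikit-learn; a small helper like the sketch below can evaluate any clustering result (X is the standardized feature matrix, labels the cluster assignments).

```python
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

def evaluate_clustering(X, labels):
    """Return the three evaluation metrics for one clustering result."""
    return {
        "silhouette": silhouette_score(X, labels),                # [-1, 1], higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # >= 0, lower is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
    }
```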

Results

| Algorithm | Silhouette Score | Davies-Bouldin Score | Calinski-Harabasz Score |
| --- | --- | --- | --- |
| K-Means Clustering | 0.458 | 1.249 | 132.363 |
| Hierarchical Clustering | 0.265 | 1.285 | 111.151 |
| DBSCAN | 0.803 | 1.126 | 68.548 |
| Gaussian Mixture Model | 0.316 | 1.471 | 91.252 |
| Mean Shift Clustering | 0.354 | 0.503 | 62.346 |
| Agglomerative Clustering | 0.265 | 1.285 | 111.151 |
| BIRCH | 0.526 | 0.667 | 130.222 |
| Spectral Clustering | 0.208 | 1.349 | 98.475 |
| OPTICS | -0.407 | 1.561 | 3.371 |
| Affinity Propagation | 0.179 | 0.852 | 149.436 |

Table: Clustering Algorithm Performance Metrics for Wholesale Customers Data

K-Means Clustering resulted in a Silhouette Score of 0.458, indicating moderate cohesion and separation within the clusters. The Davies-Bouldin Score was 1.249, suggesting average similarity between clusters. The Calinski-Harabasz Score was 132.363, reflecting a reasonable ratio of between-cluster to within-cluster dispersion.

Hierarchical Clustering showed a lower performance with a Silhouette Score of 0.265, indicating less defined clusters. The Davies-Bouldin Score was 1.285, slightly higher than K-Means, indicating more similarity between clusters. The Calinski-Harabasz Score was 111.151, lower than K-Means, indicating less defined clusters.

DBSCAN outperformed many other algorithms with a Silhouette Score of 0.803, indicating well-defined clusters. However, the Davies-Bouldin Score was 1.126, showing some similarity between clusters. The Calinski-Harabasz Score was 68.548, lower than expected, suggesting that while clusters are well-defined, they are not well-separated.

GMM showed a moderate Silhouette Score of 0.316, indicating less cohesion within clusters. The Davies-Bouldin Score was 1.471, higher than other algorithms, indicating more similarity between clusters. The Calinski-Harabasz Score was 91.252, showing moderate separation between clusters.

Mean Shift Clustering showed a Silhouette Score of 0.354, indicating moderate cluster cohesion. The Davies-Bouldin Score was 0.503, the lowest among the algorithms, indicating less similarity between clusters. The Calinski-Harabasz Score was 62.346, suggesting moderate cluster separation.

Agglomerative Clustering matched Hierarchical Clustering exactly, with a Silhouette Score of 0.265, a Davies-Bouldin Score of 1.285, and a Calinski-Harabasz Score of 111.151, indicating moderate cluster definition; as before, the two are the same bottom-up method.

BIRCH showed strong performance with a Silhouette Score of 0.526, indicating well-defined clusters. The Davies-Bouldin Score was 0.667, suggesting low similarity between clusters. The Calinski-Harabasz Score was 130.222, indicating well-separated clusters.

Spectral Clustering had a Silhouette Score of 0.208, indicating less cohesive clusters. The Davies-Bouldin Score was 1.349, showing high similarity between clusters. The Calinski-Harabasz Score was 98.475, indicating moderate separation between clusters.

OPTICS performed poorly, with a Silhouette Score of -0.407, meaning points were on average assigned closer to neighboring clusters than to their own. The Davies-Bouldin Score was 1.561, the highest among the algorithms, indicating high similarity between clusters. The Calinski-Harabasz Score was 3.371, suggesting poor cluster separation.

Affinity Propagation showed a Silhouette Score of 0.179, indicating less defined clusters. The Davies-Bouldin Score was 0.852, indicating moderate similarity between clusters. The Calinski-Harabasz Score was 149.436, the highest among the algorithms, indicating well-separated clusters.

Comparison

Among the evaluated clustering algorithms, DBSCAN emerged as the best-performing method for the Wholesale Customers dataset, particularly in terms of the Silhouette Score. BIRCH also demonstrated strong performance across various metrics, followed by K-Means Clustering, which provided a good balance of compactness and separation.

Hierarchical, Agglomerative, and Mean Shift Clustering provided moderate results, showing that while they could identify distinct clusters, the overall cluster quality was not as high as the top-performing algorithms. GMM, Spectral Clustering, OPTICS, and Affinity Propagation were less effective, with OPTICS showing particularly poor performance due to incorrect clustering and poor cluster separation.

Conclusion

This study highlights the effectiveness of different clustering algorithms for segmenting wholesale customers. DBSCAN and BIRCH were found to be the most effective, providing well-defined and meaningful clusters. These findings guide businesses in selecting appropriate clustering algorithms for market segmentation and other analytical purposes within the wholesale customer segment, ensuring better-targeted marketing strategies and improved customer satisfaction.

 

To see code: Click Here
