Optimizing Customer Segmentation Using Multi-Metric Internal Validation and Boxplot Analysis
DOI:
https://doi.org/10.71302/jbidai.v8i1.67
Keywords:
Boxplot, Customer Segmentation, Internal Validation, Inter Quartile Range, K-Means Clustering
Abstract
Using several internal validation metrics at once to choose the number of clusters in K-Means clustering often yields conflicting K values, which can confuse data practitioners when they extract insights such as customer characteristics. This study develops an evaluation framework that resolves the ambiguity created when different internal validation metrics recommend different K values. The proposed framework has two stages. In the first stage, five internal validation metrics, namely the Davies-Bouldin Index (DBI), Silhouette Score, Elbow Method, Dunn Index, and Calinski-Harabasz Index, act as filters that produce up to five candidate K values. In the second stage, boxplot analysis, the interquartile range (IQR), and elbow visualization are used to examine the cohesiveness and stability of the resulting clusters. The first stage yielded four candidate cluster counts: K = 2, 5, 7, and 10. In the second stage, the elbow graph of the average interquartile range identified K = 5 as the optimal number of clusters among these candidates. These results indicate that using more internal validation metrics may increase the likelihood of obtaining multiple K values, and that a larger number of clusters does not necessarily guarantee better quality. The findings highlight the importance of a layered evaluation approach when determining the optimal number of clusters, especially when multiple internal validation metrics are employed. The proposed framework can help data practitioners make more informed decisions and reduce ambiguity in the clustering process. In future work, the framework could be extended with external validation metrics or adapted to other clustering algorithms.
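The abstract does not spell out how the average interquartile range is computed, so the following is only a minimal sketch of one plausible reading of the second stage: for each candidate K, take the IQR of every feature inside every cluster and average those values, then compare the averages across candidates. The function names `iqr` and `mean_cluster_iqr`, the toy data, and the hand-written label assignments are all hypothetical; in practice the labels would come from fitting K-Means once per candidate K.

```python
from collections import defaultdict
from statistics import mean, quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of numbers."""
    if len(values) < 2:
        return 0.0  # a single point has no spread
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def mean_cluster_iqr(rows, labels):
    """Average the per-feature IQR over every cluster (assumed aggregation).

    rows   : list of equal-length numeric feature vectors
    labels : cluster label for each row, e.g. produced by K-Means
    """
    clusters = defaultdict(list)
    for row, label in zip(rows, labels):
        clusters[label].append(row)
    spreads = []
    for members in clusters.values():
        for feature in zip(*members):  # walk the cluster column by column
            spreads.append(iqr(list(feature)))
    return mean(spreads)


# Toy one-feature data with two obvious groups (illustrative only).
rows = [[1.0], [2.0], [3.0], [11.0], [12.0], [13.0]]

# Hypothetical label assignments for three candidate values of K.
labelings = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

avg_iqr = {k: mean_cluster_iqr(rows, labels) for k, labels in labelings.items()}
for k, value in sorted(avg_iqr.items()):
    print(f"K={k}: mean IQR = {value}")
```

On this toy data the average IQR falls sharply from K = 1 to K = 2 and only slightly from K = 2 to K = 3, which is the kind of bend an elbow plot of average IQR against K is meant to reveal. A lower average IQR alone does not make a larger K better, since splitting clusters further almost always shrinks the spread.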
License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.







