UNSUPERVISED K-MEANS CLUSTERING ALGORITHM
Keywords:
Clustering, K-means, number of clusters, initializations, unsupervised learning schema, Unsupervised k-means (U-k-means).

Abstract
The k-means algorithm is the best-known and most widely used clustering method, and various extensions of it have been proposed in the literature. Although k-means is an unsupervised learning approach to clustering in pattern recognition and machine learning, the algorithm and its extensions remain sensitive to initialization and require the number of clusters to be specified a priori; in this sense, k-means is not a fully unsupervised clustering method. In this paper, we construct an unsupervised learning schema for the k-means algorithm so that it is free of initializations and parameter selection and can simultaneously find an optimal number of clusters. That is, we propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without requiring any initialization or parameter selection. The computational complexity of the proposed U-k-means clustering algorithm is also analyzed, and comparisons between U-k-means and other existing methods are made. Experimental results and comparisons demonstrate these good aspects of the proposed U-k-means clustering algorithm.
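To make concrete the two dependencies the abstract highlights, the following is a minimal sketch (not the paper's U-k-means method) of plain Lloyd's k-means in NumPy; note that both the number of clusters `k` and a random initialization (here via `seed`) must be supplied by the user, which is exactly what U-k-means aims to eliminate:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means: requires k and an initialization up front."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return labels, centers

# Two well-separated blobs; k=2 must still be chosen by hand.
rng = np.random.default_rng(1)
X = np.vstack([np.zeros((20, 2)), 10 + np.zeros((20, 2))])
X = X + rng.normal(scale=0.5, size=(40, 2))
labels, centers = kmeans(X, k=2)
```

A different `seed` or a wrong `k` can yield a different (and possibly poor) partition, which is the sensitivity to initialization and cluster-number selection that the proposed U-k-means algorithm is designed to remove.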