CENTROID INITIALIZATION IN K-MEANS CLUSTERING USING GATCAM
Abstract
Clustering is one of the most widely used machine learning techniques in data processing, with applications in market research, pattern recognition, data analysis, and image processing, among others. The k-means algorithm is one of the most extensively used clustering algorithms. Because its centroids are initialized randomly, however, it is not guaranteed to converge to the global minimum. Several studies have proposed techniques to address this problem, including heuristic and meta-heuristic search optimization algorithms. Yet implementations of k-means still rely on random initialization, and these solutions have not resolved the convergence issue. This paper proposes GATCAM, an enhanced genetic algorithm with two-point crossover and adaptive mutation, for centroid initialization in k-means clustering. GATCAM improved k-means accuracy by 4.2% on the Wine dataset and 1.23% on the Iris dataset, and it also increased the likelihood that k-means converges to the global minimum. The experimental results show that GATCAM-initialized k-means achieves higher accuracy in fewer iterations than k-means initialized with a standard genetic algorithm (SGA).
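The abstract does not specify GATCAM's chromosome encoding, fitness function, or mutation rule, so the Python sketch below is only a rough illustration of the general idea under stated assumptions. It is not the authors' method. It encodes k centroids as one flat chromosome and scores each chromosome by negative sum of squared errors (SSE). It applies two-point crossover and uses a fitness-dependent mutation rate in the style of Srinivas and Patnaik as one possible form of "adaptive mutation". The best chromosome then seeds plain Lloyd's k-means. All function names (`ga_init`, `adaptive_rate`, `two_point_crossover`, `kmeans`), the population size, tournament size, elitism count, and mutation constants are hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sse(points: Sequence[Point], centroids: Sequence[Point]) -> float:
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in points)


def decode(genes: List[float], k: int, d: int) -> List[Point]:
    """Split a flat chromosome of k*d genes into k centroids of dimension d."""
    return [tuple(genes[j * d:(j + 1) * d]) for j in range(k)]


def two_point_crossover(a: List[float], b: List[float], rng: random.Random):
    """Swap the gene segment lying between two random cut points."""
    if len(a) < 3:  # too short for two interior cut points
        return a[:], b[:]
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def adaptive_rate(f: float, f_max: float, f_avg: float,
                  p_high: float = 0.3, p_low: float = 0.01) -> float:
    """Assumed Srinivas-Patnaik-style rule: fitter chromosomes mutate less."""
    if f < f_avg:
        return p_high  # below-average chromosomes explore more
    if f_max == f_avg:
        return p_low   # no spread among above-average chromosomes
    return p_low + (p_high - p_low) * (f_max - f) / (f_max - f_avg)


def ga_init(points: List[Point], k: int, pop_size: int = 20,
            generations: int = 30, seed: int = 0) -> List[Point]:
    """Hypothetical GA seeding: returns k centroids with low SSE."""
    rng = random.Random(seed)
    d = len(points[0])
    spans = [(max(p[i] for p in points) - min(p[i] for p in points)) or 1.0
             for i in range(d)]

    def fitness(genes: List[float]) -> float:
        return -sse(points, decode(genes, k, d))  # higher is better

    # Each initial chromosome is k distinct data points, flattened.
    pop = [[x for p in rng.sample(points, k) for x in p] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(((fitness(g), g) for g in pop), key=lambda t: t[0], reverse=True)
        f_max = ranked[0][0]
        f_avg = sum(f for f, _ in ranked) / len(ranked)
        nxt = [g for _, g in ranked[:2]]  # elitism: carry the two best forward
        while len(nxt) < pop_size:
            pa = max(rng.sample(ranked, 3), key=lambda t: t[0])[1]  # tournament
            pb = max(rng.sample(ranked, 3), key=lambda t: t[0])[1]
            for child in two_point_crossover(pa, pb, rng):
                rate = adaptive_rate(fitness(child), f_max, f_avg)
                # Gaussian perturbation scaled to each feature's range.
                nxt.append([g + rng.gauss(0.0, 0.1 * spans[i % d]) if rng.random() < rate else g
                            for i, g in enumerate(child)])
        pop = nxt[:pop_size]
    return decode(max(pop, key=fitness), k, d)


def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Lloyd's algorithm from given centroids; returns (centroids, iterations)."""
    cents = [tuple(c) for c in centroids]
    for it in range(1, max_iter + 1):
        clusters: List[List[Point]] = [[] for _ in cents]
        for p in points:
            j = min(range(len(cents)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, cents[c])))
            clusters[j].append(p)
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else cents[j]
               for j, cl in enumerate(clusters)]
        if new == cents:  # assignments stable, so means no longer move
            return cents, it
        cents = new
    return cents, max_iter


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
cents, iters = kmeans(data, ga_init(data, k=2))
print(sorted(cents), iters)
```

Because elitism carries the best chromosome forward, the seed handed to `kmeans` is never worse in SSE than the best random initial chromosome. This is one plausible reason GA seeding can cut k-means iterations, though the sketch makes no claim about reproducing GATCAM's reported results.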