A Genetic Algorithm-Based Method for Finding the Most Stable Clusters in Ensemble Clustering
Clustering is one of the fundamental tools in data analysis and data mining. By grouping data according to intrinsic similarities, it extracts hidden, meaningful structures from large datasets. However, selecting optimal clusters with conventional clustering algorithms is challenging, especially when clusters are dense or heterogeneous. This study proposes a novel genetic algorithm-based method for identifying the most stable clusters in ensemble clustering. By leveraging cluster stability criteria and a correlation matrix, the approach improves both the accuracy and the stability of the final clustering. The method first generates initial partitions of the data with six different clustering algorithms. The Fisher criterion is then applied to identify the more stable clusters. These clusters are evaluated and optimized with a genetic algorithm to construct an optimized correlation matrix. Finally, this matrix is fed into a hierarchical clustering algorithm, which produces the final consensus clustering. The method was tested on standard datasets and improved NMI by 12% and ARI by 5% compared with previous methods. The genetic algorithm identified clusters with higher stability and diversity, reducing the impact of noise and increasing the accuracy of the final clustering. The method also produced more accurate clusterings than the individual base clustering algorithms. Because it enhances both the accuracy and the stability of clustering, the proposed method has potential applications in domains such as big data analysis, machine learning, and information retrieval. Among the strengths of this research are the use of the Fisher criterion to select stable clusters and the use of a genetic algorithm to optimize them.
The method not only preserves diversity among clusters but also significantly improves clustering accuracy. Future studies could combine this approach with more advanced algorithms and assess its applicability to more complex datasets.
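The pipeline described above (base partitions, Fisher-based selection of stable clusters, a correlation matrix, and hierarchical consensus) can be illustrated with the minimal sketch below. It rests on several assumptions that are not taken from the paper. The base partitions are hard-coded rather than produced by six clustering algorithms. The per-cluster Fisher score is taken to be the squared distance between the cluster centroid and the centroid of the remaining points, divided by the sum of their mean squared spreads. The "correlation matrix" is interpreted as a co-association matrix built only from the selected clusters. The threshold value is hypothetical. The genetic algorithm optimization step is omitted entirely, and average linkage stands in for whichever hierarchical variant the authors used.

```python
from itertools import combinations

def centroid(points):
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fisher_score(X, members):
    """Assumed Fisher-style ratio: separation of one cluster from the rest
    of the data divided by the spread inside each group."""
    inside = [X[i] for i in members]
    outside = [X[i] for i in range(len(X)) if i not in members]
    if not inside or not outside:
        return 0.0
    mu_in, mu_out = centroid(inside), centroid(outside)
    spread = (sum(sq_dist(p, mu_in) for p in inside) / len(inside)
              + sum(sq_dist(p, mu_out) for p in outside) / len(outside))
    return sq_dist(mu_in, mu_out) / (spread + 1e-12)

def stable_clusters(X, partitions, threshold):
    """Keep every cluster, from any base partition, whose score passes the threshold."""
    selected = []
    for labels in partitions:
        for lab in set(labels):
            members = {i for i, l in enumerate(labels) if l == lab}
            if fisher_score(X, members) >= threshold:
                selected.append(members)
    return selected

def co_association(n, clusters, n_partitions):
    """M[i][j] = fraction of base partitions in which i and j share a selected cluster."""
    M = [[0.0] * n for _ in range(n)]
    for members in clusters:
        for i in members:
            for j in members:
                M[i][j] += 1.0 / n_partitions
    return M

def average_linkage(M, k):
    """Agglomerative clustering on similarities: merge the pair of groups
    with the highest mean pairwise similarity until k groups remain."""
    groups = [[i] for i in range(len(M))]
    while len(groups) > k:
        best, pair = -1.0, None
        for a, b in combinations(range(len(groups)), 2):
            sim = sum(M[i][j] for i in groups[a] for j in groups[b])
            sim /= len(groups[a]) * len(groups[b])
            if sim > best:
                best, pair = sim, (a, b)
        a, b = pair  # a < b, so deleting b leaves index a valid
        groups[a] = groups[a] + groups[b]
        del groups[b]
    labels = [0] * len(M)
    for c, members in enumerate(groups):
        for i in members:
            labels[i] = c
    return labels

# Hypothetical toy data: two well-separated groups of 2-D points.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],  # noisy base partition: point 2 is misassigned
]
selected = stable_clusters(X, partitions, threshold=10.0)
M = co_association(len(X), selected, len(partitions))
consensus = average_linkage(M, k=2)
print(len(selected), consensus)  # 4 [0, 0, 0, 1, 1, 1]
```

In this toy run, both clusters of the noisy third partition score about 3.2 and fall below the threshold, so only the four well-separated clusters enter the matrix. This mirrors, in a simplified way, the abstract's claim that filtering by stability reduces the influence of noise on the consensus.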