Optimization and Parallelization of Fuzzy Clustering Algorithm Based on the Improved Kmeans++ Clustering

EasyChair Preprint 1931
7 pages • Date: November 11, 2019

Abstract

The fuzzy clustering algorithm is one of the most widely used clustering algorithms in the field of big data. Although the fuzzy c-means (FCM) algorithm performs well, it still has problems: it is sensitive to the initial cluster centers, and the number of clusters is difficult to determine. To solve these problems, we put forward an improved fuzzy clustering algorithm based on the kmeans++ algorithm. The improved algorithm optimizes the kmeans++ algorithm with the Canopy algorithm, integrates the L2 norm, and is parallelized on Spark. Experimental results show that the improved algorithm performs better in clustering accuracy and computational performance.

Keyphrases: Canopy, Fuzzy C-Means, Kmeans++, Parallelization, Spark
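
For orientation, the two standard building blocks named in the abstract, kmeans++ seeding and fuzzy c-means with the L2 (Euclidean) distance, can be sketched in a few lines of plain Python. This is only a textbook-style illustration and not the authors' improved algorithm: the Canopy step and the Spark parallelization are not shown, and all function names, parameter defaults (`m=2.0`, `tol`, `max_iter`, `seed`) and the toy data are assumptions made for this sketch.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean (L2) distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, c, rng):
    """kmeans++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < c:
        d2 = [min(sq_dist(p, ctr) for ctr in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:  # only points with positive weight can be picked
                centers.append(p)
                break
    return [list(ctr) for ctr in centers]

def memberships(points, centers, m):
    """Standard FCM membership update with fuzzifier m > 1."""
    u = []
    for p in points:
        d2 = [sq_dist(p, ctr) for ctr in centers]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:  # point sits exactly on one or more centers
            row = [1.0 / len(zeros) if i in zeros else 0.0
                   for i in range(len(centers))]
        else:
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            s = sum(inv)
            row = [v / s for v in inv]
        u.append(row)
    return u

def update_centers(points, u, c, m):
    """Each center is the mean of all points weighted by u_ij ** m."""
    dim = len(points[0])
    centers = []
    for i in range(c):
        w = [row[i] ** m for row in u]
        total = sum(w)
        centers.append([sum(wj * p[k] for wj, p in zip(w, points)) / total
                        for k in range(dim)])
    return centers

def fcm(points, c, m=2.0, tol=1e-6, max_iter=100, seed=0):
    """Alternate membership and center updates until centers stop moving."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, c, rng)
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, c, m)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fcm(points, c=2)
labels = [row.index(max(row)) for row in u]
```

Each row of `u` holds one point's degrees of membership in the clusters and sums to 1; `labels` takes the strongest membership per point. Note that the sketch requires the number of clusters `c` as an input, which is the difficulty the abstract says the Canopy step is meant to address.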