Optimization and Parallelization of Fuzzy Clustering Algorithm Based on the Improved Kmeans++ Clustering

EasyChair Preprint 1931

7 pages. Date: November 11, 2019

Abstract

Fuzzy clustering is one of the most widely used families of clustering algorithms in the field of big data. Although the fuzzy c-means (FCM) algorithm performs well, it still has some problems: it is sensitive to the initial cluster centers, and the number of clusters is difficult to determine. To solve these problems, we put forward an improved fuzzy clustering algorithm based on the kmeans++ algorithm. The improved algorithm optimizes kmeans++ with the Canopy algorithm, integrates the L2 norm, and is parallelized on Spark. Experimental results show that the improved algorithm achieves better clustering accuracy and computational performance.
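
To make the sensitivity to initial centers concrete, the following is a minimal single-machine sketch of standard FCM seeded with kmeans++-style initialization. It is not the authors' algorithm: it omits the Canopy step, the L2-norm term, and the Spark parallelization, and the names `kmeanspp_init` and `fcm`, the fuzzifier `m=2.0`, and the stopping tolerance are all assumptions chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """kmeans++-style seeding (hypothetical helper): each new center is drawn
    with probability proportional to its squared distance from the nearest
    center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # All points coincide with existing centers; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def fcm(points, k, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with kmeans++-style seeding (illustrative only).

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j as computed in the last iteration that ran.
    """
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)),
        # written here in terms of squared distances.
        u = []
        for p in points:
            d2 = [max(sq_dist(p, c), 1e-12) for c in centers]  # avoid 0 division
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            w_sum = sum(w)
            new_centers.append(tuple(
                sum(wi * p[t] for wi, p in zip(w, points)) / w_sum
                for t in range(dim)
            ))
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    found_centers, memberships = fcm(data, k=2)
    print(sorted(found_centers))
```

Seeding with distance-weighted sampling tends to spread the initial centers apart, which is the property the abstract relies on; the number of clusters `k` still has to be supplied by hand in this sketch.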

Keyphrases: Canopy, Fuzzy C-Means, Kmeans++, Parallelization, Spark

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:1931,
  author    = {Wenjuan Cheng and Yu Ma},
  title     = {Optimization and Parallelization of Fuzzy Clustering Algorithm Based on the Improved Kmeans++ Clustering},
  howpublished = {EasyChair Preprint 1931},
  year      = {EasyChair, 2019}}