Optimization of K-means Clustering Algorithm Based on MapReduce

2016, Vol. 24, No. 7, pp. 272-275, 279

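Affiliation: School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu 213164, China
About the authors: Li Yuanyuan (1991-), female, from Yancheng, Jiangsu, master's student; her research focuses on parallel computing and data mining. Sun Yuqiang (1956-), male, from Henan, professor and master's supervisor; his research focuses on parallel computing and software engineering.
Funding: National Natural Science Foundation of China (11271057, 6); Natural Science Foundation of Jiangsu Province (BK2009535).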

Abstract:

The traditional K-means clustering algorithm depends heavily on the choice of initial centers, tends to converge to a local rather than a global optimum, and struggles to meet the demands of processing massive data sets. To address these shortcomings, an improved K-means clustering algorithm based on MapReduce is proposed. The algorithm uses systematic sampling to obtain a representative sample set that stands in for the massive data set; applies the density method and the max-min distance method to obtain optimized initial cluster centers; and then uses the Canopy algorithm to produce a rough clustering that reduces the scale of the computation. Finally, the algorithm is parallelized by sequentially composing MapReduce jobs, so that it can make full use of the computing and storage capacity of a cluster and adapt to massive-data scenarios. The improved algorithm is compared with the traditional clustering algorithm, and the results show that it performs better: it reduces the dependence on the initial cluster centers, improves clustering accuracy, reduces the number of iterations and the clustering time, and shows a considerable performance advantage when processing massive data.
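The abstract gives no implementation details, so the following is only a minimal single-machine sketch of three of the ideas it names: systematic sampling, max-min distance seeding, and a K-means iteration split into map and reduce steps. All function names are hypothetical. The density step and the Canopy pre-clustering are omitted, the first seed is simply the first sample point (an assumption, not the paper's rule), and plain Python functions stand in for real MapReduce jobs.

```python
import math
from collections import defaultdict


def systematic_sample(points, step, start=0):
    """Systematic sampling: keep every `step`-th point from index `start`."""
    return points[start::step]


def max_min_centers(points, k):
    """Max-min distance seeding.

    Assumption: the first seed is points[0]; the paper selects seeds with a
    density method that is not reproduced here. Each further seed is the
    point whose distance to its nearest existing seed is largest.
    """
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centers)
        )
        centers.append(farthest)
    return centers


def kmeans_map(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    idx = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return idx, point


def kmeans_reduce(members):
    """Reduce step: the new center is the mean of the assigned points."""
    n = len(members)
    return tuple(sum(coord) / n for coord in zip(*members))


def kmeans(points, centers, max_iter=20):
    """Repeat map and reduce until the centers stop changing."""
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:
            idx, value = kmeans_map(p, centers)
            groups[idx].append(value)
        # A center that received no points keeps its previous position.
        new_centers = [
            kmeans_reduce(groups[i]) if groups[i] else centers[i]
            for i in range(len(centers))
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy usage: seed on a sample, then cluster the full data set.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sample = systematic_sample(data, step=2)
seeds = max_min_centers(sample, k=2)
print(kmeans(data, seeds))
```

Seeding on the sample rather than the full data set keeps the quadratic-looking max-min search cheap, which is the role the abstract assigns to sampling; in a cluster setting each iteration of `kmeans` would correspond to one MapReduce job chained after the previous one.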

Cite this article

Sun Yuqiang, Li Yuanyuan, Lu Yong. Optimization of K-means Clustering Algorithm Based on MapReduce[J]. Computer Measurement & Control, 2016, 24(7): 272-275, 279.

History

Received: 2016-01-19
Last revised: 2016-02-29
Published online: 2016-08-09
