| Author: | Kao, Hsiou-Hen (高修恒) |
|---|---|
| Title: | A Study of Improving K-means Clustering Method Based on Sample Points (改善K-means分群方法之研究─以樣本點為基礎) |
| Advisor: | Wen, Miin-Jye (溫敏杰) |
| Degree: | Master |
| Department: | Department of Statistics, College of Management |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | English |
| Pages: | 52 |
| Keywords (Chinese): | 群集分析, K-means, relational data |
| Keywords (English): | Cluster analysis, K-means, relational data |
We propose the K-exemplars algorithm, which modifies K-means by constraining each cluster center to be a data point (an exemplar) rather than the cluster mean. Under this constraint, K-exemplars can handle not only raw data but also relational data. Although the clustering accuracy of K-exemplars is not always better than that of K-means, the difference is small, and K-exemplars requires significantly fewer iterations, so it converges faster. On the Iris data, for example, the average numbers of iterations for K-means and K-exemplars are 7.22 and 4.02, respectively, a reduction of 3.2 iterations. Moreover, K-exemplars can be applied with any specified dissimilarity measure, and it alleviates the sensitivity of K-means to outliers.
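The idea of constraining centers to data points can be sketched as follows. This is a minimal, illustrative implementation, not the thesis's exact procedure: the function name `k_exemplars`, the `init` parameter, and the medoid-style update (choosing the member with minimum total within-cluster dissimilarity) are assumptions; squared Euclidean distance is used here, but any dissimilarity matrix could be substituted, which is what makes the method applicable to relational data.

```python
import numpy as np

def k_exemplars(X, k, n_iter=100, init=None, seed=0):
    """Exemplar-based K-means sketch: every cluster center is an actual
    data point (an exemplar), never an averaged position."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # indices of the current exemplar points
    centers = np.array(init) if init is not None else rng.choice(n, size=k, replace=False)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # assignment step: nearest exemplar under squared Euclidean distance
        # (any precomputed dissimilarity matrix could replace this)
        d = ((X[:, None, :] - X[centers][None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: the new exemplar of each cluster is the member
        # minimizing total dissimilarity to the other members
        new_centers = centers.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            within = ((X[members][:, None, :] - X[members][None, :, :]) ** 2).sum(axis=2)
            new_centers[j] = members[within.sum(axis=1).argmin()]
        if np.array_equal(new_centers, centers):
            break  # exemplars stopped moving: converged
        centers = new_centers
    return labels, centers
```

Because each update only re-selects an existing point, the algorithm needs no coordinate averages, which is why it applies unchanged when only pairwise dissimilarities (relational data) are available.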
1. Arai, K. and Barakbah, A. R. (2007), “Hierarchical K-means: an algorithm for centroids initialization for K-means,” Reports of the Faculty of Science and Engineering, Saga University, 36 (1), pp. 25-31.
2. Chen, S. Y. (2005), “Multivariate analysis,” Fourth edition, Hwa Tai publishing, Taipei, Taiwan. (in Chinese).
3. Fisher, R. A. (1936), “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, 7 (2), pp. 179-188.
4. Hwang, C. M., Yang, M. S., Hung, W. L. and Lee, M.G. (2012), “A similarity measure of intuitionistic fuzzy sets based on Sugeno integral with its application to pattern recognition,” Information Sciences, 189, pp. 93-109.
5. Jain, A. K. (2010), “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, 31, pp. 651-666.
6. Johnson, R. A. and Wichern, D. W. (2007), “Applied multivariate statistical analysis,” Sixth edition, Pearson Education, Inc., Upper Saddle River, New Jersey, USA.
7. MacLeod, N. (n. d.), “Palaeo-math 101: MDS and ordination,” Retrieved March 14, 2013, from http://www.palass.org/modules.php?name=palaeo_math&page=20
8. MacQueen, J. B. (1967), “Some methods for classification and analysis of multivariate observations,” Proceedings of 5th Berkeley symposium on mathematical statistics and probability, University of California Press, pp. 281-297.
9. Pal, K., Pal, N. R., Keller, J. M. and Bezdek, J. C. (2005), “Relational mountain (density) clustering method and web log analysis,” International Journal of Intelligent Systems, 20, pp. 375-392.
10. Rand, W. M. (1971), “Objective criteria for the evaluation of clustering methods,” Journal of the American Statistical Association, 66 (336), pp. 846-850.
11. Saltelli, A., Tarantola, S., Campolongo, F. and Ratto, M. (2004), “Sensitivity analysis in practice,” John Wiley & Sons Ltd, Chichester, England.
12. Wu, K. L. and Lin, Y. J. (2012), “Kernelized K-means algorithm based on Gaussian kernel,” Advances in Control and Communication, LNEE, 137, pp. 657-664.
13. Yang, M. S. and Shih, H. M. (2001), “Cluster analysis based on fuzzy relations,” Fuzzy Sets and Systems, 120, pp. 197-212.