
Author: Kao, Hsiou-Hen (高修恒)
Thesis title: A Study of Improving K-means Clustering Method Based on Sample Points (改善K-means分群方法之研究─以樣本點為基礎)
Advisor: Wen, Miin-Jye (溫敏杰)
Degree: Master
Department: Department of Statistics, College of Management
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Pages: 52
Keywords (Chinese): 群集分析, K-means, Relational data
Keywords (English): Cluster analysis, K-means, relational data
  • Abstract (translated from the Chinese): We propose the K-exemplars algorithm, which modifies K-means by constraining each cluster center to lie on a sample point rather than at the cluster mean. K-exemplars can handle not only raw data but also relational data. Although its clustering accuracy is not always better than that of K-means, it is never much worse, and it requires significantly fewer iterations, so it converges faster. For example, on the Iris data the average numbers of iterations for K-means and K-exemplars are 7.22 and 4.02 respectively, a reduction of 3.2 iterations. K-exemplars can also be used with different distance measures, and it mitigates K-means' sensitivity to outliers.

    Compared with the K-means algorithm, we constrain the cluster centers to lie on the data points rather than at the cluster means, yielding the proposed K-exemplars algorithm. Because only pairwise dissimilarities are needed under this constraint, K-exemplars can handle not only raw data but also relational data. Although the clustering accuracy of K-exemplars is not always better than that of K-means, the difference is small, and K-exemplars requires significantly fewer iterations, so it converges faster. On the Iris data, the average numbers of iterations for K-means and K-exemplars are 7.22 and 4.02, respectively, a reduction of 3.2 iterations. Moreover, K-exemplars can be applied with any specified dissimilarity measure, and it mitigates the sensitivity to outliers that affects K-means.
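    The thesis's own implementation is given in R in the appendices, which are not reproduced here. As a minimal Python sketch of the idea in the abstract, assuming the exemplar-update step chooses the cluster member with the smallest total dissimilarity to the other members (a k-medoids-style update; the abstract does not spell out the exact rule), clustering from a dissimilarity matrix alone might look like:

    ```python
    import numpy as np

    def k_exemplars(D, k, max_iter=100, seed=0):
        """Cluster n objects given an n x n pairwise dissimilarity matrix D.

        Cluster centers ("exemplars") are constrained to be data points, so
        only D is needed -- raw feature vectors are not required, which is
        what lets the method handle relational data.
        """
        rng = np.random.default_rng(seed)
        n = D.shape[0]
        exemplars = rng.choice(n, size=k, replace=False)
        for it in range(1, max_iter + 1):
            # Assignment step: each object joins its nearest exemplar.
            labels = np.argmin(D[:, exemplars], axis=1)
            # Update step: within each cluster, pick the member with the
            # smallest sum of dissimilarities to all other members.
            new_exemplars = exemplars.copy()
            for j in range(k):
                members = np.where(labels == j)[0]
                if members.size == 0:
                    continue  # keep the old exemplar for an empty cluster
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_exemplars[j] = members[np.argmin(costs)]
            # Stop when the exemplar set no longer changes.
            if np.array_equal(np.sort(new_exemplars), np.sort(exemplars)):
                break
            exemplars = new_exemplars
        return labels, exemplars, it

    # Example: raw data -> pairwise Euclidean dissimilarities -> cluster.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    labels, exemplars, iters = k_exemplars(D, k=2, seed=1)
    ```

    Because the update only reads entries of D, the same code accepts any dissimilarity matrix (e.g. one built from a non-Euclidean measure), and an outlier can never pull an exemplar off the data points the way it drags a mean.
    
    
    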

    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background and Research Motivations
      1.2 Research Objectives
      1.3 Research Structure
    Chapter 2 Literature Review
      2.1 K-means Algorithm
      2.2 Relational Data
    Chapter 3 Research Methodology
      3.1 K-exemplars Algorithm
      3.2 K-exemplars Algorithm II
    Chapter 4 Numerical Examples
      4.1 Clustering by Raw Data
      4.2 Clustering with Outlier or Noise Data
      4.3 Clustering by Relational Data
      4.4 Clustering by K-exemplars II
    Chapter 5 Conclusions and Future Studies
    References
    Appendix (1): R code for K-means and K-exemplars
    Appendix (2): R code for K-exemplars II
    Appendix (3): The Uniform-16 data set


    Full text available on campus: 2015-07-02; off campus: 2015-07-02.