
Author: Fang, Qian (方茜)
Title: Kernel K Medoids Algorithm with Selected Initial Values
Advisor: Wen, Miin-Jye (溫敏杰)
Degree: Master
Department: College of Management, Department of Statistics
Year of Publication: 2017
Graduation Academic Year: 105 (ROC calendar)
Language: English
Pages: 36
Keywords: Kernel function, K medoids, Initialization

    This study proposes a clustering algorithm that combines the Gaussian kernel function with the k-medoids clustering algorithm. In addition, we use a variable called Vj (Park and Jun, 2009) to rank objects and select the r middle-ranked objects as our initial centers. The selection of initial values makes the clustering process more efficient, and the use of the Gaussian kernel makes the clustering outcome more resistant to outliers and noise. To evaluate the proposed algorithm, we analyze several real, synthetic, and relational datasets and compare the results with those of other algorithms in terms of the Adjusted Rand Index, F1 score, and Mean Squared Error. The outcomes show that the proposed algorithm achieves better clustering performance than the other algorithms considered in this study (k-means, k-medoids).
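The pipeline the abstract describes can be sketched in a few lines: build a Gaussian kernel matrix, seed the medoids from the middle of a Park-and-Jun-style ranking, then iterate k-medoids updates using kernel-induced distances. This is a minimal reconstruction from the abstract alone, not the thesis implementation; the function names, the exact form of the ranking value v_j, and the width parameter sigma are assumptions.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # Pairwise Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def kernel_kmedoids(X, k, sigma=1.0, max_iter=100):
    n = len(X)
    K = gaussian_kernel(X, sigma)
    # Kernel-induced squared distance in feature space:
    # ||phi(i) - phi(j)||^2 = K_ii - 2 K_ij + K_jj = 2 - 2 K_ij for a Gaussian kernel.
    D = 2.0 - 2.0 * K
    # Park-and-Jun-style ranking value v_j = sum_i d_ij / sum_l d_il;
    # per the abstract, the middle-ranked objects seed the medoids.
    v = (D / D.sum(axis=1, keepdims=True)).sum(axis=0)
    order = np.argsort(v)
    mid = n // 2
    medoids = order[mid - k // 2 : mid - k // 2 + k].copy()
    for _ in range(max_iter):
        # Assign every object to the nearest medoid in feature space.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # The new medoid minimises the within-cluster sum of distances.
                new_medoids[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids
```

Because distances are computed in the kernel-induced feature space, a far-away outlier saturates at distance 2 instead of growing without bound, which is the robustness property the abstract points to.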

    Table of Contents

    Abstract (Chinese) i
    Abstract (English) ii
    Acknowledgements iii
    Table of Contents iv
    List of Tables vi
    List of Figures vii
    Chapter 1. Introduction 1
    Chapter 2. Background 3
        K Means Algorithm and K Medoids Algorithm 3
            K means 3
            K medoids 5
        Kernel Function 6
        Initialization 8
        Relational Data 9
        Different Dissimilarity Measures 9
            Distance-based dissimilarity measures 9
            Correlation-based dissimilarity measures 11
    Chapter 3. Kernel K Medoids Algorithm with Selected Initial Values 13
        The Proposed Method 13
        Sensitivity Curve 15
    Chapter 4. Numerical Experiments 18
        Measures of Results 18
            Adjusted rand index 18
            F1 score 18
            Mean squared error 19
        Conventional Data 19
            Data description 19
            Clustering outcomes 19
        Synthetic Data 20
            Synthetic data formed by skewed distributions 20
            Synthetic data with outlier or noise 22
        Relational Data 24
    Chapter 5. Conclusion 26
    Bibliography 27
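The "Measures of Results" listed under Chapter 4 can be made concrete with a small sketch. Below is the adjusted Rand index of Hubert and Arabie (1985), reference [4], computed from the contingency table of two labelings. This is an illustrative reimplementation, not code from the thesis, and it assumes both partitions contain at least two groups (otherwise the denominator vanishes).

```python
import numpy as np

def comb2(x):
    # Number of unordered pairs C(x, 2), applied elementwise.
    return x * (x - 1) / 2.0

def adjusted_rand_index(labels_true, labels_pred):
    """ARI of Hubert and Arabie (1985): 1 for identical partitions,
    near 0 for random labelings, and possibly negative."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    # Contingency table n_ij: objects in true class i and predicted cluster j.
    table = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(table, (t, p), 1)
    sum_ij = comb2(table).sum()                 # pairs together in both partitions
    sum_a = comb2(table.sum(axis=1)).sum()      # pairs together in the true partition
    sum_b = comb2(table.sum(axis=0)).sum()      # pairs together in the predicted partition
    expected = sum_a * sum_b / comb2(len(labels_true))
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)
```

The chance correction is what distinguishes ARI from the raw Rand index: a labeling that matches only as well as random assignment scores near zero rather than misleadingly high.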

    [1] Agrawal, K. P., Garg, S. and Patel, P. Performance Measures for Densed and Arbitrary Shaped Clusters. CS-Journals, Vol. 6, pp. 388–350, 2015.
    [2] Chang, C. C. and Lin, C. J. Training ν-support vector classifiers: Theory and algorithms. Neural Computation, 13(9), 2119–2147, 2001.
    [3] Duda, R., Hart, P. and Stork, D. Pattern Classification, second ed. John Wiley and Sons, New York, 2001.
    [4] Hubert, L. and Arabie, P. Comparing partitions. Journal of Classification, 2, 193– 218, 1985.
    [5] Jain, A. K. Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666, 2010.
    [6] Kaufman, L. and Rousseeuw, P. Finding Groups in Data: An Introduction To Cluster Analysis. John Wiley, New York., ISBN: 0-471-87876-6, 1990.
    [7] Lance, G. N. and Williams, W. T. A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems. The Computer Journal, 9(4), 373–380, 1967.
    [8] MacQueen, J. Some methods for classification and analysis of multivariate observations. Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pp. 281–297, 1967.
    [9] Mei, J. P. and Chen, L. Fuzzy clustering with weighted medoids for relational data. Pattern Recognition, 43, 1964–1974, 2010.
    [10] Park, H. S. and Jun, C. H. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36, 3336–3341, 2009.
    [11] Saltelli, A., Tarantola, S., Campolongo, F. and Ratto, M. Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. John Wiley, New York, 2004.
    [12] Wu, K. L. and Lin, Y. J. Kernelized K-Means Algorithm Based on Gaussian Kernel. Advances in Control and Communication, pp. 657–664, 2012.

    Full-text availability: on campus, public since 2018-06-01; off campus, public since 2018-06-01.