| Author: | Fang, Qian (方茜) |
|---|---|
| Thesis title: | Kernel K Medoids Algorithm with Selected Initial Values (有初始值篩選的核函數K中心聚類法) |
| Advisor: | Wen, Miin-Jye (溫敏杰) |
| Degree: | Master |
| Department: | Department of Statistics, College of Management |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 (ROC calendar) |
| Language: | English |
| Number of pages: | 36 |
| Keywords (Chinese): | 核函數 (kernel function), K-中心點 (K-medoids), 初始值 (initial values) |
| Keywords (English): | Kernel function, K medoids, Initialization |
The clustering method proposed in this study combines the Gaussian kernel function with the k-medoids algorithm, and additionally uses the variable Vj (Park and Jun, 2009) to rank the data and select the r middle-ranked objects as initial cluster centers. The selection of initial values makes the clustering process more efficient, while the Gaussian kernel makes the method less sensitive to outliers and noisy data. To evaluate the proposed method, we analyze several real, synthetic, and relational datasets, assess the results with the ARI (Adjusted Rand Index), F1 score, and MSE (Mean Squared Error), and compare them against the clustering results of the k-means and k-medoids algorithms. The evaluation shows that the proposed method achieves better clustering performance than both k-means and k-medoids.
This study proposes a clustering algorithm that combines the Gaussian kernel function with the k-medoids clustering algorithm. Meanwhile, we use a variable called Vj (Park and Jun, 2009) to rank objects and select the r middle values as our initial centers. The selection of initial values makes the clustering process more efficient, and the use of the Gaussian kernel function makes the clustering outcome more resistant to outliers and noise. To evaluate the proposed algorithm, we analyze several real, synthetic, and relational datasets and compare the results with those of other algorithms in terms of the Adjusted Rand Index, F1 score, and Mean Squared Error. The outcomes show that our proposed algorithm has better clustering performance than the other algorithms (k-means, k-medoids) considered in this study.
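The procedure described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the thesis code: it uses the Gaussian-kernel-induced distance d(x, y) = K(x, x) + K(y, y) − 2K(x, y) = 2 − 2·exp(−‖x − y‖²/2σ²) in place of Euclidean distance, and ranks objects by the Park–Jun variable v_j = Σ_i d_ij / Σ_l d_il. Selecting exactly k middle-ranked objects as initial medoids, and all function and parameter names, are assumptions made for this sketch.

```python
import numpy as np

def gaussian_kernel_distance(X, sigma=1.0):
    """Pairwise kernel-induced distances: d(x, y) = 2 - 2*exp(-||x-y||^2 / (2*sigma^2)),
    since K(x, x) = K(y, y) = 1 for the Gaussian kernel."""
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # guard against tiny negatives
    return 2.0 - 2.0 * np.exp(-sq_dists / (2.0 * sigma**2))

def select_initial_medoids(D, k):
    """Rank objects by v_j = sum_i d_ij / sum_l d_il (Park and Jun, 2009)
    and take the k middle-ranked objects as initial medoids
    (the thesis selects r middle values; here r = k for illustration)."""
    v = np.sum(D / np.sum(D, axis=1, keepdims=True), axis=0)
    order = np.argsort(v)
    start = max(0, len(order) // 2 - k // 2)
    return order[start:start + k]

def kernel_k_medoids(X, k, sigma=1.0, max_iter=100):
    """K-medoids in the kernel-induced distance space with Park-Jun-style
    initialization; returns (labels, medoid indices)."""
    D = gaussian_kernel_distance(X, sigma)
    medoids = select_initial_medoids(D, k)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):                          # update each medoid
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):    # converged
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1), medoids
```

Because the kernel distance saturates at 2 for far-apart points, a single extreme outlier contributes a bounded amount to every within-cluster sum, which is one way to see the claimed robustness to outliers.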
[1] Agrawal, K. P., Garg, S. and Patel, P. Performance Measures for Densed and Arbitrary Shaped Clusters. CS-Journals, Vol. 6, pp. 388-350, 2015.
[2] Chang, C. C. and Lin, C. J. Training ν-support vector classifiers: Theory and algorithms. Neural Computation, 13(9), 2119–2147, 2001.
[3] Duda, R., Hart, P. and Stork, D. Pattern Classification, second ed. John Wiley and Sons, New York, 2001.
[4] Hubert, L. and Arabie, P. Comparing partitions. Journal of Classification, 2, 193–218, 1985.
[5] Jain, A. K. Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666, 2010.
[6] Kaufman, L. and Rousseeuw, P. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley, New York, ISBN 0-471-87876-6, 1990.
[7] Lance, G. N. and Williams, W. T. A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems. The Computer Journal, 9(4), 373–380, 1967.
[8] MacQueen, J. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pp. 281–297, 1967.
[9] Mei, J. P. and Chen, L. Fuzzy clustering with weighted medoids for relational data. Pattern Recognition, 43, 1964–1974, 2010.
[10] Park, H. S. and Jun, C. H. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36, 3336–3341, 2009.
[11] Saltelli, A., Tarantola, S., Campolongo, F. and Ratto, M. Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. New York: Wiley, 2004.
[12] Wu, K. L. and Lin, Y. J. Kernelized K-Means Algorithm Based on Gaussian Kernel. Advances in Control and Communication, pp. 657–664, 2012.