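Graduate student: Chiang, Jen-Chieh (姜仁傑)
Thesis title: A Development of Cluster Validity Measure with Outlier Detection for Support Vector Clustering
Advisor: Wang, Jeen-Shing (王振興)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 50
Chinese keywords (translated): support vector clustering algorithm, cluster validity
English keywords: Support Vector Clustering, Cluster Validity Measure
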
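    This thesis develops a cluster validity measure with outlier detection for the support vector clustering (SVC) algorithm. Its main contribution is that, without any prior information, the cluster validity process can identify both the cluster structure of the data and the optimal parameters. SVC is a kernel-based method, and in its Lagrangian function the kernel parameter and the soft-margin constant play a critical role in the clustering result. Without prior information, the proposed ratio of cluster compactness to separation, together with an outlier detection mechanism and a cluster merging mechanism, allows the validity measure to determine the kernel parameter and the soft-margin constant automatically. With these parameters, SVC can identify the optimal clusters and their number, with compact, smooth contours of arbitrary shape, and with increased robustness to noise. Computer simulations on several examples, including artificial data, the IRIS classification data, and rice-grain image data, demonstrate the effectiveness of the proposed cluster validity measure for SVC.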
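To make the roles of the kernel parameter and the soft-margin constant concrete, here is a minimal Python sketch of the standard SVC dual of Ben-Hur et al. It is not the thesis's own implementation. It assumes the Gaussian kernel K(x, y) = exp(-q||x - y||^2), with q as the kernel parameter and C as the soft-margin constant, and it solves the dual by plain projected gradient descent. That solver is an illustrative choice; practical solvers use SMO-type methods. Points whose multiplier reaches the bound C are bounded support vectors lying on or outside the feature-space sphere, which SVC treats as outliers. All function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, q):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2); q is the kernel width parameter."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def project_capped_simplex(v, cap):
    """Euclidean projection of v onto {b : sum(b) = 1, 0 <= b_i <= cap}.

    Bisects on the shift tau in b_i = clip(v_i - tau, 0, cap); the set is
    non-empty only when cap * len(v) >= 1, i.e. C >= 1/N in SVC terms.
    """
    lo, hi = min(v) - cap, max(v)
    for _ in range(100):
        tau = 0.5 * (lo + hi)
        if sum(min(max(x - tau, 0.0), cap) for x in v) > 1.0:
            lo = tau  # total mass too large: increase tau
        else:
            hi = tau
    tau = 0.5 * (lo + hi)
    return [min(max(x - tau, 0.0), cap) for x in v]


def svc_dual(points, q, C, steps=2000):
    """Solve the SVC dual by projected gradient descent.

    The dual maximises sum_j b_j K(x_j, x_j) - sum_ij b_i b_j K(x_i, x_j)
    subject to sum_j b_j = 1 and 0 <= b_j <= C. With a Gaussian kernel the
    first term is the constant 1, so this equals minimising b^T K b.
    """
    n = len(points)
    K = [[gaussian_kernel(a, b, q) for b in points] for a in points]
    beta = project_capped_simplex([1.0 / n] * n, C)
    step = 1.0 / (2.0 * n)  # gradient 2Kb is Lipschitz with constant <= 2n since 0 <= K_ij <= 1
    for _ in range(steps):
        grad = [2.0 * sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        beta = project_capped_simplex([b - step * g for b, g in zip(beta, grad)], C)
    return K, beta


def radius_sq(x, points, beta, K, q):
    """R^2(x) = K(x, x) - 2 sum_j b_j K(x_j, x) + sum_ij b_i b_j K(x_i, x_j)."""
    cross = sum(b * gaussian_kernel(p, x, q) for p, b in zip(points, beta))
    quad = sum(bi * bj * kij for bi, row in zip(beta, K) for bj, kij in zip(beta, row))
    return 1.0 - 2.0 * cross + quad


def bounded_svs(beta, C, tol=1e-6):
    """Indices with b_j = C: bounded support vectors, treated as outliers in SVC."""
    return [j for j, b in enumerate(beta) if b >= C - tol]


if __name__ == "__main__":
    # A tight four-corner cluster with a centre point, plus one distant point.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
    K, beta = svc_dual(data, q=1.0, C=0.3)
    print("bounded SVs (outliers):", bounded_svs(beta, C=0.3))
```

Because sum_j b_j = 1, at most 1/C points can have b_j = C, so lowering C toward 1/N admits more outliers, while raising q tightens the contours and tends to split the data into more clusters. The cluster-assignment step (checking whether segments between points stay inside the sphere) is not shown.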

     This study develops a cluster validity measure with outlier detection for support vector clustering (SVC). Its main contribution is that the proposed measure can identify the cluster configuration and the optimal parameters through a cluster validity process, without a priori knowledge of the given data sets. Because SVC is a kernel-based clustering approach, the kernel parameter and the soft-margin constant in the Lagrangian function strongly influence the clustering result. To determine suitable values for both automatically, without a priori knowledge of the data, a validity measure based on the ratio of cluster compactness to separation has been developed, combined with outlier detection and a cluster merging mechanism. With the parameters so chosen, the SVC algorithm identifies the optimal number of clusters, produces compact and smooth arbitrary-shaped cluster contours, and is more robust to outliers and noise. Simulations on artificial data sets, the IRIS classification problem, and rice image data demonstrate the effectiveness of the proposed cluster validity measure for SVC algorithms.
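The compactness-to-separation idea can be illustrated with a generic stand-in; the thesis's own measure is not reproduced here. The sketch below computes a crisp (hard-membership) form of the Xie-Beni ratio: the mean squared distance from each point to its cluster centroid, divided by the minimum squared distance between two centroids. A smaller value indicates more compact, better-separated clusters. To mimic the role of outlier detection, points carrying a hypothetical outlier label (-1 here) are excluded from both terms. Centroid distances suit roughly convex clusters and can misjudge the arbitrary-shaped clusters that SVC produces.

```python
def compactness_separation_ratio(points, labels, outlier_label=-1):
    """Crisp Xie-Beni-style validity ratio; smaller means more compact and better separated.

    compactness = mean squared distance of each non-outlier point to its cluster centroid
    separation  = minimum squared distance between any two cluster centroids
    Points labelled `outlier_label` are ignored in both terms.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != outlier_label:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("separation needs at least two clusters")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Centroid of each cluster, coordinate by coordinate.
    centroids = {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in clusters.items()
    }
    n = sum(len(members) for members in clusters.values())
    compactness = sum(
        sq_dist(p, centroids[lab]) for lab, members in clusters.items() for p in members
    ) / n
    labs = list(centroids)
    separation = min(
        sq_dist(centroids[a], centroids[b]) for i, a in enumerate(labs) for b in labs[i + 1:]
    )
    return compactness / separation


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2), (50, 50)]
    print(compactness_separation_ratio(pts, [0, 0, 1, 1, -1]))  # outlier excluded; prints 0.01
    print(compactness_separation_ratio(pts, [0, 0, 1, 1, 0]))   # outlier absorbed into cluster 0
```

In a parameter search, one would score the partition SVC produces for each candidate (q, C) pair and keep the pair with the smallest ratio. That search loop and the thesis's cluster merging mechanism are not sketched here.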

    CHINESE ABSTRACT
    ABSTRACT
    LIST OF TABLES
    LIST OF FIGURES
    1 Introduction
      1.1 Motivation
      1.2 Literature Survey
      1.3 Purpose of the Study
      1.4 Organization of the Thesis
    2 Support Vector Clustering Algorithm and Cluster Validity Measure
      2.1 Support Vector Clustering Algorithm
        2.1.1 Cluster Boundaries
        2.1.2 Cluster Assignment
        2.1.3 Summary of Support Vector Clustering Algorithm
      2.2 Cluster Validity Measure
        2.2.1 Fundamental Concepts of Cluster Validity Measure
        2.2.2 Existing Cluster Validity Measures
        2.2.3 Summary of Cluster Validity Measure
    3 Cluster Validity Measure with Outlier Detection
      3.1 A Cluster Validity Measure for SVC Algorithm
      3.2 Outliers Detection
      3.3 Cluster Merging Mechanism
      3.4 Implementation Strategy
    4 Simulation Results
      4.1 Artificial Examples
        4.1.1 BENSAID Data Set
        4.1.2 Five-Cluster Data Set
        4.1.3 Random Noise Data Set
        4.1.4 Crescent Data Set
      4.2 The IRIS Data
      4.3 The Rice Image Data
    5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    References

    Full-text availability: on campus from 2006-08-26; off campus from 2006-08-26.