| Graduate student: | Lin, Cheng-Hang (林正航) |
|---|---|
| Thesis title: | A Comparative Study of Agglomerative Clustering Capability in Two-Layer SOM and TreeSOM |
| Advisor: | Li, Sheng-Tun (李昇暾) |
| Degree: | Master |
| Department: | College of Management, Institute of Information Management |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 (ROC calendar) |
| Language: | Chinese |
| Pages: | 97 |
| Keywords: | Multilayer SOM, Two-Layer SOM, TreeSOM, Monte Carlo simulation, agglomerative clustering |
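With the arrival of the knowledge economy, globalization has caused the volume of information to grow exponentially. In the face of information overload, the Self-Organizing Map (SOM) neural network is a widely used unsupervised learning method for clustering in artificial intelligence, and it is well suited to projecting high-dimensional data onto a low-dimensional plane. A SOM with a large topology has the advantage that the data within each cluster are more tightly grouped, and an agglomerative clustering method can then merge its clusters bottom-up into the number of clusters an individual user requires. However, no single agglomerative clustering method has become the standard for SOM, so how the choice of method affects clustering performance, and which data each method suits, has become an important research issue.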
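The bottom-up merging of a large map's units into a user-chosen number of clusters can be sketched as follows. This is a minimal illustration assuming average linkage on Euclidean distances between prototype vectors; the linkage actually used is not stated here, and the helper names (`agglomerate`, `label_samples`) and the seven prototypes are hypothetical. Each sample then inherits the cluster of its best-matching unit.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(prototypes, k):
    """Merge SOM prototype vectors bottom-up until k clusters remain.

    Average linkage is an assumed choice for this sketch.
    Returns a list of clusters, each a list of prototype indices.
    """
    clusters = [[i] for i in range(len(prototypes))]  # start: one cluster per unit

    def linkage(a, b):
        # Average pairwise distance between members of the two clusters.
        total = sum(dist(prototypes[i], prototypes[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair of clusters
        del clusters[j]
    return clusters


def label_samples(samples, prototypes, clusters):
    """Give each sample the cluster index of its best-matching (nearest) unit."""
    unit_to_cluster = {u: c for c, members in enumerate(clusters) for u in members}
    return [unit_to_cluster[min(range(len(prototypes)),
                                key=lambda u: dist(prototypes[u], s))]
            for s in samples]


# Hypothetical prototypes of a trained map: two groups plus one in-between unit.
prototypes = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9], [5, 5]]
clusters = agglomerate(prototypes, k=2)
print(clusters)  # -> [[0, 1, 2], [3, 4, 5, 6]]
print(label_samples([[0.2, 0.1], [9.5, 9.4]], prototypes, clusters))  # -> [0, 1]
```

The in-between unit at (5, 5) is absorbed by whichever group is closer on average, which is how interpolating units of a large map end up inside a final cluster.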
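In view of this, this study develops, from the Multilayer SOM, a special case for agglomerative clustering: the Two-Layer SOM. It then focuses on comparing the agglomerative clustering performance of two methods, the Two-Layer SOM and TreeSOM. The test data for this empirical study consist of real-world datasets and simulated datasets generated by Monte Carlo simulation.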
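A Two-Layer SOM can be sketched under one assumed reading that the text does not spell out: a large first-layer map is trained on the data, then a second, 1 × k map is trained on the first layer's prototype vectors, so that each second-layer unit stands for one of the k final clusters. The training schedule (linear learning-rate decay, Gaussian neighbourhood, epoch counts), the blob data, and all function names are hypothetical choices, not the thesis's settings.

```python
import math
import random


def bmu(units, x):
    """Index of the unit whose prototype is nearest to x (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(units[k][1], x)))


def train_som(data, rows, cols, epochs, lr0=0.5, seed=0):
    """Train a rows x cols SOM on `data`; return a list of (grid_position, prototype) pairs."""
    rng = random.Random(seed)
    units = [((r, c), list(rng.choice(data))) for r in range(rows) for c in range(cols)]
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood radius shrinks
            br, bc = units[bmu(units, x)][0]
            for (r, c), w in units:
                # Gaussian neighbourhood measured on the map grid.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])  # pull prototype toward the sample
            step += 1
    return units


def two_layer_som(data, first_rows, first_cols, n_clusters, seed=0):
    """Hypothetical Two-Layer SOM: a 1 x n_clusters map clusters the first map's prototypes."""
    layer1 = train_som(data, first_rows, first_cols, epochs=50, seed=seed)
    prototypes = [w for _, w in layer1]
    layer2 = train_som(prototypes, 1, n_clusters, epochs=200, seed=seed + 1)
    # Each first-layer unit joins the cluster of its best-matching second-layer unit ...
    unit_cluster = [bmu(layer2, w) for w in prototypes]
    # ... and each sample inherits the cluster of its own first-layer best-matching unit.
    return [unit_cluster[bmu(layer1, x)] for x in data]


rng = random.Random(42)
blob_a = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(20)]
blob_b = [[rng.gauss(8.0, 0.5), rng.gauss(8.0, 0.5)] for _ in range(20)]
labels = two_layer_som(blob_a + blob_b, 4, 4, n_clusters=2)
print(labels)  # samples from the same blob should share a label
```

In this design the second layer sees only the first layer's prototypes, so its training cost depends on the map size rather than on the number of samples.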
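The experimental results show that, on most test datasets, TreeSOM's agglomerative clustering achieves better mean values than the Two-Layer SOM on the cluster cohesion, internal dispersion ratio, clustering precision, and clustering recall indicators, and statistical tests show that the differences in means are significant. The Two-Layer SOM, by contrast, outperforms TreeSOM in mean cluster cohesion, internal dispersion ratio, clustering precision, and clustering purity only when the test dataset has 2 attributes or the clustering result has 2 clusters; in those cases, too, the differences in means are significant.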
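The indicators are not defined in the abstract, so the sketch below assumes common textbook definitions: purity counts, for each cluster, the samples of its majority true class; precision and recall are computed by pair counting, treating a pair of samples placed in the same cluster as a positive prediction that is correct when both share a true class. The thesis's own formulas may differ, and its cluster cohesion and internal dispersion ratio indicators are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def purity(clusters, classes):
    """Share of samples in the majority true class of their cluster (assumed definition)."""
    counts = {}
    for k, c in zip(clusters, classes):
        counts.setdefault(k, Counter())[c] += 1
    return sum(max(cnt.values()) for cnt in counts.values()) / len(classes)


def pair_precision_recall(clusters, classes):
    """Pair-counting precision and recall over all unordered pairs (assumed definition)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(classes)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        if same_cluster and same_class:
            tp += 1  # correctly grouped together
        elif same_cluster:
            fp += 1  # grouped together but from different classes
        elif same_class:
            fn += 1  # same class but split across clusters
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


pred = [0, 0, 0, 1, 1, 1]               # cluster labels produced by a method
truth = ["a", "a", "b", "b", "b", "b"]  # known classes
print(purity(pred, truth))                 # -> 5/6
print(pair_precision_recall(pred, truth))  # -> (4/6, 4/7)
```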
In the emerging knowledge economy, the amount of information has been growing exponentially. In response to information overload, the self-organizing map (SOM), an artificial neural network trained by unsupervised learning, has been widely used for clustering. It maps high-dimensional data onto a low-dimensional grid such that similar data elements are placed close together. A large SOM has the advantage of smaller internal cluster dispersion than a small SOM. Furthermore, applying an agglomerative clustering method to a SOM merges its units into the number of clusters an individual user requires. However, many agglomerative clustering methods can be applied to a SOM, so which method to use, and how that choice affects clustering performance, is a central research issue in this area.
In this thesis we compare two agglomerative clustering methods, TreeSOM and the Two-Layer SOM (a special case of the Multilayer SOM), by evaluating their agglomerative clustering performance. For data, we used real-world datasets and artificial datasets generated by Monte Carlo simulation, covering correlated and uncorrelated variables and non-overlapping and overlapping clusters, each with and without outliers.
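The simulation design above (correlated or uncorrelated variables, overlapping or non-overlapping clusters, with or without outliers) can be illustrated with a generic Monte Carlo generator. This is a minimal sketch under assumed choices, not the thesis's generator: clusters are Gaussian with unit variance, centres are spaced `separation` standard deviations apart along the first variable (small values produce overlap), a shared random component gives every pair of variables the same within-cluster correlation `rho`, and outliers are drawn uniformly from the hypercube spanning the smallest to the largest simulated value and labelled -1. The function name and all parameter values are hypothetical.

```python
import math
import random


def simulate_clusters(n_clusters, n_per_cluster, n_vars, separation,
                      rho=0.0, outlier_frac=0.0, seed=0):
    """Hypothetical generator of labelled Gaussian clusters with optional correlation/outliers.

    separation   -- distance between neighbouring centres along the first variable,
                    in within-cluster standard deviations (small values overlap)
    rho          -- common within-cluster correlation between variables, 0 <= rho < 1
    outlier_frac -- number of uniform outliers as a fraction of clustered points
    Returns (data, labels); outliers carry the label -1.
    """
    rng = random.Random(seed)
    data, labels = [], []
    for k in range(n_clusters):
        centre = [k * separation] + [0.0] * (n_vars - 1)
        for _ in range(n_per_cluster):
            shared = rng.gauss(0.0, 1.0)  # common component: Var = 1, Cov = rho
            point = [centre[v] + math.sqrt(rho) * shared
                     + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                     for v in range(n_vars)]
            data.append(point)
            labels.append(k)
    lo = min(min(p) for p in data)
    hi = max(max(p) for p in data)
    n_outliers = round(outlier_frac * len(data))
    for _ in range(n_outliers):
        # Outliers: uniform over the hypercube spanning all simulated values.
        data.append([rng.uniform(lo, hi) for _ in range(n_vars)])
        labels.append(-1)
    return data, labels


# Three clusters 3 sd apart in 4 correlated variables, plus 10% outliers.
data, labels = simulate_clusters(3, 100, 4, separation=3.0, rho=0.5, outlier_frac=0.1)
print(len(data), labels.count(-1))  # -> 330 30
```

Setting `rho=0.0`, a large `separation`, and `outlier_frac=0.0` gives the uncorrelated, well-separated, outlier-free case.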
The results show that TreeSOM performed very well in most cases. The Two-Layer SOM, on the other hand, performed worse in almost all cases because it was strongly affected by the number of variables and clusters; it did better than TreeSOM only when the data had two variables or were partitioned into two clusters. Overall, the average performance of TreeSOM was better than that of the Two-Layer SOM, and the differences were statistically significant.
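The abstract reports statistically significant differences in mean performance but does not name the test. As one possible illustration, Welch's two-sample t statistic and its Welch–Satterthwaite degrees of freedom can be computed with Python's standard library; the score lists below are made-up numbers, not results from the thesis. If both methods are scored on the same datasets, a paired t-test on the per-dataset differences would be the more natural choice.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for samples a and b."""
    se_a = variance(a) / len(a)  # squared standard error of mean(a); variance() divides by n - 1
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


# Made-up per-run scores for two methods (illustrative only, not thesis data).
scores_treesom = [0.90, 0.80, 0.85, 0.95]
scores_two_layer = [0.70, 0.75, 0.65, 0.80]
t, df = welch_t(scores_treesom, scores_two_layer)
print(round(t, 2), round(df, 2))  # -> 3.29 6.0
```

Here t is about 3.29 with 6 degrees of freedom, above the two-sided 5% critical value of about 2.447, so these illustrative means would be judged significantly different.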