Graduate student: Liu, Guan-Yu (劉冠妤)
Thesis title: Introduce concept hierarchy to improve the results of clustering algorithm (導入概念階層觀念以改善分群演算法之績效)
Advisor: Yeh, Rong-Mao (葉榮懋)
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: Chinese
Number of pages: 60
Keywords (Chinese): 分群, 概念階層, PAM, 遺傳演算法
Keywords (English): clustering, concept hierarchy, PAM, genetic algorithm
    Data clustering is often treated as a preliminary step in data mining, especially for large, high-dimensional datasets. Well-clustered data can reveal useful information that was hidden in the raw data, and that information can then support enterprise problem solving and decision making. Partitioning clustering algorithms, however, often take a great deal of time to search for the optimal clustering when the dataset is large. They also cannot determine a suitable number of clusters automatically: the user must supply it in advance, which is usually the most difficult part. In addition, when the description spaces of the basic attributes cannot adequately express the complexity of a dimension, these algorithms may produce poor clustering results. To address these problems, this research proposes a combined solution based on PAM, a partitioning clustering algorithm. It combines a heuristic algorithm with the idea of attribute level climbing from concept hierarchies, so that a suitable number of clusters is found during clustering and the time PAM spends searching for the optimal solution is reduced. As a result, the clustering results become more meaningful and of higher quality.
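The two ideas named in the abstract, attribute level climbing and PAM, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `HIERARCHY` table, the sample records, the `mismatch` dissimilarity, and the function names are all assumptions made for illustration. The number of clusters `k` is fixed here, so the search for a suitable `k` that the abstract attributes to the heuristic algorithm is not shown.

```python
from itertools import product

# Hypothetical concept hierarchy: each value maps to its parent concept.
HIERARCHY = {
    "Taipei": "North", "Hsinchu": "North",
    "Tainan": "South", "Kaohsiung": "South",
    "North": "Taiwan", "South": "Taiwan",
}

def climb(value, hierarchy, levels=1):
    """Generalize a value by moving up `levels` steps in the hierarchy."""
    for _ in range(levels):
        value = hierarchy.get(value, value)  # a root value stays unchanged
    return value

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, dist):
    """PAM-style k-medoids: start from the first k points, then accept any
    medoid/non-medoid swap that lowers the total cost until none does."""
    medoids = list(range(k))
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i, h in product(range(k), range(len(points))):
            if h in medoids:
                continue
            candidate = medoids[:i] + [h] + medoids[i + 1:]
            cost = total_cost(points, candidate, dist)
            if cost < best:
                medoids, best, improved = candidate, cost, True
    # Assign every point to the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels, best

# Illustrative records of the form (city, purchased item).
records = [("Taipei", "book"), ("Hsinchu", "book"),
           ("Tainan", "food"), ("Kaohsiung", "food")]

# Climb the city attribute one level (city -> region) before clustering.
generalized = [(climb(city, HIERARCHY), item) for city, item in records]

raw_medoids, raw_labels, raw_cost = pam(records, k=2, dist=mismatch)
gen_medoids, gen_labels, gen_cost = pam(generalized, k=2, dist=mismatch)
```

In this toy data every city is distinct, so the raw records never match on the city attribute. After climbing one level, records from the same region become identical, which is the kind of effect the abstract describes when the basic attribute values are too fine-grained to express the structure of a dimension.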

    Table of contents

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        Section 1. Research Motivation
        Section 2. Research Objectives
        Section 3. Research Scope and Limitations
        Section 4. Research Process
        Section 5. Research Framework
    Chapter 2 Literature Review
        Section 1. Clustering Algorithms
        Section 2. Genetic Algorithms
        Section 3. Concept Hierarchies
        Section 4. Summary
    Chapter 3 Research Method
        Section 1. Problem Definition
        Section 2. Background Knowledge
        Section 3. Genetic Algorithm Design
        Section 4. The Improved Clustering Algorithm
    Chapter 4 Case Verification and Evaluation
        Section 1. Dataset Introduction and Concept Hierarchy Definition
        Section 2. Algorithm Implementation
        Section 3. Data Testing and Validation
        Section 4. Case Study
    Chapter 5 Conclusions and Suggestions
        Section 1. Conclusions
        Section 2. Future Research Directions and Suggestions
    References
    Appendix Tables

    Full text availability: on campus from 2005-06-18; off campus from 2007-06-18