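| Graduate student: | 鄭國隆 Tseng, Kuo-Lung |
|---|---|
| Thesis title: | 探勘負向對比集演算法之建立 (Development of an Algorithm for Mining Negative Contrast Sets) |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management - Department of Industrial and Information Management |
| Year of publication: | 2004 |
| Academic year of graduation: | 92 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 57 |
| Keywords: | multiple testing, contrast sets, negative association rules, association rules |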
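In the real world, different groups differ markedly in certain characteristics. In a university, for example, students can be divided into three groups: the college of science and engineering, the college of liberal arts, and the college of management. Students in different colleges differ from one another in many ways, yet pinpointing exactly what those differences are is usually quite difficult. As a result, people mostly rely on subjective impressions when considering such questions and cannot give a complete and correct answer. To address this problem, Bay and Pazzani proposed the concept of mining contrast sets, which uses the association rule model from data mining to find the differences that exist between groups. These differences are sets of attributes that depend strongly on one another, which is where the definition of a contrast set comes from. However, the results mined by the STUCCO algorithm proposed by Bay and Pazzani provide only part of the information implicit in the data, so decision makers cannot base their decisions on complete information. How to uncover the other latent information in the data in order to identify differences between groups is therefore the question this study investigates.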
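The idea of comparing groups through the support of an attribute-value set can be sketched as follows. This is a minimal illustration, not the STUCCO implementation: the function names, the toy student records, and the attribute names are all hypothetical, and the statistical test that STUCCO additionally applies is omitted.

```python
from typing import Dict, List

Record = Dict[str, str]


def support(records: List[Record], itemset: Dict[str, str]) -> float:
    """Fraction of records matching every attribute-value pair in itemset."""
    if not records:
        return 0.0
    hits = sum(
        all(rec.get(attr) == val for attr, val in itemset.items())
        for rec in records
    )
    return hits / len(records)


def group_supports(groups: Dict[str, List[Record]],
                   itemset: Dict[str, str]) -> Dict[str, float]:
    """Support of the same itemset computed separately inside each group."""
    return {name: support(recs, itemset) for name, recs in groups.items()}


def max_support_difference(groups: Dict[str, List[Record]],
                           itemset: Dict[str, str]) -> float:
    """Largest gap between any two groups' supports for the itemset."""
    values = list(group_supports(groups, itemset).values())
    return max(values) - min(values)


# Hypothetical toy data: two groups of four students each.
groups = {
    "engineering": [
        {"major_type": "quantitative", "club": "robotics"},
        {"major_type": "quantitative", "club": "robotics"},
        {"major_type": "quantitative", "club": "music"},
        {"major_type": "qualitative", "club": "music"},
    ],
    "liberal_arts": [
        {"major_type": "qualitative", "club": "music"},
        {"major_type": "qualitative", "club": "music"},
        {"major_type": "quantitative", "club": "robotics"},
        {"major_type": "qualitative", "club": "drama"},
    ],
}
candidate = {"major_type": "quantitative", "club": "robotics"}

print(group_supports(groups, candidate))
print(max_support_difference(groups, candidate))
```

A candidate whose support gap between groups is large is a natural contrast-set candidate. In practice the gap would be checked against a user-chosen threshold and a significance test before the candidate is reported.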
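This study combines the basic concept of contrast sets, the definition of negative association, and multiple testing methods to find sets of attributes that do not depend strongly on one another. We call these negative contrast sets, and they can give decision makers additional information. Several problems arose while building the algorithm for mining negative contrast sets. These included improving the efficiency with which the well-known Apriori algorithm computes the support of the various attribute combinations, and computing the support of each negative-relationship combination through basic concepts of probability theory rather than by always resorting to inefficient scans of the data file. We also found an important property that simplifies the interest-measure filtering step. Finally, we use the method proposed by Holm to revise how STUCCO sets the Type I error for individual tests in multiple testing, so as to avoid the problem of insufficient statistical power.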
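Two of the ingredients mentioned above can be sketched in a few lines. The first function shows one probability identity of the kind that lets the support of a negated combination be derived from supports already counted, without rescanning the data. The second is the standard Holm step-down procedure, shown next to a fixed Bonferroni threshold for comparison. The function names and the example p-values are assumptions made for illustration; the thesis's exact formulas and STUCCO's exact threshold rule are not reproduced here.

```python
from typing import List


def negated_support(supp_a: float, supp_ab: float) -> float:
    """supp(A and not B) = supp(A) - supp(A and B).

    Both inputs are supports that were already counted, so no extra
    pass over the data is needed.
    """
    return supp_a - supp_ab


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure.

    Sort the p-values in ascending order and compare the k-th smallest
    (k = 0, 1, ...) with alpha / (m - k). Stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


def bonferroni_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Fixed per-test threshold alpha / m, shown only for comparison."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values from four individual tests.
p_values = [0.015, 0.04, 0.03, 0.005]
print(holm_reject(p_values))
print(bonferroni_reject(p_values))
print(negated_support(0.5, 0.25))
```

With these example p-values, Holm rejects the hypothesis with p = 0.015 while the fixed Bonferroni threshold does not. This is the gain in power that motivates a step-down correction, and both procedures control the family-wise Type I error at `alpha`.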
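Chinese-language references
張洲期 (2000), "應用資料挖掘技術於產品替代性及互補性之研究" [A Study Applying Data Mining Techniques to Product Substitutability and Complementarity], Master's thesis, Department of Information Management, National Chung Cheng University.
彭文正 (trans.) (2001), 資料採礦:顧客關係管理暨電子行銷之應用 [Data Mining: Applications in Customer Relationship Management and E-Marketing], Taipei: 數博網資訊. Original authors: Michael J. A. Berry and Gordon S. Linoff.
English-language references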
Agrawal, R., Imielinski, T., and Swami, A. (1993), Mining Association Rules Between Sets of Items in Large Databases, Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
Agrawal, R. and Srikant, R. (1994), Fast Algorithms for Mining Association Rules, Proceedings of International Conference on Very Large Data Bases, 487-499.
Bay, S. D. and Pazzani, M. J. (1999), Detecting Change in Categorical Data: Mining Contrast Sets, Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 302-306.
Brin, S., Motwani, R., and Silverstein, C. (1997), Beyond Market Baskets: Generalizing Association Rules to Correlations, Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, 265-276.
Delgado, M., Sánchez, D., Martín-Bautista, M. J., and Vila, M. (2001), Mining Association Rules with Improved Semantics in Medical Databases, Artificial Intelligence in Medicine, 21:241-245.
Han, J. and Fu, Y. (1995), Discovery of Multiple-level Association Rules from Large Databases, Proceedings of the Very Large Data Bases Conference, 420-431.
Han, J., Pei, J., and Yin, Y. (2000), Mining Frequent Patterns without Candidate Generation, Proceedings of the ACM SIGMOD International Conference on Management of Data, 1-12.
Han, J., Koperski, K., and Stefanovic, N. (1997), GeoMiner: A System Prototype for Spatial Data Mining, Proceedings of the ACM-SIGMOD International Conference on Management, 553-556.
Holm, S. (1979), A Simple Sequentially Rejective Multiple Test Procedure, Scandinavian Journal of Statistics, 6:65-70.
Liu, H., Lu, H., Feng, L., and Hussain, F. (1999), Efficient Search of Reliable Exceptions, Proceedings of the Third Pacific-Asia Conference on Knowledge Discovery and Data Mining, 194-204.
Piatetsky-Shapiro, G. (1991), Discovery, Analysis, and Presentation of Strong Rules, Knowledge Discovery in Databases, Menlo Park, CA: AAAI Press, 229-248.
Pramudiono, I., Shintani, T., Takahashi, K., and Kitsuregawa, M. (2002), User Behavior Analysis of Location Aware Search Engine, Mobile Data Management, January 08 - 11: 139-145.
Quinlan, J. R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA.
Savasere, A., Omiecinski, E., and Navathe, S. (1998), Mining for Strong Negative Associations in a Large Database of Customer Transactions, Proceedings of the International Conference on Data Engineering, 494-502.
Shaffer, J. P. (1995), Multiple Hypothesis Testing, Annual Review of Psychology, 46:561-584.
Webb, G. I. (2000), Efficient Search for Association Rules, Proceedings of International Conference on Knowledge Discovery and Data Mining, 99-107.
Webb, G. I., Butler, S., and Newlands, D. (2003), On Detecting Differences Between Groups, citeseer.nj.nec.com/576632.html.
Witten, I. H. and Frank, E. (2000), Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco.
Zhang, C. and Zhang, S. (2002), Association Rule Mining: Models and Algorithms, Springer-Verlag, Berlin.