| 研究生: |
廖彥誠 Liao, Yen-Cheng |
|---|---|
| 論文名稱: |
於大型資料庫中探勘高度價值之有趣負向關聯法則之研究 A study on mining interesting negative association rules in large databases |
| 指導教授: |
翁慈宗
Wong, Tzu-Tsung |
| 學位類別: |
碩士 Master |
| 系所名稱: |
管理學院 - 資訊管理研究所 Institute of Information Management |
| 論文出版年: | 2006 |
| 畢業學年度: | 94 |
| 語文別: | 中文 |
| 論文頁數: | 57 |
| 中文關鍵詞: | 負向關聯法則 、搜尋樹 、搜尋空間修剪 、有趣度 、大型資料庫 |
| 外文關鍵詞: | interestingness, large databases, negative association rules, search space pruning, search tree |
| 相關次數: | 點閱:82 下載:1 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
在過去已有許多針對關聯法則 (association rule) 的研究,但大多僅針對正向關聯法則的推導部份做改進,對於負向關聯法則 (negative association rule) 則較少著墨;負向關聯法則指的是相對於正向關聯法則的觀念,當兩個不同資料項目的發生具有一定程度的相依性時,而此相依性是一個發生導致另一個的不發生,就稱其為負向關聯法則。而負向關聯法則依其所可能具有的形式,為只要不是滿足正向關聯法則條件之所有其他者,因此,凡挖掘正向關聯法則中所可能產生的問題,在負向關聯法則的探勘裡皆有可能發生。本研究首先將試著以OPUS演算法將整個搜尋空間以一個搜尋樹的方式架構出來,透過此法所架構出的搜尋空間,可事先將不符合使用者限制條件的資料項目集及其延伸的資料項目集動態的移除,同時此搜尋樹可使得往後的法則搜尋過程能更有效率,接著以KORD(K-Optimal-Rule-Discovery)演算法利用使用者所設定之最小有趣度 (minimal interest)以及所要搜尋到的法則數目,來當作主要的法則篩選限制條件,在搜尋時即將過大的搜尋空間進行刪除的動作,來減少所要搜尋的範圍並增加演算法執行的效率。透過實證資料檔呈現出的結果,正負向關聯法則在利用有趣度作為法則的主要篩選條件時,利用修剪規則對不可能成為我們最終解的狀況進行排除,的確可以更有效率的辨識出有趣的法則,透過此讓使用者自行設定某些限制條件的方法,可以過濾出真正符合使用者需求的規則。
Most of the studies about mining association rules were focused on discovering positive association rules. However, negative rules are also very useful in association analysis. Let X and Y be any two disjoint item sets. Negative rules can be in either one of the following three forms : X→┐Y, ┐X→Y, and ┐X→┐Y. So, all of the problems that we have encountered in mining positive association rules will happen in mining negative ones. This thesis proposes an approach to revise the KORD (k-Optimal-Rule-Discovery) algorithm for mining negative association rules. Some pruning rules are developed to speed up the mining procedure. Based on the minimum support, the OPUS search algorithm is used to generate search trees that can help us filter out candidate item sets efficiently. Then the minimum interest is the threshold in finding interesting negative association rules. Our method is applied on a real insurance data set to demonstrate its efficiency in mining interesting negative association rules.
中文部份
王仁傑 (民89)。有效找出較長資料項目型樣的關聯法則之研究。未出版之碩士論文,逢甲大學資訊工程學研究所,台中市
鄭國隆 (民93)。探勘負向對比集演算法之建立。未出版之碩士論文,國立成功大學工業與資訊管理學研究所,台南市
英文部份
Agrawal, R. and Srikant, R. (1994), Fast algorithms for mining association rules, Proceedings of International Conference on Very Large Data Bases, Santiago, Chile, 487-499.
Brin, S., Motwani, R., and Silverstein, C. (1997b), Beyond market baskets: Generalizing association rules to correlations, Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, 265-276.
Brin, S., Motwani, R., Unman, J. D., and Tsur, S. (1997a), Dynamic itemset counting and implication rules for market basket data, Proceedings of ACM SIGMOD Conference on Management of Data, 255-264.
Fayyad, M. U. (1996), Data mining and knowledge discovery: Making sense out of data, IEEE Expert, 11(5), 20-25.
Ganti, V., Gehrke, J., and Ramakrishnan, R. (1999), Mining very large databases, IEEE Computer, 32, 38-45.
Hamano, S., and Sato, M. (2004), Mining indirect association rules, Advances in Data Mining: Applications in Image Mining, Medicine and Biotechnology, Management and Environmental Control, and Telecommunications, Lecture Notes in Computer Science, 3275, Springer-Verlag, 106-115.
Liu, H., Lu, H., Feng, L., and Hussain, F. (1999), Efficient search of reliable exceptions, Proceedings of The Third Pacific Asia Conference on Knowledge Discovery and Data Mining, 194-204.
Park, J. S., Chen, M. S., and Philip, S.Y. (1995), An effective hash-based algorithm for mining association rules, Proceedings of the ACM SIGMOD Conference on Management of Data, 175-186.
Savasere, A., Omiecinski, E., and Navathe, S. (1995), An efficient algorithm for mining association rules in large databases, Proceedings of International Conference on Very large Data Bases, Zurich, Switzerland, 432-444.
Savasere, A., Omiecinski, E., and Navathe, S. (1998), Mining for strong negative associations in a large database of customer transactions, Proceedings of the International Conference on Data Engineering, 494-502.
Toivonen, H. (1996), Sampling large databases for association rules, Proceedings of International Conference on Very Large Data Bases, Mumbai (Bombay), India, 134-145.
Tan, P. N., Kumar, V., and Srivastava J. (2000), Indirect association: Mining higher order dependencies in data, Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, 632-637.
Tan, P. N., and Kumar, V. (2001), Mining indirect associations in web data, Lecture Notes in Computer Science, 2356, 145-166.
Webb, G. I. (1995), OPUS: An efficient admissible algorithm for unordered search, Journal of Artificial Intelligence Research, 3, 431-465.
Webb, G. I. (2000), Efficient search for association rules, Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, 99-107.
Webb, G. I. and Zhang, S. (2005), K-Optimal-Rule-Discovery, Data Mining and Knowledge Discovery, 10(1), 39-79.
Witten, I. H. and Frank, E. (2000), Data Mining: Practical Machine Learning tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco.
Wu, X., Chengqi Z., and Shichao Z. (2004), Efficient mining of both positive and negative association rules, ACM Transactions on Information Systems, 22(3), 281-405.
Yuan, X., Buckles, B. P., Yuan, Z., and Zhang J. (2002), Mining negative association rules, Proceedings of the Seventh International Symposium on Computers and Communications (ISCC’02), 623-628.
Zhang, C. and Zhang, S. (2002), Association Rule Mining: Models and Algorithms, Springer-Verlag, Berlin.