| 研究生: |
林友民 Lin, Yeou-Min |
|---|---|
| 論文名稱: |
一個以矩陣為基礎的分割叢集演算法應用於非數值性資料 MBA – A Matrix-Based Partitioning Clustering Algorithm for Non-Numeric Data |
| 指導教授: |
焦惠津
Jiau, Hewi-Jin Christine |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| 論文出版年: | 2003 |
| 畢業學年度: | 91 |
| 語文別: | 英文 |
| 論文頁數: | 59 |
| 中文關鍵詞: | 資料探勘 、叢集演算法 |
| 外文關鍵詞: | data mining, clustering algorithm |
| 相關次數: | 點閱:125 下載:1 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
Data clustering has attracted a lot of attention in the field of data mining and computational statistics. Many methods have been developed for clustering objects according to their similarity.
However, most of the traditional clustering algorithms that use distances between points for clustering are not appropriate for non-numeric attributes such as nominal, boolean and categorical data. Because the data type of these specific applications is non-numeric(non-metric), thus it is not easy to map attribute tuples to vectors. And sometimes we would lose certain useful information embedded in data attributes with using traditional distance-based clustering algorithms directly.
This thesis proposes and explores two simple algorithms more
suitable than distance-based clustering algorithms for non-numeric data clustering. The Cosine measure for document clustering triggers the idea of our similarity measure. This similarity measure can easily reflect the ordinal information of data variables of sequential data. It means that our similarity measure shows whether the ``quality of consecution" is good or not. In addition, some issues about similarity matrix (SM) partitioning and Union-Find-like methods are discussed in detail.
[1] G. Karypis,E.-H. Han, and V. Kumar,“Chameleon:Hierarchical Clustering Using
Dynamic Modeling, ” Computer, pp. 68–75, Aug. 1999.
[2] L. Talavera and J. Bejar,“Generality-Based Conceptual Clustering with Probabilistic
Concepts, ” IEEE Trans on Patter Analysis and Machine Intelligence, vol. 23,
pp. 196–206, Feb. 2001.
[3] T. Zhang, R. Ramakrishnan, and M. Livny,“BIRCH: An Efficient Data Clustering
Method for Very Large Databases, ” In Proceedings of the ACM SIGMOD International
Conference on Management of Data, pp. 196–206, Jun. 1996.
[4] Y. Xie and V.V. Phoha,“Web User Clustering From Access Log Using Belief Function,
” First International Conference on Knowledge Capture, pp. 202–208, Oct.
2001.
[5] W. Wang and Osmar R. Zaiane,“Clustering Web Sessions by Sequence Alignment,
” Third International Workshop on Management of Information on the Web, Sep.
2002
[6] B. Hay, G. Wets, and K. Vanhoof,“Clustering Navigation Patterns on a Website
using a Sequence Alignment Method, ” IJCAI’s Workshop on Intelligent Techniques
for Web Personalization, Aug. 2001.
[7] V. Guralnik and G. Karypis,“A Scalable Algorithm for Clustering Sequential Data,
” IEEE International Conference on Data Mining, pp. 179–1866, Dec. 2001.
[8] H. Kawaji, Y. Yamaguchi, H Matsuda, and A. Hashimoto,“A Graph-Based Clustering
Method for a Large set of sequence Using a Graph Partitioning Algorithm,
” The Twelfth International Conference on Genome Informatics, pp. 93–102, Dec.
2001.
[9] E. Bolten1, A. Schliep, S. Schneckener, D. Schomburg, and R. Schrader,“Clustering
Protein Sequences - Structure Prediction by Transitive Homology, ” 8th International
Conference on Intelligent Systems for Molecular Biology, Aug. 2000.
[10] F. Giannotti,C. Gozzi, and G. Manco,“Characterizing Web User Accesses: A Transactional
Approach to Web Log Clustering, ” Proceedings of the International Conference
on Information Technology: Coding and Computing, pp. 246–251, Apr. 2002.
[11] Y. Fu, K. Sandhu, and M. Shih,“Clustering ofWeb Users Based on Access Patterns, ”
International Workshop on Web Usage Analysis and User Profiling (WEBKDD’99),
Aug. 1999.
[12] A. Nanopoulos, D. Katsaros, and Y. Manolopoulos,“Effective Prediction of Webuser
Accesses: A Data Mining Approach, ” Mining Log Data Across All Customer
TouchPoints (WEBKDD’2001), Aug. 2001.
[13] Jiawei Han and Micheline Kamber,“Data Mining: Concepts and Techniques, ” Morgan
Kaufmann, 2001.
[14] Ricardo Baeza-Yates and Berthier Ribeiro-Neto,“Modern Information Retrieval, ”
Addison Wesley, 1999.
[15] R. Cooley, B.Mobasher, and J. Srivastava,“Data Preparation for MiningWorldWide
Web Browsing Patterns, ” Knowledge and Information Systems, pp. 5–32, 1999.
[16] J. Han, Y. Cai, and N. Cercine,“Knowledge Discovery in Databases: An Attribute-
Oriented Approach, ” In Proceedings of 18th International Conference Very Large
Data Base, pp. 547–559, Aug. 1992.
[17] K. Charter, J. Schaeffer, and D. Szafron,“Sequence Alignment Using Fastlsa, ” International
Conference on Mathematics and Engineering Techniques in Medicine and
Biological Sciences (METMBS’2000), pp. 239–245, Jun. 2000.
[18] H. Mannila, H.Toivonen, and A. Inleri Verkamo,“Discovering Frequent Episodes in
Sequences, ” In Proceedings of the International Conference on Knowledge Discovery
and Data Mining (KDD’95), pp. 210–215, Aug. 1995.
[19] R. Agrawal and R. Srikant,“Mining Sequential Patterns, ” In Proceedings of the
International Conference Data Engineering (ICDE’95), pp. 3–14, Aug. 1995.
[20] B. Mobasher, H. Dai, T. Luo, M. Nakagawa, Y. Sun, and J. Wiltshire,“Discovery of
Aggregate Usage Profiles for Web Personalization, ” Worskhop on Web Mining for
E-Commerce – Challenges and Opportunities (WEBKDD’2000), Aug. 2000.
[21] G. Adomavicius and A. Tuzhilin,“Using Data Mining Methods to Build Customer
Profiles, ” IEEE Computer, vol. 34, pp. 74–82, Feb. 2001.
[22] J. Yang, W.Wang, P.Yu, and J. Han,“Discovery of Aggregate Usage Profiles for
Web Personalization, ” In Proceedings of ACM SIGMOD International Coference
on Manegement of Data, Jun. 2002.
[23] S. Guha, R. Rasrogi, K. Shim,“ROCK: A Robust Clustering Algorithm for Categorical
Attributes, ” In Proceedings of IEEE International Conference on Data
Engineering, pp. 512–521, Mar. 1999.
[24] W.-C. Chih, M.-S. Chen,“Developing Data Allocation Schemes by Incremental Mining
of User Moving Patterns in a Mobile Computing System, ” IEEE Transactions
on Knowledge and Data Engineering, Vol. 15, pp. 70–85 Feb. 2003.
[25] W.-C. Chih, M.-S. Chen,“Mining User Moving Patterns for Personal Data Allocation
in aMobile Computing System, ” In Proceedings of the 29th International Conference
on Parallel Processing (ICPP-2000), pp. 573–580, Aug. 2000.
[26] D.S. Phatak, R. Mulvaney,“Clustering for Personalized Mobile Web Usage, ” Proceedings
of the IEEE FUZZ’02, pp. 705–710, May. 2002.
[27] C.-H. Yun, K.-T. Chuang and M.-S. Chen,“An Efficient Clustering Algorithm for
Market Basket Data Based on Small-Large Ratios, ” In Proceedings of the 25th
International Computer Software and Applications Conference (COMPSAC 2001),
pp. 505–510, Oct. 2001.
[28] C. Ordonez, E. Omiecinski, N. Ezquerra,“A fast algorithm to cluster high dimensional
basket data, ” In Proceedings of IEEE International Conference on Data
Mining (ICDM 2001), pp. 633–636, Dec. 2001.