| Author: | 楊文典 Yang, Wen-Dian |
|---|---|
| Thesis title: | 基於概念延伸之文件分類法 Text classification with concept expansion |
| Advisor: | 李昇暾 Li, Sheng-Tun |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 62 |
| Keywords (Chinese): | 文件分類, 模糊正規概念分析, 擴展查詢 |
| Keywords (English): | Text Classification, Fuzzy Formal Concept Analysis, Query Expansion |
Digital documents have become one of the most important forms in which people acquire, store, and disseminate knowledge. As knowledge management receives growing attention, the number of digital documents has also grown rapidly. How to manage large document collections effectively has therefore become an important issue.
Text classification assigns an appropriate category to a document based on its content, making documents easier to organize, manage, and use, and many researchers have proposed automated text classification methods. Almost all existing techniques, however, treat feature terms as mutually independent. They may therefore classify poorly because they ignore the latent relationships among feature terms, or between feature terms and multiple documents. In addition, incomplete information in an unknown document may prevent it from being classified correctly.
To address these problems, this study proposes a text classification method based on concept expansion. The method uses Formal Concept Analysis to infer the categories of unknown documents and incorporates query expansion to avoid classification errors caused by insufficient information. The experimental results confirm that the proposed method outperforms previous classification methods, and that applying query expansion to text classification effectively improves classification accuracy.
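The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the thesis's actual method. It assumes a crisp (non-fuzzy) binary document-term context, whereas the keywords indicate the thesis uses Fuzzy Formal Concept Analysis. The toy data in `TRAINING` and the helper names `extent`, `intent`, `expand`, and `classify` are all hypothetical. The sketch uses the two standard derivation operators of Formal Concept Analysis: expanding a sparse query with terms that always co-occur with it in the training documents, then voting on a category among the matching documents.

```python
from collections import Counter

# Hypothetical toy training set: document id -> (feature terms, category label).
TRAINING = {
    "d1": ({"stock", "market", "bank"}, "finance"),
    "d2": ({"stock", "market", "fund"}, "finance"),
    "d3": ({"match", "goal", "team"}, "sports"),
    "d4": ({"team", "goal", "coach"}, "sports"),
}


def extent(terms):
    """Derivation operator: documents that contain every term in `terms`."""
    return {doc for doc, (doc_terms, _) in TRAINING.items() if terms <= doc_terms}


def intent(docs):
    """Derivation operator: terms shared by every document in a non-empty `docs`."""
    return set.intersection(*(TRAINING[doc][0] for doc in docs))


def expand(query_terms):
    """Close the query: add terms that co-occur with it in all matching documents.

    If no training document matches, the query is returned unchanged
    (an assumption of this sketch, not something stated in the source).
    """
    docs = extent(query_terms)
    return intent(docs) if docs else set(query_terms)


def classify(query_terms):
    """Majority category among documents in the query's extent, or None."""
    docs = extent(query_terms)
    if not docs:
        return None
    votes = Counter(TRAINING[doc][1] for doc in docs)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    sparse_query = {"stock"}          # an "incomplete" unknown document
    print(expand(sparse_query))       # the query plus terms that always accompany it
    print(classify(sparse_query))     # category voted by the matching documents
```

With this toy data, the one-term query `{"stock"}` matches `d1` and `d2`, so it expands to `{"stock", "market"}` and is classified as `"finance"`. A fuzzy variant would replace the set-membership tests with graded memberships and an alpha-cut threshold.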
On-campus access: publicly available from 2023-12-18