Graduate student: Yang, Wen-Dian (楊文典)
Thesis title: Text classification with concept expansion (基於概念延伸之文件分類法)
Advisor: Li, Sheng-Tun (李昇暾)
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar)
Language: Chinese
Number of pages: 62
Keywords: Text Classification (文件分類), Fuzzy Formal Concept Analysis (模糊正規概念分析), Query Expansion (擴展查詢)
  • Digital documents have become one of the most important forms in which people acquire, store, and disseminate knowledge, and as knowledge management receives growing attention, their number has increased dramatically. How to manage a large document collection effectively has therefore become an important issue.
    Text classification assigns an appropriate category to a document based on its content, making the document easier to organize, manage, and use, and many researchers have proposed automated text classification methods. Almost all existing techniques, however, treat feature terms as mutually independent. They may therefore classify poorly because they ignore the latent relationships between feature terms, or between feature terms and multiple documents. In addition, incomplete information in a document may prevent it from being classified correctly.
    To address these problems, this study proposes a text classification method based on concept expansion. It uses Formal Concept Analysis to infer the category of an unknown document and incorporates the idea of query expansion to avoid classification errors caused by insufficient information. The experimental results confirm that the proposed method outperforms previous classification methods, and that applying query expansion to text classification effectively improves classification accuracy.
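The abstract names Formal Concept Analysis as the inference mechanism without describing it. As a rough illustration only, the sketch below builds the formal concepts of a tiny crisp (non-fuzzy) document-term context. The data, the function names (`common_terms`, `docs_with`, `formal_concepts`), and the brute-force enumeration are assumptions made for this example; they are not taken from the thesis, whose method additionally uses fuzzy memberships and concept expansion.

```python
from itertools import combinations

# Toy formal context (hypothetical data): document -> set of feature terms.
CONTEXT = {
    "d1": {"oil", "price"},
    "d2": {"oil", "price", "export"},
    "d3": {"wheat", "export"},
}


def common_terms(docs):
    """Derivation operator: terms shared by every document in `docs`."""
    result = set().union(*CONTEXT.values())  # start from all terms
    for d in docs:
        result &= CONTEXT[d]
    return result


def docs_with(terms):
    """Derivation operator: documents that contain every term in `terms`."""
    return {d for d, ts in CONTEXT.items() if terms <= ts}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every document subset.

    Brute force is exponential in the number of documents; it is only
    meant to make the definition concrete on a toy context.
    """
    concepts = set()
    docs = sorted(CONTEXT)
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            intent = common_terms(subset)   # terms shared by the subset
            extent = docs_with(intent)      # all documents sharing them
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each printed pair is a formal concept: a maximal set of documents together with the maximal set of terms they all share. For instance, `d1` and `d2` form a concept with the intent `oil, price`, which is the kind of term co-occurrence that an independence assumption ignores.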

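    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Steps and Process
      1.4 Thesis Organization
    Chapter 2 Literature Review
      2.1 Text Classification
        2.1.1 Term Frequency-Inverse Document Frequency
        2.1.2 Inverse Conformity Frequency (逆向一致頻率)
        2.1.3 Uniformity (一致性)
        2.1.4 Text Classification Techniques
        2.1.5 Comparison of Text Classification Techniques
      2.2 Fuzzy Logic
        2.2.1 Standard Fuzzy Operators
        2.2.2 α-Cut
        2.2.3 Fuzzy Composition
      2.3 Formal Concept Analysis
        2.3.1 Formal Context
        2.3.2 Formal Concept
        2.3.3 Concept Lattice
        2.3.4 Fuzzy Formal Concept Analysis
    Chapter 3 Research Method
      3.1 Concept Learning
        3.1.1 Data Preprocessing
        3.1.2 Feature Selection
        3.1.3 Fuzzy Formal Concept Analysis
        3.1.4 Concept Expansion
      3.2 Classification of New Documents
        3.2.1 Computing Similar Concepts
        3.2.2 Inferring the Most Suitable Category
    Chapter 4 Experiments and Analysis
      4.1 Experimental Datasets
      4.2 Experimental Results
        4.2.1 Experiment 1: Reuters-21578 (R8)
        4.2.2 Experiment 2: 20 Newsgroups
      4.3 Comparison of Experimental Results
        4.3.1 Reuters-21578 (R8)
        4.3.2 20 Newsgroups
      4.4 Sensitivity Analysis
      4.5 Statistical Testing
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References
    Appendix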

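    On campus: publicly available from 2023-12-18
    Off campus: not publicly available
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.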