National Cheng Kung University Theses and Dissertations System

Student:	Yang, Wen-Dian (楊文典)
Thesis title:	Text classification with concept expansion (基於概念延伸之文件分類法)
Advisor:	Li, Sheng-Tun (李昇暾)
Degree:	Master
Department:	College of Management, Institute of Information Management
Year of publication:	2011
Academic year of graduation:	99 (ROC calendar, 2010-2011)
Language:	Chinese
Pages:	62
Keywords:	Text Classification, Fuzzy Formal Concept Analysis, Query Expansion

Digital documents have become one of the most important forms in which people acquire, store, and disseminate knowledge, and as knowledge management receives growing attention, the number of digital documents has grown rapidly. Managing large document collections effectively has therefore become an important issue.
Text classification assigns an appropriate category to a document based on its content, making documents easier to organize, manage, and use, and many researchers have proposed methods for automating it. However, almost all existing techniques treat feature terms as mutually independent. By ignoring the latent relationships between terms, and between terms and multiple documents, they may classify documents poorly. In addition, incomplete information in an unknown document may prevent it from being classified correctly.
To address these problems, this study proposes a text classification method based on concept expansion. It uses Formal Concept Analysis to infer the category of an unknown document and incorporates query expansion to avoid classification errors caused by insufficient information. The experimental results confirm that the proposed method outperforms the previous classification methods it was compared against, and that applying query expansion to text classification effectively improves classification accuracy.
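The abstract describes the method only at a high level. The Python below is a minimal sketch of the general idea. It fills in details the abstract does not give with assumptions of its own. It builds a fuzzy document-term context, where the membership of a term is its count divided by the document's largest count. An α-cut then turns that into a crisp formal context. Formal concepts are enumerated as intersections of document term sets. A new document's terms are expanded with the FCA closure {t}'' of each term, and the document receives the majority label of the concept whose intent is most Jaccard-similar to the expanded query. The membership formula, expansion rule, similarity measure, and all function names (`fuzzy_context`, `alpha_cut`, `concepts`, `expand_query`, `classify`) are hypothetical illustrations, not the thesis's actual algorithm.

```python
from collections import Counter

# Hypothetical sketch of FCA-based classification with query expansion;
# not the thesis's implementation.


def fuzzy_context(docs):
    """Fuzzy formal context (assumed): membership of term t = count(t) / max count in the doc."""
    return [{t: c / max(counts.values()) for t, c in counts.items()} for counts in docs]


def alpha_cut(context, alpha):
    """Crisp context: keep (document, term) pairs whose membership is >= alpha."""
    return [frozenset(t for t, mu in row.items() if mu >= alpha) for row in context]


def concepts(crisp):
    """All formal concepts (extent, intent) of a crisp context.

    Concept intents are exactly the intersections of document term sets,
    plus the full attribute set M (the intent of the empty extent).
    """
    intents = {frozenset().union(*crisp)}
    for row in crisp:
        # The comprehension is built before |= mutates the set.
        intents |= {row & b for b in intents}
    result = []
    for intent in intents:
        extent = frozenset(i for i, row in enumerate(crisp) if intent <= row)
        result.append((extent, intent))
    # Sort so tie-breaking in classify() does not depend on set iteration order.
    return sorted(result, key=lambda c: (sorted(c[1]), sorted(c[0])))


def expand_query(query, crisp):
    """Hypothetical expansion rule: add {t}'' for each query term t,
    i.e. the terms shared by every training document that contains t."""
    expanded = set(query)
    for t in query:
        holders = [row for row in crisp if t in row]
        if holders:
            expanded |= frozenset.intersection(*holders)
    return frozenset(expanded)


def classify(query, crisp, labels, lattice):
    """Pick the concept whose intent is most Jaccard-similar to the expanded
    query (ties: larger extent) and return the majority label of its extent."""
    q = expand_query(query, crisp)
    best_extent, best_key = None, None
    for extent, intent in lattice:
        if not extent or not intent:
            continue  # skip concepts with an empty extent or intent
        key = (len(q & intent) / len(q | intent), len(extent))
        if best_key is None or key > best_key:
            best_extent, best_key = extent, key
    if best_extent is None or best_key[0] == 0:
        return None  # no concept shares any term with the query
    return Counter(labels[i] for i in best_extent).most_common(1)[0][0]


# Toy data (made up for illustration, not from the thesis's experiments).
train = [
    {"wheat": 3, "grain": 2, "export": 1},
    {"wheat": 2, "corn": 2, "grain": 1},
    {"profit": 3, "quarter": 2, "share": 1},
    {"profit": 2, "share": 2, "dividend": 1},
]
labels = ["grain", "grain", "earn", "earn"]

crisp = alpha_cut(fuzzy_context(train), alpha=0.5)
lattice = concepts(crisp)
print(classify({"wheat"}, crisp, labels, lattice))   # grain
print(classify({"profit"}, crisp, labels, lattice))  # earn
```

Brute-force enumeration keeps the sketch short, but the number of concepts can grow exponentially with the number of documents, so this form only suits toy contexts. A larger α keeps fewer low-weight terms per document and yields a smaller lattice.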

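Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Procedure
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Text Classification
2.1.1 Term Frequency-Inverse Document Frequency
2.1.2 Inverse Consistency Frequency
2.1.3 Consistency
2.1.4 Text Classification Techniques
2.1.5 Comparison of Text Classification Techniques
2.2 Fuzzy Logic
2.2.1 Standard Fuzzy Operators
2.2.2 α-Cuts
2.2.3 Fuzzy Composition
2.3 Formal Concept Analysis
2.3.1 Formal Context
2.3.2 Formal Concepts
2.3.3 Concept Lattice
2.3.4 Fuzzy Formal Concept Analysis
Chapter 3 Research Method
3.1 Concept Learning
3.1.1 Data Preprocessing
3.1.2 Feature Selection
3.1.3 Fuzzy Formal Concept Analysis
3.1.4 Concept Expansion
3.2 Classification of New Documents
3.2.1 Computing Similar Concepts
3.2.2 Inferring the Most Suitable Category
Chapter 4 Experiments and Analysis
4.1 Experimental Datasets
4.2 Experimental Results
4.2.1 Experiment 1: Reuters-21578 (R8)
4.2.2 Experiment 2: 20 Newsgroups
4.3 Comparison of Experimental Results
4.3.1 Reuters-21578 (R8)
4.3.2 20 Newsgroups
4.4 Sensitivity Analysis
4.5 Statistical Tests
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix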

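Off-campus access: not public. Neither the electronic nor the print version of the thesis has been authorized for public release.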
