| Graduate student: | 李婷蓉 Lee, Ting-Jung |
|---|---|
| Thesis title: | 以領域本體論為基礎之語意概念式文件自動分類系統 Automatic Concept-based Document Categorization System with Domain Ontology Framework |
| Advisor: | 耿伯文 Kreng, Victor B. |
| Degree: | Master |
| Department: | College of Management, Institute of Information Management |
| Year of publication: | 2005 |
| Academic year of graduation: | 93 (ROC calendar; 2004-2005) |
| Language: | Chinese |
| Number of pages: | 58 |
| Chinese keywords: | 文件自動分類 (automatic document categorization), 潛在語意分析 (latent semantic analysis), 領域本體論 (domain ontology), 基因演算法 (genetic algorithms) |
| English keywords: | Automatic Document Categorization, Genetic Algorithms, Domain Ontology, Latent Semantic Analysis |
| 相關次數: | 點閱:157 下載:1 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
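In this thesis, we propose an automatic document categorization system that uses a domain ontology as its framework and combines it with Latent Semantic Analysis (LSA). The term sets of a pre-constructed domain ontology serve as the basic elements for categorization, and LSA, which extracts the semantics of documents, is used to build a concept space for the domain ontology. This concept space is clustered with Genetic Algorithms (GA), and the clustering is used to learn the range that each category's documents occupy. Based on what is learned, a newly added test document can be assigned to a suitable category according to these ranges. Because the system extracts the concepts of a document, it can map a test document onto the domain concept space and assign it to the correct category even when the document does not contain the keyword. In the experiments, a domain expert first manually defined the keyword set of a light-metal domain ontology and categorized the collected documents. On Chinese documents, the ontology-based, concept-based automatic document categorization system reached a precision of 69% and a recall of 50%; on English documents, it reached a precision of 73% and a recall of 55%. Compared with the traditional approach of categorizing by keyword frequency, precision improved by 19% and recall by 1%.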
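The thesis describes building the concept space with LSA and mapping a new document into it, but the term weighting and the number of retained dimensions are not given here. As a minimal sketch under stated assumptions, the Python below builds a rank-k LSA space from a raw term-count matrix by power iteration on A Aᵀ (used only to stay within the standard library; a real system would call an SVD routine) and folds a new document in with the standard projection q^T U_k Σ_k^{-1}. The six terms, the toy matrix, and every function name are hypothetical.

```python
import math
import random


def cosine(x, y):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def top_k_eigen(S, k, iters=300, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(S)
    S = [row[:] for row in S]
    rng = random.Random(seed)
    values, vectors = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]  # strictly positive start vector
        for _ in range(iters):
            w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * S[i][j] * v[j] for i in range(n) for j in range(n))
        values.append(lam)
        vectors.append(v)
        # Deflation: subtract the found component so the next pass finds the next one.
        S = [[S[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return values, vectors


def lsa_concept_space(A, k):
    """Rank-k LSA of a term-by-document matrix A (one row per term).

    The left singular vectors U_k are the top eigenvectors of A A^T, and the
    singular values are the square roots of the matching eigenvalues.
    """
    m, n = len(A), len(A[0])
    AAt = [[sum(A[i][d] * A[j][d] for d in range(n)) for j in range(m)] for i in range(m)]
    values, vectors = top_k_eigen(AAt, k)
    sigma = [math.sqrt(max(lam, 0.0)) for lam in values]
    U = [[vectors[c][t] for c in range(k)] for t in range(m)]  # U[t][c]
    return U, sigma


def fold_in(q, U, sigma):
    """Project a term-count vector into concept space: q^T U_k Sigma_k^{-1}."""
    return [sum(q[t] * U[t][c] for t in range(len(q))) / sigma[c] for c in range(len(sigma))]


# Hypothetical toy data: the terms stand in for ontology keywords.
terms = ["magnesium", "die_casting", "alloy", "anodizing", "coating", "corrosion"]
A = [  # columns d0-d2: casting documents; d3-d5: surface-treatment documents
    [1, 1, 0, 0, 0, 0],  # magnesium
    [1, 0, 1, 0, 0, 0],  # die_casting
    [1, 1, 1, 0, 0, 0],  # alloy
    [0, 0, 0, 1, 0, 1],  # anodizing
    [0, 0, 0, 1, 1, 0],  # coating
    [0, 0, 0, 0, 1, 1],  # corrosion
]
docs = [list(col) for col in zip(*A)]
U, sigma = lsa_concept_space(A, k=2)

q = [0, 1, 0, 0, 0, 0]  # new document mentioning only "die_casting"
q_hat = fold_in(q, U, sigma)
print("raw cosine with d1:", cosine(q, docs[1]))
print("LSA cosine with d1:", round(cosine(q_hat, fold_in(docs[1], U, sigma)), 3))
```

Here the new document shares no term with d1 (magnesium, alloy), so their raw cosine is 0, yet in the two-dimensional concept space the two vectors point in nearly the same direction because "die_casting" co-occurs with "magnesium" and "alloy" in the training documents. This is the effect the abstract relies on when it says a document lacking a keyword can still be mapped to the right concept region.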
In this thesis, we propose an automatic concept-based document categorization system that uses a domain ontology as its framework and Latent Semantic Analysis (LSA) as its core technique. The system uses predefined keyword sets taken from the domain ontology, and LSA builds a concept space from them. The vectors in the concept space are clustered with Genetic Algorithms (GA), and the cluster boundaries are established in the learning step; test documents are then categorized according to these boundaries. In the testing step, a test document can still be categorized correctly even if it contains none of the keywords, because LSA extracts the document's underlying meaning and maps the document into the concept space. In the experiment, we invited an expert in the light-metal domain, who defined the light-metal domain ontology and categorized all documents. When the Automatic Concept-based Document Categorization System with Domain Ontology Framework is applied to Chinese documents, the average precision is 69% and the average recall is 50%; when it is applied to English documents, the average precision is 73% and the average recall is 55%.
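The learning and testing steps can be illustrated with a GA-based clustering sketch. The thesis abstract does not specify the chromosome encoding, the fitness function, or how boundaries are drawn, so the code below assumes a common design: each chromosome encodes k cluster centers, fitness is the within-cluster sum of squared Euclidean distances (lower is better), centers are reset to their cluster means each generation, and each cluster's boundary is a hypothetical radius equal to `margin` times the distance of its farthest member. The 2-D concept vectors, category names, and all function names are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two concept vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]


def recenter(points, centers):
    """Move each center to the mean of its assigned points; empty clusters stay put."""
    labels = assign(points, centers)
    new = []
    for c, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else old)
    return new


def sse(points, centers):
    """Within-cluster sum of squared distances (the GA minimizes this)."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, assign(points, centers)))


def ga_cluster(points, k, pop_size=20, generations=40, p_mut=0.2, seed=1):
    """GA clustering in which a chromosome is a list of k centers."""
    rng = random.Random(seed)
    pop = [rng.sample(points, k) for _ in range(pop_size)]  # seed centers from the data
    for _ in range(generations):
        pop = sorted((recenter(points, ch) for ch in pop), key=lambda ch: sse(points, ch))
        next_pop = [pop[0]]  # elitism: carry the best chromosome over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection (size 3) for each parent.
            a = min(rng.sample(pop, 3), key=lambda ch: sse(points, ch))
            b = min(rng.sample(pop, 3), key=lambda ch: sse(points, ch))
            cut = rng.randint(1, k - 1) if k > 1 else k
            child = a[:cut] + b[cut:]  # one-point crossover on whole centers
            # Gaussian mutation of individual centers.
            child = [tuple(x + rng.gauss(0, 0.05) for x in c) if rng.random() < p_mut else c
                     for c in child]
            next_pop.append(child)
        pop = next_pop
    return min((recenter(points, ch) for ch in pop), key=lambda ch: sse(points, ch))


def learn_boundaries(points, labels, centers, margin=1.2):
    """Per cluster: (center, radius, majority category); radius = margin * farthest member."""
    groups = assign(points, centers)
    bounds = []
    for c, center in enumerate(centers):
        members = [(p, y) for p, y, g in zip(points, labels, groups) if g == c]
        if not members:
            continue
        cats = [y for _, y in members]
        category = max(set(cats), key=cats.count)
        radius = margin * max(dist(p, center) for p, _ in members)
        bounds.append((center, radius, category))
    return bounds


def categorize(x, bounds):
    """Category of the nearest cluster whose boundary contains x, else None."""
    inside = [(dist(x, c), cat) for c, r, cat in bounds if dist(x, c) <= r]
    return min(inside)[1] if inside else None


# Hypothetical 2-D concept vectors of training documents and their expert labels.
train = [(0.50, 0.02), (0.48, -0.01), (0.52, 0.03), (0.01, 0.58), (-0.02, 0.55), (0.03, 0.60)]
labels = ["casting"] * 3 + ["surface"] * 3
centers = ga_cluster(train, k=2)
bounds = learn_boundaries(train, labels, centers)
print(categorize((0.49, 0.01), bounds))
print(categorize((2.0, 2.0), bounds))
```

A test vector is assigned to the category of the nearest cluster whose boundary contains it, so the first call returns "casting"; a vector outside every boundary is left uncategorized (`None`). Treating "outside all boundaries" as a rejection is one possible reading of boundary-based categorization, not a rule stated in the thesis.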
Full text available on campus from 2014-06-27.