| Graduate student: | 王常威 Wang, Chang-Wei |
|---|---|
| Thesis title: | 以內容為基礎之XML文件分類方法之研究 A content-based XML document classification method |
| Advisor: | 王泰裕 Wang, Tai-Yu |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2004 |
| Academic year of graduation: | 92 |
| Language: | Chinese |
| Number of pages: | 59 |
| Chinese keywords: | 以內容為基礎, 延伸標記語言, 領域知識, 文件分類 |
| English keywords: | document classification, domain knowledge, XML, content-based |
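In today's era of information explosion, the amount of information people must process far exceeds what we can handle. The need to automate many tasks is therefore increasingly important, especially for documents that contain large amounts of information. With the advent of XML (eXtensible Markup Language), its common standard and ease of use have led to remarkably rapid growth in the number of XML documents. Combining these two observations, this study proposes a document classification method that differs from keyword-based approaches: a content-based method for automatically classifying XML documents, applied to large numbers of unknown XML documents. The method first preprocesses the XML documents used for training to find the set of feature terms that represents each category, and attempts to add the vocabulary specific to each domain; classification is then performed on this basis. In this way, the method uses the particular structure of XML to find a classification approach suited to it, and combines that approach with domain-specific knowledge to develop a classification procedure with sufficient accuracy. After validation on real examples, the proposed method was compared with an improved VSM classification method. The results show that, for classifying XML documents, the proposed method achieves higher classification accuracy than the VSM classification method.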
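The pipeline outlined in the abstract (derive a feature-term set per category from training XML documents, extend it with domain vocabulary, then classify unknown documents) can be illustrated with a minimal Python sketch. This is not the thesis's actual algorithm, which the abstract does not specify. Selecting feature terms by raw frequency, scoring by set overlap, and every name and sample document below (`extract_terms`, `build_feature_sets`, `classify`, `domain_vocab`) are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_terms(xml_text):
    """Collect lowercase whitespace-separated tokens from all element text.

    Punctuation is not stripped; this is a simplification for the sketch.
    """
    root = ET.fromstring(xml_text)
    terms = []
    for text in root.itertext():
        terms.extend(word.lower() for word in text.split())
    return terms


def build_feature_sets(training_docs, top_k=5):
    """Map each category to its top_k most frequent training terms (assumed heuristic)."""
    counts = {}
    for category, xml_text in training_docs:
        counts.setdefault(category, Counter()).update(extract_terms(xml_text))
    return {cat: {term for term, _ in c.most_common(top_k)}
            for cat, c in counts.items()}


def classify(xml_text, feature_sets, domain_vocab):
    """Return the category whose feature terms plus domain vocabulary overlap most."""
    doc_terms = set(extract_terms(xml_text))

    def score(category):
        terms = feature_sets[category] | domain_vocab.get(category, set())
        return len(doc_terms & terms)

    return max(feature_sets, key=score)


# Hypothetical training data and domain vocabulary.
training_docs = [
    ("sports", "<article><title>Football match</title>"
               "<body>The team won the match</body></article>"),
    ("finance", "<article><title>Stock market</title>"
                "<body>The stock price rose on the market</body></article>"),
]
domain_vocab = {
    "sports": {"goal", "league"},
    "finance": {"dividend", "bond"},
}

feature_sets = build_feature_sets(training_docs)
unknown = ("<article><title>League update</title>"
           "<body>A late goal decided it</body></article>")
label = classify(unknown, feature_sets, domain_vocab)
print(label)
```

In this toy example the unknown document shares no terms with either training document, so it is the hypothetical domain vocabulary alone that decides the category. This mirrors the abstract's motivation for adding domain knowledge on top of the learned feature terms.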