Graduate student: | Liu, Chia-Tzung (劉佳宗) |
---|---|
Thesis title: | Automatic Summarization with Concepts Learned by Machine Learning Technique (利用機器學習摘要概念為基礎之文件摘要自動建立方法) |
Advisor: | Wang, Hei-Chia (王惠嘉) |
Degree: | Master |
Department: | College of Management - Institute of Information Management |
Year of publication: | 2007 |
Academic year of graduation: | 95 (ROC calendar) |
Language: | Chinese |
Number of pages: | 75 |
Chinese keywords: | 本體論 (ontology), 文件摘要自動化 (automatic text summarization), 自然語言處理 (natural language processing) |
English keywords: | Automatic Text Summarization, Ontology, Natural Language Processing |
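With the development of the Internet, the range of information available online has grown ever wider, and its volume is richer than before. This abundance, however, causes information overload: people cannot process so much information. The excess of information on the Web forces people to spend a great deal of time judging its quality and its relevance to their needs, and filtering it item by item. Search engines alleviate these problems to a considerable degree, but users still have to inspect each result one by one. To address information overload effectively, various information retrieval techniques have been proposed, and the concept of automatic summarization emerged in response. Automatic summarization extracts the important sentences of a document so that people can quickly grasp its content, which eases information overload.
Many methods for automatic summarization exist today, and the ontology-based approach is among the better ones. However, building and maintaining an ontology usually requires substantial human effort, which makes ontology-based summarization difficult to apply in practice. Automatic ontology construction is therefore the first priority for improving practicality. Even once the ontology has been built, traditional ontology-based summarizers select summary sentences solely by their similarity to ontology concepts and do not take into account two properties: “reading comprehension” and “topic relevance”. For these two reasons, automatic summarization is currently applied mostly to short documents.
In view of this, this study proposes a fully automatic ontology-based summarization model. It first builds an ontology automatically through concept extraction and uses that ontology as the basis for selecting summary sentences. It further applies Position and Topic Detection techniques to address the “reading comprehension” and “topic relevance” problems and thereby improve the quality of the resulting summaries.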
As the Internet has become popular, the information on it has grown quickly and its content has become very diverse. This leads to the problem of information overload, in which people cannot easily find the information they need and must spend much time filtering information to find what is useful. Search engines lighten this problem, but users still have to check each search result one by one. To solve this problem efficiently, many approaches have been proposed in information retrieval research. Automatic text summarization is one solution for reducing information overload. This technique extracts important sentences to represent the subject of a document and lets people understand the information as quickly as possible.
There are many approaches to automatic summarization, and one of them is the ontology-based approach. However, ontology construction and maintenance require a great deal of human effort, which makes the ontology-based approach hard to apply in the real world. To enhance the practicability of this approach, we should consider how to construct the ontology automatically. In addition, we also need to consider “Reading Comprehension” and “Topic Relevance”.
To address the problems mentioned above, this thesis proposes an ontology-based summarization approach in which ontology construction and maintenance are fully automatic. In addition, the summarizer not only uses the ontology to generate summaries but also considers the “Reading Comprehension” and “Topic Relevance” features.
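As a rough illustration of how an extractive summarizer might combine ontology concepts with position and topic features, the following Python sketch scores each sentence by a weighted sum of three signals. It is not the thesis's actual method: the concept set is supplied by hand rather than learned, the topic terms stand in for the output of a topic detection step, and the function names, the naive sentence splitter, and the weights are all assumptions made for this example.

```python
import re


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption for illustration)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased set of alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def score_sentences(text, concepts, topic_terms, weights=(0.5, 0.2, 0.3)):
    """Score sentences by concept overlap, position, and topic overlap.

    `concepts` and `topic_terms` are hypothetical hand-supplied word sets;
    `weights` are arbitrary illustrative values, not tuned parameters.
    """
    sentences = split_sentences(text)
    w_concept, w_position, w_topic = weights
    scored = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        concept_score = len(words & concepts) / len(concepts) if concepts else 0.0
        # Earlier sentences receive a higher position score (1.0 for the first).
        position_score = 1.0 - i / len(sentences)
        topic_score = len(words & topic_terms) / len(topic_terms) if topic_terms else 0.0
        total = (w_concept * concept_score
                 + w_position * position_score
                 + w_topic * topic_score)
        scored.append((total, i, sentence))
    return scored


def summarize(text, concepts, topic_terms, k=2):
    """Return the k best-scoring sentences in their original order."""
    top = sorted(score_sentences(text, concepts, topic_terms), reverse=True)[:k]
    return [sentence for _, _, sentence in sorted(top, key=lambda t: t[1])]
```

Calling `summarize(text, {"ontology", "concepts"}, {"summaries", "sentences"})` on a short passage returns the two sentences that best match the concept and topic sets, with earlier sentences favored when scores are otherwise close. A real system would replace the hand-supplied sets with concepts learned from a corpus.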