Author: Chen, Ming-Yen (陳明彥)
Title: Research and Development of a Semantic-Aware Mechanism for Multipurpose Information Retrieval (語意感知之多意圖資訊檢索機制研發)
Advisor: Chen, Yuh-Min (陳裕民)
Degree: Doctoral
Department: Institute of Manufacturing Engineering, College of Electrical Engineering and Computer Science
Publication year: 2009
Graduation academic year: 97 (ROC calendar; academic year 2008–2009)
Language: English
Pages: 180
Keywords: semantic aware, multipurpose information retrieval, information retrieval

    With the arrival of the knowledge economy, knowledge has become the most important asset of individuals and organizations alike, and a key factor in the competitiveness of an enterprise. Information content is a container of knowledge: it carries the knowledge that people encode when they want to communicate with others. Effective information content retrieval therefore supports the goals of knowledge sharing and reuse. Most existing information retrieval systems are keyword-based and retrieve relevant content by matching keywords. Keyword search is expedient and easy to use, but it cannot fully represent the semantics of the query and the content, and this often leads to retrieval failures.
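The keyword-matching failure described above can be shown with a minimal sketch. It assumes a Boolean AND model (the first retrieval model reviewed in Chapter 2) with plain whitespace tokenization. The three-document collection and the function names are hypothetical and are not taken from the thesis.

```python
# Hypothetical toy collection: document IDs mapped to their text.
DOCS = {
    "d1": "the automobile engine overheats on long trips",
    "d2": "car insurance prices rose this year",
    "d3": "how to repair a vehicle engine that overheats",
}


def tokenize(text):
    """Lower-case whitespace tokenization; no stemming or synonym handling."""
    return set(text.lower().split())


def boolean_and_search(query, docs):
    """Return the sorted IDs of documents that contain every query keyword."""
    terms = tokenize(query)
    return sorted(doc_id for doc_id, text in docs.items() if terms <= tokenize(text))


print(boolean_and_search("car engine overheats", DOCS))
print(boolean_and_search("engine overheats", DOCS))
```

Documents d1 and d3 both describe an overheating engine. Neither is returned for "car engine overheats", because neither contains the word "car". Dropping "car" from the query retrieves both. This gap between wording and meaning is what a semantic-aware mechanism aims to close.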
    In textual content, the author's intentions and concepts are expressed through combinations of words and logical relations that human readers can understand. Retrieving information content through a semantic approach can therefore improve the transparency and visibility of the content. It can also help content creators and content users communicate and interact seamlessly on a semantic basis, so that knowledge content reaches users accurately and quickly.
    This study developed a semantic-aware mechanism for multipurpose information retrieval. The mechanism processes, recognizes, extracts, extends, and matches content semantics to achieve the following objectives: (1) to analyze and identify the semantic features of information content; (2) to develop a semantic pattern that represents these features in a structured, concrete form; and (3) to design a multipurpose information retrieval model that provides the most appropriate retrieval mode for different types of users according to their needs. The mechanism addresses the limitations of traditional keyword search. Users can find the information they need through semantic-aware queries and search, which improves the sharing and reuse of information content.
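The table of contents shows that the query-based mode performs semantic extension (Section 5.4) in three steps: semantic matrix construction, singular value decomposition, and dimensionality reduction. This record does not give the thesis's exact weighting or selection rules. The following Python is therefore only a generic latent semantic analysis (LSA) sketch. It assumes raw term counts, a rank of k = 2, the standard query fold-in, and cosine similarity in the latent space. The corpus and function names are hypothetical, and the code requires NumPy.

```python
import numpy as np

# Hypothetical toy corpus (not data from the thesis).
DOCS = {
    "d1": "car engine repair",
    "d2": "automobile engine repair",
    "d3": "car automobile dealer",
    "d4": "stock market prices",
    "d5": "market prices fall",
}


def term_document_matrix(docs):
    """Build a raw term-frequency matrix A (terms x documents) and a term index."""
    vocab = sorted({t for text in docs.values() for t in text.split()})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, text in enumerate(docs.values()):
        for t in text.split():
            A[index[t], j] += 1
    return A, index


def lsa_similarities(query, docs, k=2):
    """Cosine similarity between a query and each document in a rank-k LSA space."""
    A, index = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values come sorted descending
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T  # keep the k largest dimensions
    q = np.zeros(A.shape[0])
    for t in query.split():
        if t in index:  # out-of-vocabulary terms are ignored
            q[index[t]] += 1
    q_k = (Uk.T @ q) / sk  # fold the query in: Sigma_k^-1 U_k^T q
    sims = {}
    for doc_id, d_k in zip(docs, Vk):  # rows of V_k are document coordinates
        denom = np.linalg.norm(q_k) * np.linalg.norm(d_k)
        sims[doc_id] = float(q_k @ d_k / denom) if denom else 0.0
    return sims


for doc_id, score in sorted(lsa_similarities("car", DOCS).items(), key=lambda kv: -kv[1]):
    print(doc_id, round(score, 3))
```

With k = 2, the latent space separates the vehicle documents from the finance documents. For the query "car", d2 ("automobile engine repair") therefore scores as high as d1 and d3 (close to 1), even though it never contains "car". Documents d4 and d5 score close to 0. A keyword match would return only d1 and d3.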

    Table of contents:
    Chinese Abstract
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    CHAPTER 1. INTRODUCTION
      1.1 Background
      1.2 Motivation
      1.3 Objective
      1.4 Research Framework
    CHAPTER 2. LITERATURE REVIEW
      2.1 Domain Investigation
        2.1.1 Boolean Model
        2.1.2 Vector Space Model
        2.1.3 Probability Model
      2.2 Technologies Investigation
        2.2.1 Data Mining
        2.2.2 Latent Semantic Analysis
        2.2.3 Support Vector Machine
        2.2.4 Concept Map
        2.2.5 Constrained Spreading Activation Model
      2.3 Summary
    CHAPTER 3. DESIGN OF SEMANTIC-AWARE MECHANISM FUNCTIONAL FRAMEWORK FOR MULTIPURPOSE INFORMATION RETRIEVAL
      3.1 Information Retrieval Process and Information Retrieval System
      3.2 Semantic Issues of Information Content Retrieval
      3.3 The Conceptual Model of Semantic Aware Mechanism for Multipurpose Information Retrieval
      3.4 The Functional Framework of Semantic Aware Mechanism for Multipurpose Information Retrieval
      3.5 Summary
    CHAPTER 4. INFORMATION CONTENT SEMANTIC RECOGNITION AND REPRESENTATION
      4.1 Content Semantic Recognition and Representation Process
      4.2 Development of the Content Semantic Recognition and Representation Functional Model
      4.3 Content Preprocess Module
        4.3.1 Content Parsing
        4.3.2 Content Filtering
        4.3.3 Part-of-Speech Analysis
      4.4 Content Abstraction Module
        4.4.1 Concept Determination
        4.4.2 Concept and Sentence Weighting
        4.4.3 Content Extraction and Redundancy Eliminating
      4.5 Content Semantic Recognition and Annotation Module
        4.5.1 Semantic Mining
        4.5.2 Semantic Identification and Semantic Space Construction
        4.5.3 Semantic Pattern Generation and Transcoding
      4.6 Summary
    CHAPTER 5. QUERY-BASED INFORMATION RETRIEVAL
      5.1 Query-based Information Retrieval Process
      5.2 Development of the Query-based Information Retrieval Functional Model
      5.3 Semantic Determination and Extraction
      5.4 Query Content Semantic Extension
        5.4.1 Semantic Matrix Construction
        5.4.2 Singular Value Decomposition
        5.4.3 Semantic Matrix Dimensionality Reduction
        5.4.4 Latent Semantic Selection
      5.5 Semantic Pattern Clustering and Matching
        5.5.1 Content Semantic Pattern Pre-clustering
        5.5.2 Optimal Hyper-plane Separate
        5.5.3 Support Vector Generation
        5.5.4 Semantic Pattern Matching and Ranking
      5.6 Summary
    CHAPTER 6. CONTENT-BASED INFORMATION RETRIEVAL
      6.1 Content-based Information Retrieval Process
      6.2 Development of the Content-based Information Retrieval Functional Model
      6.3 Content Semantic Determination and Extraction
      6.4 Content Map Construction
        6.4.1 Content Map Construction Procedure
        6.4.2 Content Map Construction Module
      6.5 Content Mapping
        6.5.1 Content Mapping Procedure
        6.5.2 Content Mapping Module
      6.6 Summary
    CHAPTER 7. CONCEPT-BASED INFORMATION RETRIEVAL
      7.1 Concept-based Information Retrieval Process
      7.2 Development of the Concept-based Information Retrieval Functional Model
      7.3 Hybrid Concept Map Construction Module
        7.3.1 Document Preprocess
        7.3.2 Concept Map Construction
      7.4 Concept Map Navigation Module
        7.4.1 Concept Spreading
        7.4.2 Constrained Spreading Judgment
      7.5 Question-based Concept Exploration Module
        7.5.1 Concept Matching
        7.5.2 Sentence Generation
        7.5.3 Document Retrieval
      7.6 Summary
    CHAPTER 8. EXPERIMENT AND RESULT ANALYSIS OF SEMANTIC-AWARENESS MULTIPURPOSE INFORMATION RETRIEVAL
      8.1 Mechanism Implement Environment
      8.2 Experiment and Result Analysis
        8.2.1 Content Semantic Recognition and Representation
        8.2.2 Query-based Information Retrieval
        8.2.3 Content-based Information Retrieval
    CHAPTER 9. CONCLUSIONS
    REFERENCES
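Chapter 7 navigates a concept map by constrained spreading activation (Sections 2.2.5 and 7.4). This record does not give the thesis's activation function or constraint settings. The following Python is therefore a generic sketch of the technique, not the author's method. It assumes a weighted, directed concept map, a multiplicative decay factor, and three common constraints:

- an activation threshold;
- a maximum number of hops (distance);
- a fan-out limit.

The concept map, link weights, and parameter values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical concept map: concept -> list of (neighbour, link weight in [0, 1]).
CONCEPT_MAP = {
    "retrieval": [("query", 0.9), ("index", 0.7), ("semantics", 0.6)],
    "query": [("keyword", 0.8), ("expansion", 0.5)],
    "semantics": [("concept", 0.9), ("ontology", 0.4)],
    "index": [("inverted file", 0.9)],
    "concept": [("concept map", 0.8)],
}


def constrained_spread(seeds, graph, threshold=0.3, max_hops=2, max_fanout=3, decay=0.8):
    """Spread activation from seed concepts and return concepts at or above threshold.

    Constraints: a node spreads only if its activation reaches `threshold`
    and it has at most `max_fanout` outgoing links; spreading stops after
    `max_hops` rounds (distance constraint).
    """
    activation = dict(seeds)
    frontier = dict(seeds)  # nodes whose activation changed in the last round
    for _ in range(max_hops):
        incoming = defaultdict(float)
        for node, out in frontier.items():
            links = graph.get(node, [])
            if out < threshold or len(links) > max_fanout:
                continue  # threshold or fan-out constraint blocks spreading
            for neighbour, weight in links:
                incoming[neighbour] += decay * out * weight
        frontier = {}
        for node, value in incoming.items():
            if value > activation.get(node, 0.0):  # keep the strongest activation
                activation[node] = value
                frontier[node] = value
        if not frontier:
            break
    return {node: a for node, a in activation.items() if a >= threshold}


result = constrained_spread({"retrieval": 1.0}, CONCEPT_MAP)
for concept, level in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {level:.3f}")
```

Starting from "retrieval", activation reaches concepts two links away, such as "keyword" and "inverted file". The distance constraint cuts off "concept map", which is three links away. Weakly linked concepts such as "ontology" fall below the threshold and are not returned.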

    Full text: publicly available on campus and off campus from 2014-07-27.