
Graduate student: Hsu, An-Shun (許安順)
Thesis title: Development of a Semantic Awareness-based Information Retrieval Mechanism (語意感知為基之資訊檢索機制研發)
Advisor: Chen, Yuh-Min (陳裕民)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Manufacturing Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Number of pages: 89
Keywords: Information retrieval, Semantic extraction, Latent semantic analysis, Support vector machines
    Advances in information technology and the rapid growth of the Internet have made information sharing convenient and transparent. As digital information accumulates quickly, however, searching for content over the Internet often runs into three problems: (1) Conventional keyword-based search methods match only part of the concepts in the information, so users must revise a query several times before obtaining the content they need. (2) A query usually contains much less text than an ordinary document, so there is too little information to match against, which makes the search topic hard to determine and suitable content hard to find. (3) Human language is ambiguous; the resulting semantic gaps can lead to incorrect search results.

    To address these problems, this study develops a semantic awareness-based information retrieval mechanism. Through three components, "content semantic extraction and identification", "semantic expansion of the query content semantic pattern", and "content semantic pattern search", the mechanism provides more accurate search results than traditional keyword-based methods. Through semantic analysis, latent semantics mining, and semantic comparison, it addresses the semantic ambiguity that keyword-based retrieval techniques cannot overcome, improving the accuracy and efficiency of information retrieval.
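    The query-expansion idea in the abstract can be illustrated with a generic latent semantic analysis sketch. This record does not give the thesis's actual procedure, so everything below is an assumption made for illustration: the toy term-document matrix, the names `top_left_singular_vectors` and `expand_query`, the rank `k`, and the weight threshold are all hypothetical. The sketch projects a query onto the leading latent term directions and proposes terms that gain weight there.

```python
import math
import random

# Toy term-document matrix (rows = terms, columns = documents).
# Purely illustrative data, not taken from the thesis.
TERMS = ["car", "automobile", "engine", "flower", "petal"]
A = [
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 1, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def top_left_singular_vectors(matrix, k, iters=200, seed=0):
    """Approximate the k leading left singular vectors of `matrix`.

    Uses power iteration with Gram-Schmidt deflation on G = A A^T,
    whose eigenvectors are the left singular vectors of A.
    """
    n = len(matrix)
    gram = [[dot(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            v = [dot(row, v) for row in gram]
            for u in basis:  # remove components along vectors already found
                proj = dot(u, v)
                v = [x - proj * y for x, y in zip(v, u)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        basis.append(v)
    return basis


def expand_query(query_terms, k=2, threshold=0.1):
    """Return terms absent from the query whose weight in the rank-k
    latent projection of the query exceeds `threshold`, strongest first."""
    q = [1.0 if t in query_terms else 0.0 for t in TERMS]
    basis = top_left_singular_vectors(A, k)
    projected = [0.0] * len(TERMS)
    for u in basis:
        coeff = dot(u, q)
        projected = [p + coeff * x for p, x in zip(projected, u)]
    candidates = [
        (w, t) for w, t in zip(projected, TERMS)
        if t not in query_terms and w > threshold
    ]
    return [t for w, t in sorted(candidates, reverse=True)]


print(expand_query(["car"]))     # candidate expansion terms for a vehicle query
print(expand_query(["flower"]))  # candidate expansion terms for a plant query
```

    On this toy corpus, a query for "car" picks up "engine" and "automobile" because they co-occur with it in the same latent direction, while the unrelated plant terms stay near zero. A real system would likely use a library SVD routine and weighted term frequencies rather than raw counts and hand-written power iteration.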

    Chinese Abstract
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
        1.4 Problem Definition and Analysis
        1.5 Research Items
        1.6 Research Steps
        1.7 Thesis Organization
    Chapter 2 Literature Review
        2.1 Semantic Analysis
        2.2 Information Extraction Models
        2.3 Concept Relation Expansion
        2.4 Document Classification
    Chapter 3 Mechanism Architecture Design
        3.1 Design of the Semantic Awareness-based Information Retrieval Model
        3.2 Architecture Design of the Semantic Awareness-based Information Retrieval Mechanism
    Chapter 4 Design of the Core Components of the Semantic Awareness Mechanism
        4.1 Content Semantic Extraction and Identification
            4.1.1 Content Preprocessing and Summarization
            4.1.2 Content Semantic Recognition and Representation
            4.1.3 Construction of Content Semantic Patterns
        4.2 Semantic Expansion of Query Content Semantic Patterns
            4.2.1 Matrix Transformation of Content Semantic Patterns
            4.2.2 Singular Value Decomposition of the Semantic Matrix
            4.2.3 Dimensionality Reduction of the Semantic Matrix
            4.2.4 Latent Semantic Selection from the Semantic Matrix
        4.3 Search of Content Semantic Patterns
            4.3.1 Preprocessing for Content Semantic Pattern Search
            4.3.2 Hyperplane Partitioning of Classification Categories for Content Semantic Patterns
            4.3.3 Support Vector Generation for Content Semantic Patterns
            4.3.4 Clustering and Matching of Content Semantic Patterns
    Chapter 5 Experimental Design and Mechanism Validation
        5.1 Implementation Environment
        5.2 Data Overview
        5.3 Experimental Procedure
            5.3.1 Preprocessing of Experimental Data
            5.3.2 Experiment 1: Query Expansion
            5.3.3 Experiment 2: Classification Training
            5.3.4 Experiment 3: Content Search
    Chapter 6 Conclusions and Future Work
        6.1 Conclusions and Results
        6.2 Future Research Directions
    References


    Full text: available on campus from 2013-07-17; available off campus from 2013-07-17.