| Graduate student: | 許安順 Hsu, An-Shun |
|---|---|
| Thesis title: | 語意感知為基之資訊檢索機制研發 Development of a Semantic Awareness-based Information Retrieval Mechanism |
| Advisor: | 陳裕民 Chen, Yuh-Min |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Manufacturing Engineering |
| Year of publication: | 2008 |
| Graduation academic year: | 96 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 89 |
| Keywords: | Information retrieval, Semantic extraction, Latent semantic analysis, Support vector machines |
Advances in information technology and the rapid growth of the Internet have made information sharing convenient and transparent. As digital information accumulates quickly, however, searching for content over the Internet often runs into the following problems: (1) Conventional keyword-based search methods match only part of the concepts in the information, so users must revise their queries several times before obtaining the content they need. (2) Compared with ordinary documents, queries usually contain little content, so there is too little information to match against, which makes the search topic hard to determine and appropriate content hard to find. (3) Human language is ambiguous, which creates semantic gaps and can lead to incorrect search results.
To address these problems, this study develops a semantic awareness-based information retrieval mechanism. Through “content semantic extraction and identification”, “semantic expansion of the query's semantic pattern”, and “content semantic pattern search”, the mechanism provides more accurate search results than traditional keyword-based methods. By means of semantic analysis, latent semantic mining, and semantic comparison, it resolves the semantic ambiguity problem that keyword-based retrieval techniques cannot overcome, improving both the accuracy and the efficiency of information retrieval.
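The contrast between exact keyword matching and a semantically expanded query can be illustrated with a minimal sketch. It is not the mechanism developed in the thesis: the abstract does not give its algorithms, so simple term co-occurrence stands in here for latent semantic mining, and the toy corpus, the function names, and the `per_term` and `weight` parameters are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, not taken from the thesis.
DOCS = {
    "d1": "car engine repair manual",
    "d2": "automobile engine maintenance guide",
    "d3": "fruit salad recipe",
}


def tokenize(text):
    return text.lower().split()


def cooccurrence(docs):
    """Count how often two distinct terms appear in the same document."""
    co = defaultdict(Counter)
    for text in docs.values():
        terms = sorted(set(tokenize(text)))
        for t in terms:
            for u in terms:
                if t != u:
                    co[t][u] += 1
    return co


def expand_query(query, co, per_term=3, weight=0.5):
    """Add up to `per_term` co-occurring terms per query term at reduced weight."""
    vec = {t: 1.0 for t in tokenize(query)}
    for term in list(vec):
        # Sort by descending count, then alphabetically, so ties are deterministic.
        related = sorted(co[term].items(), key=lambda kv: (-kv[1], kv[0]))
        for other, _ in related[:per_term]:
            vec.setdefault(other, weight)
    return vec


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_vec, docs):
    """Return (doc_id, score) pairs, best match first."""
    scores = {d: cosine(query_vec, Counter(tokenize(t))) for d, t in docs.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    co = cooccurrence(DOCS)
    print("keyword only:", rank({"car": 1.0}, DOCS))
    print("expanded:    ", rank(expand_query("car", co), DOCS))
```

With the keyword-only query, the document about automobiles shares no term with "car" and scores zero. After expansion the query also carries "engine", so that document receives a non-zero score while the unrelated recipe still scores zero. This is the kind of gap between short queries and document vocabulary that problems (1) and (2) describe.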