| Graduate student: | 陳彥勳 Yen-Shun, Chen |
|---|---|
| Thesis title: | 使用WWW資源協助知識本體整合 Ontology Integration Based on World Wide Web Resources |
| Advisor: | 王惠嘉 Wang, Hei-Chia |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2006 |
| Academic year of graduation: | 94 |
| Language: | Chinese |
| Number of pages: | 56 |
| Keywords (Chinese): | Internet, semantic integration, web mining, information extraction, ontology matching |
| Keywords (English): | world wide web, ontology mapping, taxonomy, web mining |
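An ontology is a form of knowledge representation that models human concepts and the relationships among them. Its defining properties are that it is formal, shared, and reusable. These properties allow ontologies to communicate with one another automatically through computers, without human intervention, which is the ultimate goal of ontology applications.
However, organizations do not agree on construction standards before building their own ontologies, so it is fair to say that no two ontologies are alike. When the need for cross-organizational knowledge sharing arises, the result is the problem of integrating ontologies built to different specifications. Since 2000, many researchers have proposed methods for matching different ontologies (ontology mapping). Among the resources these methods rely on, the domain thesaurus is an important aid: it resolves the problem of different words carrying the same meaning and identifies semantically correct correspondences between concepts. Its drawback is that building a thesaurus takes a great deal of manual labor and time, and updates are slow, so new terms and the vocabulary of emerging fields cannot be included promptly. As a result, ontology mapping often draws wrong inferences when it encounters new terms.
This study addresses the shortage of auxiliary information (information sparseness) during ontology mapping. We draw on information from the Internet, which is not only plentiful but also continually refreshed, so there is little concern that a new term cannot be found. Noise, however, has always been a weakness of web page information, and many filtering mechanisms have been developed in response. One notably effective mechanism that can also be applied to inferring relations between terms is "semantic web mining". It determines the relations between terms by looking for sentences with particular structures. We adopt this idea and add analysis of web addresses and links, in the hope of finding richer information in web pages to assist ontology integration. Because the document collection behind such a system is the vast and rapidly changing body of Internet resources, the approach is free of the limitations of thesauri and makes ontology integration more timely.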
An ontology is a knowledge representation model. It represents human concepts and the relationships between them. Because ontologies are formal, explicit, and shared, computers can communicate with each other through ontologies automatically.
However, different organizations construct ontologies for their own use. As a result, many ontologies exist, but they follow different standards. When the need for knowledge sharing between organizations arises, so does the problem of integrating ontologies built to different standards. Since 2000, many studies have attempted to deal with the ontology integration problem. In their methods, thesauri are the main auxiliary information resources. Thesauri are used to detect synonyms, hypernyms, and hyponyms in the mapping process and thereby increase mapping precision. But constructing a thesaurus is time-consuming and labor-intensive, which causes two problems. First, a new domain has no thesaurus. Second, new terms are not added to a thesaurus in time. Owing to this bottleneck, the ontology mapping applications above are restricted to certain domains.
To solve the problem of insufficient auxiliary information, we use the web as our resource. To find precise relations between concepts on the web, we use site structure analysis and linguistic pattern mining to find cues to those relations. We then combine both kinds of cues to extract the hierarchical knowledge hidden in a site. This hierarchical knowledge can support the ontology mapping process and improve mapping precision.
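The two kinds of cues described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the single "such as" pattern, the function names `pattern_cues`, `url_cues`, and `combine_cues`, the use of URL path nesting as the site structure cue, and the simple vote count used to combine cues are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed lexico-syntactic pattern: "<parent> such as <child>, <child> and <child>".
# Only single-word terms are handled in this sketch.
SUCH_AS = re.compile(
    r"\b(\w+),? such as (\w+(?:(?:, (?:and |or )?| and | or )\w+)*)"
)
# Separators between the listed child terms.
LIST_SEP = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def pattern_cues(text):
    """Return (parent, child) pairs suggested by 'such as' sentences."""
    pairs = set()
    for match in SUCH_AS.finditer(text.lower()):
        parent = match.group(1)
        for child in LIST_SEP.split(match.group(2)):
            if child:
                pairs.add((parent, child))
    return pairs


def url_cues(urls):
    """Return (parent, child) pairs suggested by nested URL path segments."""
    pairs = set()
    for url in urls:
        path = urlparse(url).path.lower()
        segments = [s for s in path.strip("/").split("/") if s]
        # Each path segment is treated as the parent of the segment below it.
        pairs.update(zip(segments, segments[1:]))
    return pairs


def combine_cues(text, urls):
    """Count how many cue types (0 to 2) support each (parent, child) pair."""
    votes = {}
    for pair in pattern_cues(text) | url_cues(urls):
        votes[pair] = 0
    for pair in pattern_cues(text):
        votes[pair] += 1
    for pair in url_cues(urls):
        votes[pair] += 1
    return votes
```

For example, calling `combine_cues` on the sentence "Fruits such as apples, pears and plums are sold here." together with the hypothetical URL `http://example.org/fruits/apples/` gives the pair `("fruits", "apples")` two votes, because both the sentence and the URL path support it, while the pairs found only in the sentence get one vote. A real system would also need term normalization (for instance, singular and plural forms) and noise filtering, which this sketch omits.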