
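Graduate student: 陳彥勳
Yen-Shun, Chen
Thesis title: 使用WWW資源協助知識本體整合
Ontology Integration Based on World Wide Web Resources
Advisor: 王惠嘉
Wang, Hei-Chia
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2006
Academic year of graduation: 94
Language: Chinese
Number of pages: 56
Keywords (translated from Chinese): Internet, semantic integration, web mining, information extraction, ontology matching
Keywords (foreign language): world wide web, ontology mapping, taxonomy, web mining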
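  •   An ontology is a knowledge representation method that models human concepts and the relationships between them. Its key properties are that it is formal, shared, and reusable. These properties allow ontologies to communicate with one another automatically through computers, without human intervention, which is the ultimate goal of ontology applications.
      However, organizations do not agree on construction standards before building their own ontologies, so it is fair to say that no two ontologies are alike. When the need for cross-organizational knowledge sharing arises, so does the problem of integrating ontologies built to different specifications. Since 2000, many researchers have proposed methods for matching different ontologies (ontology mapping). Among the resources these methods use, domain thesauri are an important aid: they resolve the problem of different words that share a meaning and find semantically correct correspondences between concepts. Their drawback is that building a thesaurus takes a great deal of manual effort and time, and updates are slow, so new terms and the vocabulary of emerging fields cannot be included promptly. As a result, ontology mapping often draws incorrect inferences when it encounters new terms.
      This study addresses the shortage of auxiliary information (information sparseness) in ontology mapping. We use information from the Internet, which is not only large in volume but also constantly updated, so new terms are unlikely to be missing. Noise, however, has always been a weakness of web information. Many filtering mechanisms have been developed in response, and one notably effective method that can also be applied to inferring relations between terms is "semantic web mining", which judges the relations between terms by looking for sentences with particular structures. We adopt this idea and add analysis of URLs and links, hoping to find richer information in web pages to support ontology integration. Because the document collection behind such a system is the vast and constantly changing Internet, the approach is free of the limitations of thesauri and makes ontology integration more timely.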

      An ontology is a knowledge representation model. It can represent human concepts and the relationships between them. Because ontologies are formal, explicit, and shared, computers can communicate with each other automatically through them.
      However, different organizations construct ontologies for their own use. As a result, many ontologies exist, but they are built on different standards. When organizations need to share knowledge, the problem of integrating ontologies built on different standards arises. Since 2000, many studies have attempted to deal with the ontology integration problem. In their methods, thesauri are the main auxiliary information resources. Thesauri are used to detect synonyms, hypernyms, and hyponyms in the mapping process, and they increase mapping precision. But constructing a thesaurus is time-consuming and labor-intensive, which causes two problems. First, a new domain has no thesaurus. Second, new terms are not added to the thesaurus in time. Owing to this bottleneck, the ontology mapping applications above are restricted to certain domains.
      To solve the problem of insufficient auxiliary information, we use the web as our resource. To find precise concept relations on the web, we use site structure analysis and linguistic pattern mining to find cues about relations between concepts. We then combine both kinds of cue to extract the hierarchical knowledge hidden in a site. This hierarchical knowledge can be used to support the ontology mapping process and improve mapping precision.
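As a rough illustration of how the two kinds of cue described above might be combined, the following Python sketch extracts candidate (child, parent) concept pairs from a sentence and from a URL path, then keeps the pairs that both sources support. It is only a sketch under stated assumptions, not the thesis's actual implementation: the "such as" pattern, the rule that nested URL directories suggest a hierarchy, the intersection rule, and all function names (`pattern_cues`, `url_cues`, `combined_cues`) are hypothetical choices made for this example.

```python
import re
from urllib.parse import urlparse

# Hypothetical linguistic pattern: "<parent> such as <child>, <child> and <child>".
SUCH_AS = re.compile(r"(\w+),?\s+such\s+as\s+([^.;]+)", re.IGNORECASE)
# Separators between the listed child terms ("a, b and c" or "a, b, and c").
LIST_SEP = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def pattern_cues(sentence):
    """Return (child, parent) pairs suggested by the 'such as' pattern."""
    pairs = set()
    for match in SUCH_AS.finditer(sentence):
        parent = match.group(1).lower()
        for child in LIST_SEP.split(match.group(2)):
            child = child.strip().lower()
            if child:
                pairs.add((child, parent))
    return pairs


def url_cues(url):
    """Return (child, parent) pairs from consecutive directory names in a URL path."""
    # Skip empty segments and file names such as "index.html".
    dirs = [s.lower() for s in urlparse(url).path.split("/") if s and "." not in s]
    return set(zip(dirs[1:], dirs[:-1]))


def combined_cues(sentences, urls):
    """Keep only the pairs that both kinds of cue agree on (an assumed rule)."""
    from_text = set().union(*(pattern_cues(s) for s in sentences))
    from_urls = set().union(*(url_cues(u) for u in urls))
    return from_text & from_urls
```

For example, calling `combined_cues` with the sentence "Fruits such as apples, oranges and bananas." and the made-up URL `http://example.org/fruits/apples/index.html` keeps only the pair that both cues support. Intersection is the strictest way to combine the cues; a weighted vote over both sources would be an equally plausible alternative.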

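    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
    1.1 Research Background
    1.2 Research Motivation and Objectives
    1.3 Research Process
    1.4 Research Scope and Limitations
    1.5 Thesis Organization
    2. Literature Review
    2.1 Ontology
    2.2.2 Traditional Ontology Mapping Methods
    2.2.3 Current Ontology Mapping Methods
    3. Research Method
    3.1 Overview of the Method Architecture
    3.2 Concept Tree
    3.2.1 Site Selection
    3.2.2 Site tree construction
    3.2.3 Site link structure
    3.2.4 Page content structure
    3.2.5 Concept Attribute Collection
    3.2.6 Concept tree construction
    4. Implementation and Validation
    4.1 System Construction
    4.2 Experimental Method
    4.2.1 Data Sources
    4.2.2 Evaluation Metrics
    5. Conclusion
    References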


    Full-text availability: on campus, available immediately
    Off campus, available from 2006-09-13