
Author: 王思永 (Wang, Szu-Yaung)
Thesis title: Design and Implementation of a Data Model for Collecting and Integrating Information from Web Sources
Advisor: 焦惠津 (Jiau, Hewijin Christine)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 38
Keywords (translated from Chinese): data model, information model, information system, knowledge linking, knowledge management
Keywords (English): data modeling, information modeling, information system, knowledge management
  • The amount and variety of information accessible via the Internet is large and growing rapidly. Although this information is easy to access, extracting the useful parts from different sources and integrating them is not a simple task. The information relevant to a particular requirement may be scattered across multiple sources, so collecting it manually is time-consuming. Moreover, resources on the Internet come in different types, and users must spend additional effort organizing them in their own way before the information fits a particular situation.

    This thesis proposes iPick, an architecture for collecting and integrating information from heterogeneous sources. The core idea of iPick is the manipulation and use of the Knowledge Unit. A Knowledge Unit is a collection of information contents that provides a unified view of multiple information sources. Knowledge Units can be flexibly decomposed and reused to satisfy different requirements. Users can choose and combine Knowledge Units according to their own requirements, and the chosen units become personal knowledge that supports decision making.

    To demonstrate how iPick and the Knowledge Unit assist in collecting and integrating information, a mobile application for traveling, TourPick, was implemented. During development, the database system chosen for storing Knowledge Units affects the performance of Knowledge Unit manipulation. To assess how well different databases fit the Knowledge Unit, it was implemented on both MySQL and MongoDB, and the performance of the two was compared using queries at three levels of complexity. The results show that MongoDB is more appropriate for implementing the proposed Knowledge Unit.
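The abstract describes Knowledge Units as collections of content that can be decomposed and recombined. The following is only a minimal sketch of that idea, not the thesis's actual scheme: the class name, its fields, the `decompose` and `combine` helpers, and the sample travel data are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KnowledgeUnit:
    """Hypothetical unified view over content gathered from several sources."""
    title: str
    # Maps an (assumed) source name to the content fetched from that source.
    contents: Dict[str, str] = field(default_factory=dict)

    def decompose(self) -> List["KnowledgeUnit"]:
        """Split this unit into one single-source unit per content entry."""
        return [KnowledgeUnit(self.title, {src: text})
                for src, text in self.contents.items()]


def combine(title: str, units: List[KnowledgeUnit]) -> KnowledgeUnit:
    """Merge chosen units into one personal unit; later units win on key clashes."""
    merged: Dict[str, str] = {}
    for unit in units:
        merged.update(unit.contents)
    return KnowledgeUnit(title, merged)


# Made-up sample data: a hotel unit built from two sources.
hotel = KnowledgeUnit("Hotel A", {"reviews": "4.5 stars", "map": "near station"})
parts = hotel.decompose()
museum = KnowledgeUnit("Museum B", {"hours": "9-17"})
# Keep only the review part of the hotel and pair it with the museum hours.
trip = combine("My trip", [parts[0], museum])
```

Storing each unit as a single nested record, as in this sketch, resembles a document-style layout; the abstract reports that MongoDB suited the proposed Knowledge Unit better than MySQL, but the sketch makes no claim about how the thesis actually laid out its schemas.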

    1 Introduction
    2 Related Work
        2.1 Information Integration
        2.2 Knowledge Representation
        2.3 Big Data and NoSQL
    3 Design and Implementation of iPick
        3.1 Issue on Information Utilization
        3.2 iPick Concept and Overview
            3.2.1 Concept
            3.2.2 Overview
        3.3 Architecture
        3.4 Data Modeling
            3.4.1 Design principle
            3.4.2 Knowledge Unit Scheme in iPick
            3.4.3 Knowledge Unit and Fetcher
    4 Application: TourPick
        4.1 Issues on Current Traveling Information Utilization Tool
        4.2 Implementation
        4.3 Knowledge Unit in TourPick
    5 Fitness of RDBMS and NoSQL for Knowledge Unit
        5.1 Motivation
        5.2 Database Choice
        5.3 Design of Experiment
            5.3.1 Measures
            5.3.2 Measurement Metrics
            5.3.3 Schemas
            5.3.4 Queries
        5.4 Setup
        5.5 Result
        5.6 Summary
    6 Conclusion
    References
    Vita

    Full text: open to on-campus and off-campus access from 2019-09-10.