| Graduate student: | 李文淦 Li, Wen-Gan |
|---|---|
| Thesis title: | FulDex: 支援XML正規表示式查詢之記憶體表示模型 (FulDex: A Fully-Indexing-Enabled Memory Representation Model for Supporting XML Regular Expression Queries) |
| Advisor: | 蔣榮先 Chiang, Jung-Hsien |
| Co-advisor: | 李信杰 Lee, Shin-Jie |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2015 |
| Graduation academic year: | 103 (ROC calendar) |
| Language: | English |
| Number of pages: | 43 |
| Keywords (Chinese): | XML解析、XML之記憶體表示模型、正規表示式查詢 |
| Keywords (English): | XML parsing, XML memory representation, regular expression query |
XML is widely used in the field of service computing because of its simplicity, generality, and usability. Although considerable effort has gone into XML query evaluation, little emphasis has been placed on optimizing queries that locate elements or attributes by matching their names or values against regular expressions, as opposed to queries that match paths. This thesis presents a memory representation model, referred to as FulDex, for performing XML regular expression queries over XML documents. The proposed model has two key features: (1) an index of all characters of the names and values of the elements and attributes within an XML document; and (2) an algorithm for performing regular expression queries, together with a set of proposed rules for filtering out names and values that do not need to be matched against a query. Experimental results show that in 95% of the test cases, the average query efficiency of FulDex is superior to that of seven other state-of-the-art XML parsers based on memory representation models. In particular, for a large XML document (1.58 GB), the average execution time of FulDex for a regular expression query is 80.41% lower than that of RapidXml, which has the best query performance among the existing tools.
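The two features named in the abstract, a character-level index and a filtering step that runs before regular expression matching, can be illustrated with a small sketch. This is not FulDex's actual data structure or rule set, which the abstract does not specify. The class `CharIndex` and the `must_contain` parameter are hypothetical names introduced for illustration, and the caller is assumed to supply the characters that every match must contain.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


class CharIndex:
    """Toy character index over element/attribute names and values.

    Illustrative only: the layout and filtering rule are assumptions,
    not the model described in the thesis.
    """

    def __init__(self, xml_text):
        self.strings = []                # every indexed name or value
        self.by_char = defaultdict(set)  # character -> ids of strings containing it
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            self._add(elem.tag)
            if elem.text and elem.text.strip():
                self._add(elem.text.strip())
            for name, value in elem.attrib.items():
                self._add(name)
                self._add(value)

    def _add(self, s):
        sid = len(self.strings)
        self.strings.append(s)
        for ch in set(s):
            self.by_char[ch].add(sid)

    def query(self, pattern, must_contain=""):
        """Return the distinct names/values matching `pattern`.

        `must_contain` (hypothetical) lists characters that any match is
        known to contain; strings lacking one of them are skipped before
        the regular expression is ever evaluated.
        """
        candidates = set(range(len(self.strings)))
        for ch in set(must_contain):
            candidates &= self.by_char.get(ch, set())
        rx = re.compile(pattern)
        return sorted({self.strings[i] for i in candidates
                       if rx.search(self.strings[i])})


doc = ('<library><book id="b1"><author>Knuth</author></book>'
       '<book id="b2"><author>Kay</author></book></library>')
index = CharIndex(doc)
print(index.query(r"^K.*h$", must_contain="Kh"))
```

In this sketch the filter only narrows the candidate set. The regular expression still decides the final result, so a `must_contain` hint that is too short costs time but not correctness.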