簡易檢索 / 詳目顯示

研究生: 王怡聖
Wang, Yi-Sheng
論文名稱: 一個利用索引有效率的查詢分支處理之XML文件排名系統
Efficient Index-Based Twig Pattern Query Processing for XML Document Ranking
指導教授: 高宏宇
Kao, Hung-Yu
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2006
畢業學年度: 94
語文別: 英文
論文頁數: 46
中文關鍵詞: XPath索引樹的索引XML查詢XML排名
外文關鍵詞: XPath, Tree Indexing, XML Query, XML Ranking, Index
相關次數: 點閱:65下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 近年來,有愈來愈多使用類似樹枝的樣式在XML文件上查詢資料的研究。使用XML文件的好處之一是使用者不只可以查詢資料的內容,還可以同時查詢資料的結構。然而,當使用者下類似樹枝的樣式在XML文件上查詢資料時,這樣的查詢形式無法找出在XML文件中相似結構的結果。因此,當務之急是要找出一個方法來延伸原本使用者查詢資料的樹枝樣式,以便找出在XML文件中相似結構的答案。雖然已經有許多這方面議題的研究,但是這些方法都有一些效率上的問題。首先,在這篇論文中我們提出一些方法來索引XML文件和使用者的樹枝查詢結構,這樣我們就能得到許多候選的關係是一定會出現在XML文件中,被過濾掉的關係則是絕對不會出現在XML文件的情形。利用這些候選的關係,我們可以快速地產生許多新的樹枝查詢結構。因此,我們還提出一些方法來產生新的樹枝查詢樣式,例如:並聯連接、串聯連接與合併的方法,以便從原本使用者的樹枝查詢樣式,產生一些可以查詢到XML文件中相似結構的新的樹枝樣式。我們所提出的方法是以索引為基礎,這個方法比其他的方法都還要快速而正確。從實驗可以看出來,我們的方法可以比其他的Twig方法快上四到十倍,而且還比其他的Binary方法得到較為正確的結果。

    In recent years, using the twig patterns to query over the XML documents has received much research. One benefit of the XML documents is that user can query both on structure and content. However, the twig pattern of the user profile can’t find the approximate answers in the XML documents. It is necessary to extend the original twig pattern to retrieve the approximate answers. The methods in a previous work which are based on the relaxation processing to extend the original twig pattern are not efficient enough. In this paper, we propose a method indexing the XML documents and the twig patterns first. It gets the candidates for generating the new twig patterns. We then propose the three ways, i. e., the parallel connection, the series connection, and the combination, to generate the new twig patterns from the original twig pattern of the user profiles and reach the same rate of the query precision. Our proposed index-based method is faster and more correct than other methods. Experiments showed that our index-based method is faster 4 to 10 times than the twig method and attained the higher accuracy than the binary method.

    LIST 摘要........................................................................III ABSTRACT.....................................................................IV 誌謝..........................................................................V LIST.........................................................................VI LIST OF CONTENTS............................................................VII LIST OF TABLES...............................................................IX LIST OF FIGURES...............................................................X LIST OF CONTENTS 1. INTRODUCTION.........................................................1 2. RELATED WORK.........................................................8 3. MOTIVATION..........................................................11 3.1 QUERY RELAXATIONS WITH TWIG METHOD..................................11 3.2 QUERY RELAXATIONS WITH BINARY METHOD................................12 4. THE QUERY PROCESSING SYSTEM.........................................14 FIGURE 8: THE ARCHITECTURE OVERVIEW..........................................14 4.1 INDEXING XML DOCUMENT...............................................14 4.1.1 XML Document Information............................................14 4.1.2 Parent-Child Relation between Tags in the XML Document..............16 4.1.3 Ancestor-Descendant Relation between Tags in the XML Document.......19 4.2 CONVERT TWIG PATTERNS INTO SEQUENCES................................22 4.2.1 Twig Pattern Information............................................22 4.2.2 Relation between Labels in the Twig Pattern.........................23 4.3 TWIG PATTERNS GENERATING ALGORITHM..................................26 4.3.1 Candidates for Twig Pattern Generating..............................27 4.3.2 Parallel Connection.................................................31 4.3.3 Series Connection...................................................32 4.3.4 Combination.........................................................33 4.4 ANSWERS SCORING.....................................................35 4.4.1 IDF.................................................................35 4.4.2 Document Edge Scoring...............................................35 5. EXPERIMENTS.........................................................36 5.1 EXPERIMENTAL SETUP..................................................36 5.2 DATASETS AND TWIG PATTERNS..........................................36 FIGURE 18: XML DOCUMENT INDEXING TIME........................................37 5.3 EVALUATION MEASUREMENT..............................................37 5.4 EVALUATION RESULTS..................................................39 5.4.1 Scoring.............................................................39 5.4.2 Time................................................................40 5.4.3 Precision ...........................................................42 6. CONCLUSION..........................................................44 7. REFERENCE ...........................................................45

    [1] S. Amer-Yahia, C. Botev, Jayavel. TeXQuery: A Full-Text Search Extension to XQuery. Proceedings of the 13th international conference on World Wide Web. WWW 2004.
    [2] S. Amer-Yahia, S. Cho, D. Srivastava. Tree pattern relaxation. 8th International Conference on Extending Database Technology. EDBT 2002.
    [3] S. Amer-Yahia, N. Koudas, A. Marian. Structure and Content Scoring for XML. Proceedings of the 31st International Conference on Very Large Data Bases. VLDB 2005.
    [4] S. Amer-Yahia, L. Lakshmanan, S. Pandit. FleXPath: Flexible Structure and Full-Text Querying for XML. Proceedings of the ACM SIGMOD International Conference on Management of Data. SIGMOD 2004
    [5] J. M. Bremer, M. Gertz. Xquery/IR: Integrating XML Document and Data Retrieval. Proceedings of the Fifth International Workshop on the Web and Databases. WebDB 2002.
    [6] D.Carmel, Y.S. Maarek, M.Mandelbrod, Y. Mass, A. Soffer. Searching XML Document via XML Fragments. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2003.
    [7] T. T. Chinenyanga, N. Kushmerick. Expressive and Efficient Ranked Querying of XML Data. Proceedings of the Fourth International Workshop on the Web and Databases. WebDB 2001.
    [8] S. Cohen, J. Mamou, Y. Kanza, Y. Sagiv. XSEarch: A Semantic Search Engine for XML. Proceedings of 29th International Conference on Very Large Data Bases. VLDB 2003.
    [9] S. Flesca, F. Furfaro, E. Masciari. On the minimization of Xpath queries. Proceedings of 29th International Conference on Very Large Data Bases. VLDB 2003.
    [10] N. Fuhr, K. Grossjohann. XIRQL: An Extension of XQL for Introduction Retrieval. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2000.
    [11] L. Guo, F. Shao, C. Botev, J. Shanmugasundaram. XRANK: Ranked Keyword Search over XML Documents. Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. SIGMOD 2003.
    [12] Y. Kanza and Y. Sagiv. Flexible Queries over Semi-structured Data. Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. PODS 2001.
    [13] J. Kwon, P. Rao, B. Moon, S. Lee. FiST: Scalable XML Document Filtering by Sequencing Twig Patterns. Proceedings of the 31st International Conference on Very Large Data Bases. VLDB 2005.
    [14] A. Marian, S. Amer-Yahia, N. Koudas, D. Srivastava. Adaptive Processing of Top-k Queries in XML. Proceedings of the 21st International Conference on Data Engineering. ICDE 2005.
    [15] N. Polyzotis, M. Garofalakis, Y. Ioannidis. Approximate XML Query Answers. Proceedings of the ACM SIGMOD International Conference on Management of Data. SIGMOD 2004.
    [16] G. Salton, M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill 1983.
    [17] T. Schlieder. Schema-Driven Evaluation of Approximate Tree-Pattern Queries. 8th International Conference on Extending Database Technology. EDBT 2002.
    [18] A. Theobald, G. Weikum. The Index-Basd XXL Search Engine for Querying XML Data with Relevance Ranking. 8th International Conference on Extending Database Technology. EDBT 2002.
    [19] F. Weigel, H. Meuss, K. U. Schulz, F. Bry. Content and Structure in Indexing and Ranking XML. Proceedings of the Seventh International Workshop on the Web and Databases. WebDB 2004.
    [20] Y. Xu, Y. Papakonstantinou. Efficient Keyword Search for Smallest LCAs in XML Databases. Proceedings of the ACM SIGMOD International Conference on Management of Data. SIGMOD 2005.

    下載圖示 校內:立即公開
    校外:2006-08-28公開
    QR CODE