簡易檢索 / 詳目顯示

研究生: 張文欣
Chang, Wen-Hsin
論文名稱: 處理前 k 個在門檻值以及基數範圍內的組合式天際線查詢
Top-k Combinatorial Skyline with Threshold and Cardinality Constraints
指導教授: 李強
Lee, Chiang
鍾毓驥
Chung, Yu Chi
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2009
畢業學年度: 97
語文別: 中文
論文頁數: 70
中文關鍵詞: 組合式天際線top-k基數限制天際線門檻值限制
外文關鍵詞: cardinality constraint, top-k, combinatorial skyline, skyline, threshold constraint
相關次數: 點閱:175下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 近幾年來, skyline 查詢非常熱門。因為 skyline 查詢可以幫助使用者在大量具有矛盾條件的資料庫內容中有效過濾不必要的資料,
    同時挑選出符合使用者要求的資料結果,
    以讓使用者作進一步的決策判斷。然而目前的 skyline 查詢只是將資料一一的比對,找出資料中符合使用者需求的答案,
    卻忽略了使用者可能希望從這些資料中找出一個組合({ f combination})(由一到多筆資料經由一個特定的函式構成),
    使得這個組合能夠滿足使用者查詢的條件。
    在本論文中,我們提出一個新的 skyline 查詢,叫做 {em { f T}op-k { f C}ombinatorial { f S}kyline with { f T}hreshold and { f C}ardinality constraints { f Q}uery} ({em Top-k CSTCQ}) 。
    Top-k CSTCQ 可以解決多筆資料產生的組合的 skyline 查詢。然而多筆資料產生的組合計算複雜度高,
    因此我們針對 Top-k CSTCQ 提出一個有效率的演算法 {em { f T}op-k { f T}hreshold and { f C}ardinality-based { f C}ombinatorial { f S}kyline} ({em TTCCS}) 演算法。
    TTCCS 演算法先採取 threshold 限制轉換成 cardinality 限制的慨念,藉此降低 combinations 個數,接著利用二項式係數以及動態規劃兩個慨念提出表格式的演算法設計原則,
    並利用 4 個 pruning strategies 改良 TTCCS 演算法(TTCCS-Extension),
    可以過濾不符合 threshold 限制以及 top- k 限制的 combinations 。
    在最後,我們用實驗結果証明我們的 TTCCS-EX 演算法在 synthetic datasets 上很有效率。

    In recent years, the skyline operator has been popular.
    This is mainly due to the importance of various skyline results in applications involving multi-criteria decision making.
    However, skyline queries are considered in comparing each tuple one by one so as to provide useful information for the user
    but ignoring that user may wish to find information from a combination (constituted through a specific function from one to more tuples)
    which meets the conditions of the isuued query.
    %Then, the user wishes to make this combination to meet the conditions of the issued query.
    In this paper, we exploit a new skyline query, {em { f T}op-k { f C}ombinatorial { f S}kyline with { f T}hreshold and { f C}ardinality constraints { f Q}uery} ({em Top-k CSTCQ}).
    Top-k CSTCQ can tackle the problem of a combination which is constituted from one to more tuples.
    However, the computation complexity of top-k CSTCQ is quite high. We design an efficient algorithm, named {em { f T}op-k { f T}hreshold and { f C}ardinality-based { f C}ombinatorial { f S}kyline} ({em TTCCS}) algorithm, to solve Top-k CSTCQ.
    First, TTCCS algorithm adopts a concept of transforming threshold constraints to cardinality constraint to reduce the number of combinations,
    and then uses the concept of binomial coefficient and dynamic programming to construct a table-based algorithm.
    We propose four pruning strategies to improve TTCCS algorithm (named {em TTCCS -Extension})
    to prune the non-qualifying combinations progressively.
    Comprehensive experiments show that our algorithms can answer Top-k CSTCQ on various synthetic datasets efficiently.

    Abstract i Acknowledgements iii Table of Contents iv Table of Figures vi Table of Tables viii Table of Algorithms ix 1 Introduction 1 2 Related Work 8 2.0.1 Traditional skyline query . . . . . . . . . . . . . . . 8 2.0.2 Skyline Variants . . . . . . . . . . . . . . . . . . . . 9 3 Preliminaries 18 3.0.3 Problem Definition . . . . . . . . . . . . . . . . . . 18 4 Top-k Threshold and Cardinality-based Combinatorial Sky- line Algorithm (TTCCS) 24 4.0.4 Relationship between threshold and cardinality . . 24 4.0.5 TTCCS Algorithm . . . . . . . . . . . . . . . . . . 27 4.0.6 Improving the TTCCS Algorithmwith Pruning Strategies (TTCCS-EX Algorithm) . . . . . . . . . . . . 32 5 TTCCS-EX - Average 45 6 Experimental Evaluation 53 6.0.7 Experimental Setup . . . . . . . . . . . . . . . . . . 53 6.0.8 Synthetic datasets . . . . . . . . . . . . . . . . . . 54 6.0.9 Performance on Synthetic datasets . . . . . . . . . 57 7 Conclusions and Future Work 61 Biography 70

    [1] Stephan Borzsonyi, Donald Kossmann, and Konrad Stocker, “The
    Skyline Operator,” in Proceedings of the 17th International Conference
    on Data Engineering (ICDE), pp. 421-430, 2001.
    [2] Jan Chomicki, Parke Godfrey, Jarek Gryz and Dongming Liang,
    “Skyline with Presorting,” in Proceedings of the 17th International
    Conference on Data Engineering (ICDE), pp. 717-719, 2003.
    [3] Ken C. K. Lee, Baihua Zheng, Huajing Li, and Wang-Chien Lee,
    “Approaching the Skyline in Z Order,” in Proceedings of the 33th
    International Conference on Very Large Databases (VLDB), pp. 279-
    290, 2007.
    [4] Donald Kossmann, Frank Ramsak, and Steffen Rost, “Shooting Stars
    in the Sky: An Online Algorithm for Skyline Queries,” in Proceedings
    of the 28th International Conference on Very Large Data Bases
    (VLDB), pp. 275-286, 2002.
    [5] Kian-Lee, Tan Pin-Kwang Eng, and Beng Chin Ooi, “Efficient Progressive
    Skyline Computation,” in Proceedings of the 27th International
    Conference on Very Large Databases (VLDB), pp. 301-310,
    2001.
    [6] Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard Seeger, “An
    Optimal and Progressive Algorithm for Skyline Queries,” in Pro-
    62
    BIBLIOGRAPHY 63
    ceedings of the 2003 ACM Special Interest Group on Management
    Of Data international conference (SIGMOD), pp. 467-478, 2003.
    [7] Cuiping Li, Anthony K. H. Tung, Wen Jin, and Martin Ester, “On
    Dominating Your Neighborhood Profitably,” in Proceedings of the
    33rd international Conference on Very Large Databases (VLDB), pp.
    818-829, 2007.
    [8] Xuemin, Lin Yidong, Yuan Wei Wang, and Hongjun Lu, “Stabbing
    the sky: Efficient skyline computation over sliding windows,” in Proc.
    2005 Int. Conf. Data Engineering (ICDE), pp. 502- 513, 2005.
    [9] Cuiping Li, Beng Chin Ooi, Anthony K. H. Tung, and Shan Wang,
    “Dada: a data cube for dominant relationship analysis,” in Proceedings
    of the 2006 ACM SIGMOD international conference on Management
    of data conference (SIGMOD), pp. 659-670, 2006.
    [10] Milton Friedman and Leonard J. Savage, “The Utility Analysis of
    Choices Involvin Risk,” in Journal of Political Economy 56, pp. 279-
    304, 1948.
    [11] Chien Ping Hsieh, “Fundamentals of Investments,” BEST-WISE.
    [12] Stocks, Bonds, Bills, and Inflation Yearbook, “Ibbotson Associates,”
    Inc., Chicago.1983.
    [13] Frank K. Reilly and Keith C. Brown “Essentials of Investment Analysis
    and Portfolio Management,” THOMSON.
    [14] NYSE Euronext, http://www.nyse.com/
    [15] http://www.angelibrary.com/economic/gpjb/
    63
    BIBLIOGRAPHY 64
    [16] Parke Godfrey, Ryan Shipley, and Jarek Gryz, “Maximal vector computation
    in large data sets,” in Proceedings of the 31st international
    conference on Very large data bases (VLDB), pp. 229 - 240, 2005.
    [17] Wen Jin, Jiawei Han, andMartin Ester, “Mining thick skylines over
    large databases,” in Proceedings of the 8th European Conference
    on Principles and Practice of Knowledge Discovery in Databases
    (PKDD), pp. 255 - 266, 2004.
    [18] Tian Xia and Donghui Zhang, “Refreshing the sky: the compressed
    skycube with efficient support for frequent updates,” in Proceedings
    of the 2006 ACM SIGMOD international conference on Management
    of data (SIGMOD), pp. 491-502, 2006.
    [19] Yidong Yuany, Xuemin Liny, Qing Liuy, Wei Wangy, Jeffrey Xu
    Yux,and Qing Zhangy, “Efficient computation of the skyline cube,”
    in Proceedings of the 31th International Conference on Very Large
    Data Bases (VLDB), pp.241-252, 2005.
    [20] Jian Pei, Wen Jin, Martin Ester,and Yufei Tao, “Catching the best
    views in skyline: A semantic approach,” in Proceedings of the 31th
    International Conference on Very Large Data Bases (VLDB), pp.
    253-264, 2005.
    [21] Jian Pei, Yidong Yuan, Xuemin Lin, Wen Jin, Martin Ester, Qing
    Liu, Wei Wang, Yufei Tao, Jeffrey Xu Yu and Qing Zhang, “Towards
    multidimensional subspace skyline analysis,” in ACM Transactions
    on Database Systems (TODS), pp. 1335-1381, 2006.
    [22] Jian Pei, Ada Wai-Chee Fu, Xuemin Lin,and Haixun Wang, “Computing
    compressed skyline cubes efficiently,” in Proceedings of the
    64
    BIBLIOGRAPHY 65
    23nd International Conference on Data Engineering (ICDE), pp. 96-
    105, 2007.
    [23] Yufei Tao, Xiaokui Xiao, and Jian Pei, “Subsky: Efficient computation
    of skylines in subspaces,” in Proceedings of the 22nd International
    Conference on Data Engineering (ICDE), pp. 65-65, 2006.
    [24] Yufei Tao and Dimitris Papadias, “Maintaining sliding window skylines
    on data streams,” in IEEE Transactions on Knowledge and
    Data Engineering (TKDE), pp. 377-391, 2006.
    [25] Michael Morse, Jignesh M. Patel, and William I. Grosky, “Efficient
    continuous skyline computation,” in Proceedings of the 22nd International
    Conference on Data Engineering (ICDE), pp. 3411-3437,
    2006.
    [26] Jon Louis Bentley, HT T Kung, Mario Schkolnick, and Clark David
    Thomborson, “On the average number of maxima in a set of vectors
    and applications,” in Journal of the ACM (JACM), pp. 536-543,
    1978.
    [27] Wolf-tilo Balke , Ulrich Guntzer,and Jason Xin Zheng, “Efficient
    distributed skylining for web information systems.” ln Extending
    Database Technology (EDBT), pp. 256-273, 2004.
    [28] Zhiyong Huang, Jensen, C.S., Hua Lu,and Beng Chin Ooi, “Skyline
    queries against mobile lightweight devices in manets,” in Proceedings
    of the 22nd International Conference on Data Engineering (ICDE),
    pp. 66-66,2006.
    [29] CheeYong Chan, PinKwang Eng,and KianLee Tan, “Stratified computation
    of skylines with partially-ordered domains,” in Proceedings
    65
    BIBLIOGRAPHY 66
    of the 2005 ACM SIGMOD international conference (SIGMOD), pp.
    203-214, 2005.
    [30] Vladlen Koltun and Christos H. Papadimitriou, “Approximately
    dominating representatives,” in Theoretical Computer Science, pp.
    148-154, 2007.
    [31] Chee-Yong Chan, H.V. Jagadish, Kian-Lee Tan, Anthony K.H. Tung,
    and Zhenjie Zhang, “On high dimensional skylines,” in Extending
    Database Technology (EDBT), pp. 478-495, 2006,
    [32] Chee-Yong Chan, H.V. Jagadish, Kian-Lee Tan, Anthony K.H.
    Tung,and Zhenjie Zhang, “Finding k-dominant skylines in high dimensional
    space,” in Proceedings of the 2006 ACM SIGMOD international
    conference on Management of data (SIGMOD), pp. 503-514,
    2006.
    [33] Raymond Chi-WingWong, Jian Pei, AdaWai-Chee Fu,and KeWang
    “Mining favorable facets,” in Proceedings of the 13th ACM SIGKDD
    international conference on Knowledge discovery and data mining
    (KDD), pp. 804-813 , 2007.
    [34] Bin Jiang, Jian Pei, Xuemin Lin, David W. Cheung,and Jiawei
    Han, “Mining preferences from superior and inferior examples,” in
    Proceedings of the 14th ACM SIGKDD international conference on
    Knowledge discovery and data mining(KDD). New York, NY, USA:
    ACM, 2008.
    [35] Raymond ChiWing Wong, Ada WaiChee Fu, Jian Pei, Yip Sing Ho,
    Tai Wong,and Yubao Liu, “Efficient skyline querying with variable
    user preferences on nominal attributes,” in VLDB: Proceedings of the
    66
    BIBLIOGRAPHY 67
    34th International Conference on Very Large Databases (VLDB), pp.
    1032-1043, 2008.
    [36] Jie Liu, Liang Feng, and Yunpeng Xing, “A pruning-based approach
    for supporting Top-K join queries,” in Proceedings of the 15th international
    conference on World Wide Web (WWW), pp. 891-892,
    2006.
    [37] Xuemin Lin, Yidong Yuan, Qing Zhang, and Ying Zhang, “Selecting
    Stars: The k Most Representative Skyline Operator,” in 3rd International
    Conference on Data Engineering (ICDE), pp. 86-95, 2007.
    [38] Evangelos Dellis, Akrivi Vlachou, and Ilya Vladimirskiy, “Constrained
    Subspace Skyline Computation,” in Proceedings of the 15th
    ACM international conference on Information and knowledge management
    (CIKM), pp. 415-424, 2006.
    [39] Dimitris Sacharidis, Stavros Papadopoulos, Dimitris Papadias,
    “Topologically-sorted Skylines for Partially-ordered Domains,” in
    25th International Conference on Data Engineering (ICDE), pp.
    1072-1083, 2009
    [40] http://www.derf.net/knapsack/ NP-Completeness, Cryptology, and
    Knapsacks
    [41] http://www.cse.ohio-state.edu/ gurari/theory-bk/theory-bkfivese4.
    html More NP-Complete Problems
    [42] Silvano Martello, and Paolo Toth, KNAPSACK PROBLEMS Algorithms
    and Computer Implementations.
    [43] Wen Jin, Anthony K. H. Tung, Martin Ester, and Jiawei Han, “On
    Efficient Processing of Subspace Skyline Queries on High Dimen-
    67
    BIBLIOGRAPHY 68
    sional Data,” in 19th International Conference on Scientific and Statistical
    Database Management (SSBDM), pp. 12-12, 2007.
    [44] Bin Jiang, and Jian Pei, “Online Interval Skyline Queries on Time
    Series,” in Proceedings of the 25th International Conference on Data
    Engineering (ICDE), pp. 1036-104, 2009.
    [45] Edward M McCreight, “Priority search trees,” SIAM J. Comput.,
    vol. 14, no. 2, pp. 257-276, 1985.
    [46] Chia-Whey Chen, Yu-Whey Kao, and Yu-Jane Liu, “Investor’s Preferences
    and Assets Allocation,”
    [47] Bin Cui , Hua Lu , Quanqing Xu , Lijiang Chen , Yafei Dai , and
    Yongluan Zhou, “Parallel Distributed Processing of Constrained Skyline
    Queries by Filtering,” in Proceedings of the 24th International
    Conference on Data Engineering (ICDE), pp. 546-555, 2008.
    [48] Hakan Ferhatosmanoglu, Ioanna Stanoi, Divyakant Agrawal, and
    Amr El Abbadi, “Constrained Nearest Neighbor Queries,” International
    Symposium on Spatial and Temporal Databases (SSTD), pp.
    257-278, 2001.
    [49] Philippe Flajolet, and G Nigel Martin, “Probabilistic counting algorithms
    for data base applications,” Journal of Computer and System
    Sciences, pp. 182-209, 1985.
    [50] Michael Morse, Jignesh M. Patel, and H.V. Jagadish, “Efficient Skyline
    Computation over Low-Cardinality Domains,” in Proceedings of
    the 33rd international conference on Very large data bases (VLDB),
    pp. 267-278, 2007.
    68
    BIBLIOGRAPHY 69
    [51] Gang Gou,and Rada Chirkova, “Efficiently querying large xml data
    repositories: A survey,” in IEEE Transactions on Knowlenge and
    Data Engineering (TKDE), pp. 1381-1403, 2007.
    [52] Rakesh Agrawal, Alexander Tiberiu Borgida, and Hosagrahar V Jagadish,
    “Efficient management of transitive relationships in large
    data and knowledge bases,” in SIGMOD, 1989, pp. 253-262.
    [53] PingWu, Caijie Zhang, Ying Feng, Ben Y. Zhao, Divyakant Agrawal,
    and Amr El Abbadi, “Parallelizing Skyline Queries for Scalable Distribution,”
    in Advances In Database Technology (EDBT) pp. 112-
    130, 2006.
    [54] Gautam Das, Dimitrios Gunopulos, and Nick Koudas, “Answering
    Top-k Queries Using Views,” in Proceedings of the 32nd international
    conference on Very large data bases (VLDB), pp. 451-462, 2006.
    [55] Gautam Das, and Dimitrios Gunopulos, “Adhoc Top-k Query Answering
    for Data Streams,” in Proceedings of the 33rd international
    conference on Very large data bases (VLDB), pp. 183-194, 2007.
    [56] Panayiotis Tsaparas, Themistoklis Palpanas, and Yannis Kotidis
    “Ranked Join Indices,” in Proceedings of the 19th International Conference
    on Data Engineering (ICDE), pp. 277-288, 2003.
    [57] Philippe Rigaux, M. Scholl, and Agnes Voisard. “An introduction to
    spatial database systems,” The International Journal on Very Large
    Data Base, pp. 357-399, 2000.
    [58] Xuemin Lin,Qing Liu, Yidong Yuan,Xiao Fang Zhou,and Hong Jun
    Lu, “Summarizing Level-Two Topological Relations in Large Spatial
    69
    BIBLIOGRAPHY 70
    Datasets,” in ACM Transactions on Database Systems (TODS), pp.
    584-630, 2006.
    [59] Junchang Xin, Guoren Wang, Lei Chen, Xiaoyi Zhang, and ZhenhuaWang,
    “Continuously Maintaining Sliding Window Skylines in a
    Sensor Network,” in Proceedings of the 12th International Conference
    on Database Systems for Advanced Applications (DASFAA),
    pp. 509-521, 2007.
    [60] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, “INTRODUCTION
    TO ALGORITHMS,” MIT Press, McGraw-Hill
    Book Company.

    下載圖示 校內:2010-08-25公開
    校外:2011-08-25公開
    QR CODE