簡易檢索 / 詳目顯示

研究生: 梁汪鸛
Leung, Wang-Guan
論文名稱: 透過reverse top-k識別top-m個最具影響力的組合
Identifying top-m most influential combinations by using reverse top-k
指導教授: 李強
Lee, Chiang
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2018
畢業學年度: 106
語文別: 英文
論文頁數: 68
中文關鍵詞: Top-kReverse Top-k組合影響力
外文關鍵詞: Top-k, reverse top-k, combination, influence
相關次數: 點閱:58下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • Top-k查詢會被廣泛的應用是因為能根據使用者的偏好找出最相符的k個商品。幫助使用者將大量的商品做過濾,留下k個做推薦,方便使用者做挑選。商品若能在使用者的top-k清單中,表示對使用者具有足夠的影響力,站在供應商的立場,會需要reverse top-k查詢,找出出現在最多人top-k清單中的商品,可謂最具影響力的商品。供應商可以對這樣的一個商品做廣告或行銷。然而在現實生活中,當使用者去購物時,往往會一次購買多樣商品而不會只買單一樣商品。舉例來說, 像是在開學季,使用者添購學校用品時,除了買文具,也會買水壺和書包等等。供應商如果可以事先透過過去的使用者偏好資料找出幾個最受使用者歡迎的組合,那麼將會對供應商和使用者帶來雙贏的效果。使用者省去一一查詢的時間,而供應商可以透過組合的方式做促銷以增加銷售量。我們將這樣的一個問題稱為識別top-m個最具影響力的組合,會透過reverse top-k來判斷組合的影響力。在我們的論文中,提出了兩個演算法,Score Upper Bound Algorithm(SUB)與Tighter Upper Bound Algorithm(TUB)。兩者會對組合的影響力做上界值的估算,SUB 是假設組合吸引來的使用者皆會被組合內的所有商品所吸引,因此得到的值會比真實的影響力大出較多。然而TUB則考慮了組合內商品吸引的人數上的不一致,得出的值會比較貼近真實的影響力。為此我們設計了一系列的實驗,來評估兩者的效率,並證明我們的演算法所找出的答案是有根據的。

    Top-k queries are widely used because they can find k most appropriate items according to a user’s preference. Help a user to filter a large number of items, keeping k items to recommend the user to choose. If an item can be in a user’s top-k list, it means that the item has sufficient influence on the user. From the perspective of suppliers, they need reverse top-k queries to find an item which is listed in most users’ top-k list. It is said to be the most influential item, and suppliers can advertise or market it. In reality, when a buyer (or a user) goes shopping, he generally purchases more than one item at a time. For example, when school starts, buyers stock up on school supplies. They purchase not only stationery but also kettles and school bags, etc. If suppliers were to offer a combo-pack (combination of packages) that were most popular with most buyers in advance via past buyer preferences, there were be a win-win effect for both the suppliers and the buyers. Buyers would be able to pick up everything that they wanted at one time rather than having to look up each item individually, and suppliers could do promotions for the combination packages to increase the sales. We call this problem “identifying top-m most influential combinations”, and the reverse top-k is used to judge the influential power of the combination. In our paper, we propose two algorithms, the score upper bound algorithm (SUB) and the tighter upper bound algorithm (TUB). Both of them will estimate the upper bound of the combination’s influential power. SUB assumes that the users attracted by the combination will be attracted to all the items in the combination, so the upper bound value obtained will be much higher than the real influential power. TUB considers the inconsistency in the number of users attracted by the items in the combination, so the upper bound value obtained will be closer to the real. To this end, we designed a series of experiments to evaluate the efficiency of our proposed algorithms and proved that the answers found by our algorithms are valid.

    Contents iv List of Figures vi List of Tables viii 1 Introduction 1 2 Related Work 7 3 Preliminaries 10 3.1 Top-k query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.2 Reverse Top-k query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.3 Top-m most influential combinations problem . . . . . . . . . . . . . . . 12 3.3.1 Attractiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.3.2 Attractiveness number . . . . . . . . . . . . . . . . . . . . . . . . 14 4 Algorithm 16 4.1 Na¨ıve Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 4.2 Score Upper Bound Algorithm (SUB) . . . . . . . . . . . . . . . . . . . . 21 4.3 Tighter Upper Bound Algorithm (TUB) . . . . . . . . . . . . . . . . . . 33 5 Experiment 40 5.1 Experimental Environment . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.2.1 Small Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.2.2 Large Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.2.3 Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 6 Conclusions 62 Bibliography 64

    [1] Reza Akbarinia, Esther Pacitti, and Patrick Valduriez. Best position algorithms for top-k queries. In Proceedings of the 33rd international conference on Very large data bases, pages 495–506. VLDB Endowment, 2007.
    [2] Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Martin Theobald, and Gerhard Weikum. Io-top-k: Index-access optimized top-k query processing. In Proceedings of
    the 32nd international conference on Very large data bases, pages 475–486. VLDB Endowment, 2006.
    [3] Stefan Berchtold, Bernhard Ertl, Daniel A Keim, H-P Kriegel, and Thomas Seidl. Fast nearest neighbor search in high-dimensional space. In Data Engineering, 1998. Proceedings., 14th International Conference on, pages 209–218. IEEE, 1998.
    [4] Nicolas Bruno, Luis Gravano, and Amelie Marian. Evaluating top-k queries over web-accessible databases. In Data Engineering, 2002. Proceedings. 18th International Conference on, pages 369–380. IEEE, 2002.
    [5] Muhammad Aamir Cheema, Xuemin Lin,Wenjie Zhang, and Ying Zhang. Influence zone: Efficiently processing reverse k nearest neighbors queries. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 577–588. IEEE, 2011.
    [6] Paulo Cortez, Antonio Cerdeira, Fernando Almeida, Telmo Matos, and Jose Reis. Wine quality. https://data.world/food/wine-quality.
    [7] Gautam Das, Dimitrios Gunopulos, Nick Koudas, and Dimitris Tsirogiannis. Answering top-k queries using views. In Proceedings of the 32nd international conference on Very large data bases, pages 451–462. VLDB Endowment, 2006.
    [8] Ronald Fagin. Combining fuzzy information from multiple systems. Journal of Computer and System Sciences, 58(1):83–99, 1999.
    [9] Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. Journal of computer and system sciences, 66(4):614–656, 2003.
    [10] Orestis Gkorgkas, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørv°ag. Discovering influential data objects over time. In International Symposium on Spatial and Temporal Databases, pages 110–127. Springer, 2013.
    [11] Orestis Gkorgkas, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørv°ag. Maximizing influence of spatio-textual objects based on keyword selection. In Inter-
    national Symposium on Spatial and Temporal Databases, pages 413–430. Springer, 2015.
    [12] Orestis Gkorgkas, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørv°ag. Exploratory product search using top-k join queries. Information Systems, 64:75–92,
    2017.
    [13] Arif Hidayat, Muhammad Aamir Cheema, and David Taniar. Relaxed reverse nearest neighbors queries. In International Symposium on Spatial and Temporal Databases, pages 61–79. Springer, 2015.
    [14] G´ısli R Hjaltason and Hanan Samet. Distance browsing in spatial databases. ACM Transactions on Database Systems (TODS), 24(2):265–318, 1999.
    [15] Huiqi Hu, Guoliang Li, Zhifeng Bao, Jianhua Feng, Yongwei Wu, Zhiguo Gong, and Yaoqiang Xu. Top-k spatio-textual similarity join. IEEE Transactions on Knowledge and Data Engineering, 28(2):551–565, 2016.
    [16] Ihab F Ilyas, Walid G Aref, and Ahmed K Elmagarmid. Supporting top-k join queries in relational databases. The VLDB JournalThe International Journal on Very Large Data Bases, 13(3):207–221, 2004.
    [17] Ihab F Ilyas, George Beskales, and Mohamed A Soliman. A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys (CSUR), 40(4):11, 2008.
    [18] Ihab F Ilyas, Davide Martinenghi, and Marco Tagliasacchi. Rank-join algorithms for search computing. In Search Computing, pages 211–224. Springer, 2010.
    [19] Flip Korn and Suresh Muthukrishnan. Influence sets based on reverse nearest neighbor queries. In ACM Sigmod Record, volume 29, pages 201–212. ACM, 2000.
    [20] Hans-Peter Kriegel, Peer Kr¨oger, Matthias Renz, Andreas Z¨ufle, and Alexander Katzdobler. Incremental reverse nearest neighbor ranking. In Data Engineering,
    2009. ICDE’09. IEEE 25th International Conference on, pages 1560–1567. IEEE, 2009.
    [21] Apostol Natsev, Yuan-Chi Chang, John R Smith, Chung-Sheng Li, and Jeffrey Scott Vitter. Supporting incremental join queries on ranked inputs. In VLDB, volume 1, pages 281–290, 2001.
    [22] Shuyao Qi, Panagiotis Bouros, and Nikos Mamoulis. Efficient top-k spatial distance joins. In International Symposium on Spatial and Temporal Databases, pages 1–18.
    Springer, 2013.
    [23] Jo˜ao B Rocha-Junior, Orestis Gkorgkas, Simon Jonassen, and Kjetil Nørv°ag. Efficient processing of top-k spatial keyword queries. In International Symposium on Spatial and Temporal Databases, pages 205–222. Springer, 2011.
    [24] Nick Roussopoulos, Stephen Kelley, and Fr´ed´eric Vincent. Nearest neighbor queries. In ACM sigmod record, volume 24, pages 71–79. ACM, 1995.
    [25] Andrei Scheinkman. nba-draft-2015.https://github.com/fivethirtyeight/data/blob/master/nba-draft-2015/historical_projections.csv
    [26] Karl Schnaitter and Neoklis Polyzotis. Evaluating rank joins with optimal cost. In Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 43–52. ACM, 2008.
    [27] George Tsatsanifos and Akrivi Vlachou. On processing top-k spatio-textual preference queries. In EDBT, pages 433–444, 2015.
    [28] Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, and Kjetil Nørv°ag. Reverse top-k queries. In Data Engineering (ICDE), 2010 IEEE 26th International Confer-
    ence on, pages 365–376. IEEE, 2010.
    [29] Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, and Kjetil Nørv°ag. Monochromatic and bichromatic reverse top-k queries. IEEE Transactions on Knowledge and Data Engineering, 23(8):1215–1229, 2011.
    [30] Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørv°ag, and Yannis Kotidis. Identifying the most influential data objects with reverse top-k queries. Proceedings of the VLDB Endowment, 3(1-2):364–372, 2010.
    [31] Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørv°ag, and Yannis Kotidis. Branchand-bound algorithm for reverse top-k queries. In Proceedings of the 2013 ACM SIGMOD international conference on management of data, pages 481–492. ACM, 2013.
    [32] Biyan Wang, Zhengqing Dai, Cuiping Li, and Hong Chen. Efficient computation of monochromatic reverse top-k queries. In Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on, volume 4, pages 1788–1792. IEEE, 2010.
    [33] Dingming Wu, Man Lung Yiu, Gao Cong, and Christian S Jensen. Joint topk spatial keyword query processing. IEEE Transactions on Knowledge and Data Engineering, 24(10):1889–1903, 2012.
    [34] Wei Wu, Fei Yang, Chee-Yong Chan, and Kian-Lee Tan. Finch: Evaluating reverse k-nearest-neighbor queries on location data. Proceedings of the VLDB Endowment, 1(1):1056–1067, 2008.
    [35] Shiyu Yang, Muhammad Aamir Cheema, Xuemin Lin, and Ying Zhang. Slice: reviving regions-based pruning for reverse k nearest neighbors queries. In Data Engineering (ICDE), 2014 IEEE 30th International Conference on, pages 760–771. IEEE, 2014.
    [36] Shiyu Yang, Muhammad Aamir Cheema, Xuemin Lin, Ying Zhang, and Wenjie Zhang. Reverse k nearest neighbors queries and spatial reverse top-k queries. The VLDB Journal, 26(2):151–176, 2017.

    下載圖示 校內:2023-08-24公開
    校外:2023-08-24公開
    QR CODE