簡易檢索 / 詳目顯示

研究生: 陳政宏
Chen, Cheng-Hung
論文名稱: 在成本限制下,尋找最具影響力的商品
Identifying the Most Influential Product with Cost Constraint
指導教授: 李強
Lee, Chiang
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2014
畢業學年度: 102
語文別: 英文
論文頁數: 44
中文關鍵詞: 成本線top-k產品
外文關鍵詞: cost plane, top-k, product
相關次數: 點閱:91下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • Top-k queries被廣泛的應用於找出符合使用者偏好的k個資料點(objects)。舉例來說,有一個顧客想買智慧型手機,卻不知道如何從眾多的手機中挑選,此時,top-k query便會根據顧客的喜愛來回傳k個顧客可能會喜歡的產品。反之產品出現在顧客的top-k list中,代表此產品將有機會被此顧客所購買。這樣的資訊對製造產品的廠商而言是一個很重要的資訊。因此我們可以延伸 top-k query,利用reverse top-k query協助廠商針對查詢點(產品)回傳一組顧客偏好,找出可能會購買此產品的顧客。在本篇論文中,我們將探討如何在固定的成本下(cost plane),尋找出最具影響力的產品。我們將影響力定義為”有多少使用者可能會購買此產品”,也就是此產品會出現在多少使用者偏好的top-k list中。現有的論文[2,3]探討了如何在現有的產品中,找出查詢點的influence score。然而若要用[2,3]的方法解決本論文所提出的問題,必須列舉cost plane上所有的點,一一的執行[2,3]所提出的演算法中。明顯的,這個作法會造成很高的計算成本,並不實用。因此我們提出了兩種演算法,Cutting the Cost plane Algorithm(CCA)與Grid-tree based Algorithm(GA)。我們將針對在固定的成本下,探討如何找出最具影響力的產品(Identifying the most influential product with cost constraint)。CCA是透過評估每個使用者偏好來找尋IS最高的答案區間,而GA則是透過grid-tree based的做法來快速找出答案。此外我們設計了一系列的實驗,展示並評估我們所提演算法的效率。我們的實驗證實我們所提出的方法可以有效率的找出最具有影響力的產品。

    Top-k queries are widely used for retrieving the k most interesting data points (objects) based on user preferences. Consider a customer wants to buy a smart phone, but he/she does not know how to choose from a lot of smart phones. In this case, top-k query is used for retrieving the k most interesting products based on the user preferences. On the other hand, if a product is in the top-k list of a customer, this product may have a high opportunity to be purchased by this customer. This information is an important business opportunity to manufacturers. Thus, we can extend the concept of top-k query processing to find the reverse top-k query for the manufacturers. By using the reverse top-k query, the top-k list of customers can be derived to manufacturers. Furthermore, the manufacturers can figure out the preferences of customers. In this thesis, we will discuss how to identify the most influential product with cost constraint. We define the influence of product as the number of customers who choose this product. The previous researches [2,3] discussed how to find the influence score of the query points with the given products. However, applying these algorithms of [2,3] to our problem has to enumerate all the points on the cost plane. Obviously, it results a high computational cost and impractical. Therefore, we propose two algorithms, the cutting the cost plane algorithm (CCA) and the grid-tree based algorithm (GA), for identifying the most influential product with cost constraint. CCA evaluates each user preference to find the greatest influence score region and GA is based on grid-tree technique to find the answer efficiently. In addition, we designed a series of experiments to demonstrate and evaluate the efficiency of our proposed algorithm. Our experiments exhibit the proposed algorithms are practical and efficient in retrieving the most influential products.

    Chinese Abstract i Abstract ii List of Contents iv List of Figures v List of Tables vii Chapter 1 INTRODUCTION 1 Chapter 2 RELATED WORK 8 2.1 Top-k Query Processing 8 2.2 Reverse Top-k Query Processing 9 2.3 Cost Plane 11 2.4 Reverse Nearest Neighbor Query Processing 13 2.5 Skyline Query Processing and Reverse Skyline Query Processing 13 Chapter 3 PRELIMINARIES 15 Chapter 4 CUTTING THE COST PLANE ALGORITHM 18 Chapter 5 GRID-TREE BASED ALGORITHM 23 Chapter 6 EXPERIMENTS 31 6.1 Experimental Setup 31 6.2 Effect of the number of data set cardinality 33 6.3 Effect of the number of weights cardinality 35 6.4 Effect of the number of dimensions 37 6.5 Effect of k 39 Chapter 7 CONCLUSIONS AND FUTURE WORK 41 References 42

    [1] Y.-C. Chang, L. Bergman, V. Castelli, C.-S. Li, M.-L. Lo, and J. R. Smaith, “The Onion technique: Indexing for linear optimization queries,” In Proceedings of International Conference on Management of Data (SIGMOD), pages 391-402, 2000.
    [2] A. Vlachou, C. Doulkeridis, Y. Kotidis, and K. Nørvåg, “Reverse top-k queries,” Proceeding on ICDE, pages 365-376, 2010.
    [3] A. Vlachou, C. Doulkeridis, K. Nørvåg, and Y. Kotidis, “Branch-and-Bound algorithm for reverse top-k queries,” In Proceedings of International Conference on Management of Data (SIGMOD), pages 481-492, 2013.
    [4] A. Vlachou, C. Doulkeridis, K. Nørvåg, and Y. Kotidis, “Identifying the most influential data objects with reverse top-k queries,” In Proceedings of International Conference on Very Large Data Bases (VLDB), pages 364-372, 2010.
    [5] C. Li, B. C. Ooi, A. K. H. Tung, and S. Wang. “DADA: A data cube for dominant relationship analysis,” In Proceedings of International Conference on Management of Data (SIGMOD), pages 659-670, 2006.
    [6] R. Ragin, “Combining fuzzy information from multiple systems,” Journal of Computer System Sciences, Vol. 58(1), pages 216-226, 1996.
    [7] R. Fagin, A. lotem and M. Naor, “Optimal aggregation algorithms for middleware,” In Proc. of Symposium on Principles of Database Systems (PODS), pages 102–113, 2001.
    [8] R. C.-W. Wong, M. T. O¨ zsu, P. S. Yu, A. W.-C. Fu, and L. Liu, “Efficient method for maximizing bichromatic reverse nearest neighbor,” In Proceedings of International Conference on Very Large Data Bases (VLDB), pages 1126-1137, 2009.
    [9] A. Arvanitis, A. Deligiannakis and Y. Vassiliou, “Efficient influence-based processing of market research queries,” Proceeding on CIKM, pages 1193-1202, 2012.
    [10] E.Dellis and B.Seeger, “Efficient computation of reverse skyline queries,” In Proceedings of International Conference on Very Large Data Bases (VLDB), pages 291-302, 2007.
    [11] J. M. Kang, M. F. Mokbel, S. Shekhar, T. Xia, and D. Zhang, “Continuous evaluation of monochromatic and bichromatic reverse nearest neighbors,” Proceeding on ICDE, pages 806-815, 2007.
    [12] S. Borzsonyi, D. Kossmann, and K. Stocker, “The skyline operator,” Proceeding on ICDE, pages 235-254, 2001.
    [13] M. Hasan, M. A. Cheema, W. Qu and X. Lin, “Efficient algorithms to monitor continuous constrained k nearest neighbor queries,” Proceeding on DASFAA, pages 233-249, 2010.
    [14] A. Vlachou, C. Doulkeridis, Y. Kotidis, and K. Nørvåg, “Monochromatic and bichromatic reverse top-k queries,” IEEE Transactions on Knowledge and Data Engineering, pages 1215-1229, 2011.
    [15] L. Chen and X. Lian, “Dynamic skyline queries in metric spaces,” Proceeding on EDBT, pages 333-343, 2008.
    [16] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “An optimal and progressive algorithm for skyline queries,” In Proceedings of International Conference on Management of Data (SIGMOD), pages 467-478, 2003.
    [17] S. Ge, L. H. U, N. Mamoulis, and D. W. Cheung, “Efficient all top-k computation: A unified solution for all top-k, reverse top-k and top-m influential queries,” IEEE Transactions on Knowledge and Data Engineering (TKDE), pages 1015–1027, 2013.
    [18] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive skyline computation in database systems,” ACM Transactions on Database Systems, pages 41-82, 2005.
    [19] Y. Yuan, X. Lin, Q. Liu, W. Wang, J. X. Yu, and Q. Zhang, “Efficient computation of skyline cube,” In Proceedings of International Conference on Very Large Data Bases (VLDB), pages 241-252, 2005.
    [20] K-L Tan, P-K Eng, and B. C. Ooi, “Efficient progressive skyline computation,” In Proceedings of International Conference on Very Large Data Bases (VLDB), pages 301-310, 2001.
    [21] K. Mouratidis and H. Pang, “Computing immutable regions for subspace top-k queries,” In Proceedings of International Conference on Very Large Data Bases (VLDB), pages 73-84, 2012.
    [22] J. Lee, H. Cho, and S.-w Hwang, “Efficient dual-resolution layer indexing for top-k queries,” Proceeding on ICDE, pages 1084-1095, 2012.
    [23] J. Chen and L. Feng, “Efficient pruning algorithm for top-k ranking on dataset with value uncertainty,” Proceeding on CIKM, pages 2231-2236, 2013.

    下載圖示 校內:2019-08-25公開
    校外:2019-08-25公開
    QR CODE