| 研究生: |
王梓憲 Wang, Tzu-Hsien |
|---|---|
| 論文名稱: |
使用 MapReduce 架構有效處理異種鄰近物體聚合距離查詢 Efficient Processing of Aggregate-distance Queries on Heterogeneous Neighboring Objects using MapReduce |
| 指導教授: |
李強
Lee, Chiang |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2016 |
| 畢業學年度: | 104 |
| 語文別: | 英文 |
| 論文頁數: | 59 |
| 中文關鍵詞: | 基於位置之查詢 、異種物體 、異種鄰近物體 、聚合距離查詢 |
| 外文關鍵詞: | location-based queries, heterogeneous objects, heterogeneous neighboring objects, aggregate-distance query |
| 相關次數: | 點閱:70 下載:1 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近年來,大多數的研究專注於處理單一物體的基於位置之查詢 (location-based queries)。
然而,在現實生活中,空間中存在著許多不同種類的物體,使用者可能對於不同種類的物體較感興趣。
我們稱這些不同種類的物體為異種物體 (heterogeneous objects,簡稱為 HOs),如果在 HOs 中任兩個物體的距離都小於或等於 d,則此 HOs 被稱為異種鄰近物體 (heterogeneous neighboring objects,簡稱為 HNOs)。
對於處理 HNOs 的 location-based queries 相較於單一種類型態的物體來說更加複雜。
因此,在本論文中,我們提出一種新穎且有用之 location-based query 來針對 HNOs 做處理,進而提供給使用者有用的 HNOs 資訊。
這個新穎的 location-based query 稱之為聚合距離查詢 (aggregate-distance query,簡稱為 AggDQ)。
給定一個查詢點集合 Q、一個正整數值 k 以及一個距離值 d,對於每一個查詢點 q 來說,AggDQ 找尋 k 組 HNOs (每一組 HNOs 當中的物體彼此之間的距離小於或等於 d),且這 k 組 HNOs 到查詢點 q 的聚合距離較其他組 HNOs 小。
為了以分散式和平行化的方式有效率地處理 AggDQ,我們將給定的空間分解成若干個網格單元,並且在 MapReduce 框架的基礎上,設計了兩個演算法,分別是聚合距離查詢演算法與增強型聚合距離查詢演算法。
最後,我們以真實資料集與合成資料集為基礎,實作大量的實驗來證明我們提出演算法之效率及可擴展性。
Currently, most of the processing techniques for the conventional location-based queries focus only on a single type of objects.
However, in real-life applications the user may be interested in obtaining information about different types of objects, in terms of their neighboring relationship.
We term the different types of objects closer to each other the heterogeneous neighboring objects (HNOs for short).
Efficient processing of the location-based queries on the HNOs is more complicated than that on a single data source, because the neighboring relationship between the HNOs inevitably affects the query result.
In this thesis, we present a novel and important query on the HNOs, namely the aggregate-distance query (AggDQ for short), which can provide useful object information by considering both the spatial closeness of objects to the sets of query objects and the neighboring relationship between objects.
Given a set of query objects Q, an integer k, and a distance d, the AggDQ retrieves the k-sets of HNOs for each query object q, such that the distance between any two objects in each set of HNOs is less than or equal to d and the aggregate-distances of the k-sets of HNOs to q are the smallest among all sets of HNOs.
To efficiently process the AggDQ in a distributed and parallel manner, we divide the given space into grid cells and develop two algorithms based on the MapReduce framework, namely the MR-AggDQ and EMR-AggDQ algorithms.
Comprehensive experiments using both real and synthetic datasets are conducted to demonstrate the efficiency and scalability of the proposed algorithms.
[1] A. Akdogan, U. Demiryurek, F. Banaei-Kashani, and C. Shahabi, “Voronoi-based geospatial query processing with mapreduce,” in IEEE Second International Conference on Cloud Computing Technology and Science, pages 9-16, 2010.
[2] R. Benetis, C. S. Jensen, G. Karciauskas, and S. Saltenis, “Nearest and reverse nearest neighbor queries for moving objects,” in VLDB Journal, 15(3), pages 229-249, 2006.
[3] T. Brinkhoff, H. P. Kriegel, and B. Seeger, “Efficient processing of spatial joins using r-trees,” in ACM SIGMOD, 22(2), pages 237-246, 1993.
[4] A. Cary, Z. Sun, V. Hristidis, and N. Rishe, “Experiences on processing spatial data with mapreduce,” in International Conference on Scientific and Statistical Database Management, pages 302-319, 2009.
[5] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, R. E. Gruber, “Bigtable: A distributed storage system for structured data,” in Conference on USENIX Symposium on Operating Systems Design and Implementation, pages 205-218, 2006.
[6] J. Chen and R. Cheng, “Efficient evaluation of imprecise location-dependent queries,” in International Conference on Data Engineering, pages 586-595, 2007.
[7] Y. Chen and J. M. Patel, “Efficient evaluation of all-nearest-neighbor queries,” in International Conference on Data Engineering, pages 1056-1065, 2007.
[8] D. W. Choi and C. W. Chung, “Nearest neighborhood search in spatial databases,” in International Conference on Data Engineering, pages 669-710, 2015.
[9] B. S. Chung, W. C. Lee, and A. L. Chen, “Processing probabilistic spatio-temporal range queries over moving objects with uncertainty,” in International Conference on
Extending Database Technology: Advances in Database Technology, pages 60-71, 2009.
[10] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” in Communications of the ACM, 51(1), pages 107-113, 2008.
[11] L. George, “HBase: The Definitive Guide,” in O’Reilly, Inc., 2011.
[12] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” in ACM SIGMOD, 14(2), pages 47-57, 1984.
[13] G. R. Hjaltason and H. Samet, “Distance browsing in spatial databases,” in ACM Transactions on Database Systems, 24(2), pages 265-318, 1999.
[14] Y. K. Huang, W. H. Kuo, C. Lee, and T. H. Wang, “Shortest Average-Distance Query on Heterogeneous Neighboring Objects,” in International Database Engineering and Applications Symposium, 2015.
[15] Y. K. Huang and L. F. Lin, “Continuous within query in road networks,” in International Wireless Communications and Mobile Computing Conference, pages 1176-1181, 2011.
[16] G. Iwerks, H. Samet, and K. Smith, “Continuous k-nearest neighbor queries for continuously moving points with updates,” in International Conference on Very Large Data Bases, 29, pages 512-523, 2003.
[17] C. Ji, T. Dong, Y. Li, Y. Shen, K. Li, W. Qiu, W. Qiu, and M. Guo, “Inverted grid-based knn query processing with mapreduce,” in ChinaGrid Annual Conference, pages 25-32, 2012.
[18] C. Ji, H. Hu, Y. Xu, Y. Li, and W. Qu, “Efficient multi-dimensional spatial RKNN query processing with mapreduce,” in ChinaGrid Annual Conference, pages 63-68, 2013.
[19] D. Jiang, B. C. Ooi, L. Shi, and S. Wu, “The performance of mapreduce: An in-depth study,” in International Conference on Very Large Data Bases, 3(1-2), pages 472-483, 2010.
[20] D. V. Kalashnikov, S. Prabhakar, S. Hambrusch, and W. Aref, “Efficient evaluation of continuous range queries on moving objects,” in International Conference on Database and Expert Systems Applications, pages 731-740, 2002.
[21] W. Lu, Y. Shen, S. Chen, and B. C. Ooi, “Efficient processing of k nearest neighbor joins using mapreduce,” in International Conference on Very Large Data Bases, 5(10), pages 1016-1027, 2012.
[22] F. Korn and S. Muthukrishnan, “Influence sets based on reverse nearest neighbor queries,” in ACM SIGMOD, 29(2), pages 201-212, 2000.
[23] M. F. Mokbel, X. Xiong, and W. G. Aref, “Sina: Scalable incremental processing of continuous queries in spatio-temporal databases,” in ACM SIGMOD, pages 623-634, 2004.
[24] N. Nodarakis, E. Pitoura, S. Sioutas, A. K. Tsakalidis, D. Tsoumakos, and G. Tzimas, “Efficient multidimensional aknn query processing in the cloud,” in International Conference on Database and Expert Systems Applications, pages 477-491, 2014.
[25] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, “Pig Latin: a not-so-foreign language for data processing,” in ACM SIGMOD, pages 1099-1110, 2008.
[26] “OpenStreetMap,” http://www.openstreetmap.org/.
[27] D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, “Group nearest neighbor queries,” in International Conference on Data Engineering, pages 301-312, 2004.
[28] A. Papadopoulos and Y. Manolopoulos, “Multiple range query optimization in spatial databases,” in East European Symposium on Advances in Databases and Information Systems, pages 71-82, 1998.
[29] N. Roussopoulos, S. Kelley, and F. Vincent, “Nearest neighbor queries,” in ACM SIGMOD, 24(2), pages 71-79, 1995.
[30] M. Sharifzadeh and C. Shahabi, “The spatial skyline queries,” in International Conference on Very Large Data Bases, pages 751-762, 2006.
[31] A. P. Sistla, O. Wolfson, S. Chamberlain, and S. Dao, “Modeling and querying moving objects,” in International Conference on Data Engineering, 97, pages 422-432, 1997.
[32] Y. Tao and D. Papadias, “Time-parameterized queries in spatio-temporal databases,” in ACM SIGMOD, pages 334-345, 2002.
[33] A. Thusoo, J. S. Sarma, N. Jain, , Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, R. Murthy, “Hive: a warehousing solution over a map-reduce framework,” in International Conference on Very Large Data Bases, 2(2), pages 1626-1629, 2009.
[34] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Anthony, H. Liu, R. Murthy, “Hive - a petabyte scale data warehouse using Hadoop,” in International
Conference on Data Engineering, pages 996-1005, 2010.
[35] C. Xia, H. Lu, B. C. Ooi, and J. Hu, “Gorder: An efficient method for knn join processing,” in International Conference on Very Large Data Bases, 30, pages 756-767, 2004.
[36] M. L. Yiu, N. Mamoulis, and D. Papadias, “Aggregate nearest neighbor queries in road networks,” in ACM Transactions on Knowledge and Data Engineering, 17(6),
pages 820-833, 2005.
[37] T. Yokoyama, Y. Ishikawa, and Y. Suzuki, “Processing all k-nearest neighbor queries in hadoop,” in International Conference on Web-Age Information Management, pages 346-351, 2012.
[38] D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, and M. Kitsuregawa, “Keyword search in spatial databases: Towards searching by document,” in International Conference on Data Engineering, pages 688-699, 2009.
[39] S. Zhang, J. Han, Z. Liu, K. Wang, and S. Feng, “Spatial queries evaluation with mapreduce,” in International Conference on Grid and Cooperative Computing,
pages 287-292, 2009.
[40] C. Zhang, F. Li, and J. Jestes, “Efficient parallel knn joins for large data in mapreduce,” in International Conference on Extending Database Technology, pages 38-49, 2012.
[41] L. Zhu, Y. Jing, W. Sun, D. Mao, and P. Liu, “Voronoi-based aggregate nearest neighbor query processing in road networks,” in SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 518-521, 2010.