成功大學博碩士論文系統

簡易檢索 / 詳目顯示

回結果列表

研究生：	劉佩琦 Liu, Pei-Chi
論文名稱：	有效率處理多個最近鄰居查詢的框架 An Efficient Processing Framework for Multiple k Nearest Neighbor (kNN) Queries
指導教授：	李強 Lee, Chiang
共同指導教授:	鍾毓驥 Chung, Yu-Chi
學位類別：	碩士 Master
系所名稱：	電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering
論文出版年：	2011
畢業學年度：	99
語文別：	中文
論文頁數：	55
中文關鍵詞：	最近鄰居查詢、多查詢處理、查詢處理
外文關鍵詞：	kNN search, Multiple queries, query processing
相關次數：	點閱：68 下載：0
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

在資料庫的領域中， kNN (k Nearest Neighbor) 查詢是個十分熱門的議題。給定一個資料點集合 D ，一個 kNN 查詢點 q 。kNN 查詢回傳在 D 中的 k 個離 q 最近的資料點。kNN 查詢的應用相當的廣泛，不僅應用在地理空間上的查詢，更被應用在許多領域中。例如 : 多媒體資料庫、資料探勘、 scientific databases 、 video retrieval 等。在過去，已經有許多的研究在探討如何有效率的處理 kNN 查詢，這些研究都假設 server 一次只處理一個 kNN 查詢。然而，這些研究並沒有考量到 server 其實在一段時間中，會收到多個 kNN 的查詢的狀況。我們發現，在過去的研究中，當 server 收到多個查詢時， server 將每個查詢都視為獨立的個體，並分別處理它們。這樣的作法會使得 server 需要不斷重複存取資料庫，取得許多已存取過的資訊。Server 會因忙碌於存取資料而效能下降，同時也造成 I/O 的浪費。在本論文中，我們針對這樣的問題提出名為 Multiple kNN 的演算法。Multiple kNN 主要的概念是透過 "資訊分享" 的方式，使 server 可以利用過去查詢處理後的結果，進而加速現有查詢處理的效率。所謂的資訊分享，指的是「當兩個查詢點距離相近時，其 kNN 答案的重複率也很高」的現象。我們所設計的演算法利用了這個特性， server 會將處理過查詢的結果儲存起來，稍後 server 處理新的查詢時即可使用舊有資訊幫助找尋此次查詢的結果。最後，我們進行大量的實驗以分析 Multiple kNN algorithm 的效能，並且與傳統的 kNN 演算法 Best-First Search (BFS) 比較。從實驗的結果可以展現 Multiple kNN 在 I/O cost 及計算效率上都較 BFS 傑出的表現。

The problem of kNN (k Nearest Neighbor) query has received considerable attention from the database and information retrieval communities. Given a dataset D and a kNN query q, kNN query finds the closest k data points to q. The applications of kNN query is widely, not only in spatio-temporal database but also in many areas. For example, it’s used in multimedia database, data mining, scientific database and video retrieval. The past studies of kNN query processing did not consider the case that the server may receive multiple kNN queries at a time. Their algorithms process queries independently. Thus, server will busy with continuously reaccessing database to obtain the data that
already acquired. It results in wasting I/O cost and degrading the performance of whole system. In this paper, we focus on this problem and propose an algorithm that namely
Multiple kNN. The main idea of Multiple kNN is “information sharing” strategy that server re-uses the query results of previously executed queries for efficiently processing
subsequent queries. We conduct a comprehensive set of experiments to analyze the performance of Multiple kNN and compare with Best-First Search (BFS) algorithm. Empirical studies indicate that the performance of Multiple kNN outperforms BFS, achieves lower I/O cost and less running time.

Abstract . . . . . . . . . .. . . . . .   . . . . . . . . . . . . . . . . . . . .  i
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .iv
Table of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Table of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. viii
Table of Algorithms . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . ix
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1 Single kNN query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Multiple kNN queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Preliminary . . . . . . . . . . .  . . . . . . . . . .  . . . . . . . . . .  . . 12
1 Assumptions and Problem Definition . . . . . . . . . . . . . . . . . . . . . . 12
2 The Index Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13
2.1 The Query Result Hash Table (QRHT) . . . . . . . . . . . . . . . . . . . . . 14
2.2 The R-Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 TheMBR Pointer Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Multiple kNN Queries Algorithm (Multiple kNN) . . . . . . . . . . . . . . . . .  21
1 Multiple kNN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2 EX-Multiple kNN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . .  27
Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . .  38
1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2 Datasets and Querysets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  41
3.1 Determination of Data Grid Granularity(δ) . . . . . . . . . . . . . . . . . 41
3.2 Effect of Number of Objects/Queries on Synthetic Data . . . . . . . . . . .  43
3.3 Effect of Dimensions on Synthetic Data . . . . . . . . . . . . . . . . . . . 44
3.4 Effect of Number of Objects/Queries on Real-World Data . . . . . . . . . . . 45
3.5 Effect of Number of RandomCluster Queries . . . . . . . . . . . . . . . . . .46
3.6 Effect of Dimensions with RandomCluster Queryset . . . . . . . . . . . . . . 48
3.7 Effect of Nnumber of Random Cluster Queries on Real-World Data . . . . . . . 48
3.8 Effect of Number of Neighbors(k Values) . . . . . . . . . . . . . . . . . .  49
Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . .  51
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  52
                                    

[1] X. Xiong, M. F. Mokbel, and W. G. Aref, “Sea-cnn: Scalable processing of continuous k-nearest neighbor queries in spatio-temporal databases,” in ICDE, pp.643–654, April 2005.
[2] G. R. Hjaltason and H. Samet, “Distance browsing in spatial databases,” TODS,June 1999.
[3] N. Roussopoulos, S. Kelly, and F. Vncent, “Nearest neighbor queries,” in ACM SIGMOD, May 1995.
[4] J. Zhang, M. Zhu, D. Papadias, Y. Tao, and D. L. Lee, “Location-based spatial queries,” in ACM SIGMOD, June 2003.
[5] M. Kolahdouzan and C. Shahabi, “Voronoi-based k nearest neighbor search for spatial network databases,” in VLDB, pp. 840–851, August 2004.
[6] F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas, “Fast nearest neighbor search in medical image databases,” in Proceedings of the 22nd International Conference on VLDB, pp. 215 – 226, September 1996.
[7] K. Mouratidis and D. Papadias, “Continuous nearest neighbor queries over sliding windows,” in TKDE, June 2007.
[8] T. Seidl and H. Kriegel, “Efficient user-adaptable similarity search in large multimedia database,” in Proceedings of International Conference on VLDB, pp. 506–515, August 1997.
[9] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, “Advances in knowledge discovery and data mining,” in AAAI Press/MIT Press, p. 611, 1996.
[10] B. Cui, B. C. Ooi, J. Su, and K.-L. Tan, “Indexing high-dimensional data for efficient in-memory similarity search,” TKDE, vol. 17, March 2005.
[11] H. Lu, B. C. Ooi, H. T. Shen, and X. Xue, “Hierarchical indexing structure for efficient similarity search in video retrieval,” TKDE, vol. 18, pp. 1544–1559, 2006.
[12] C. Bohm, B. C. Ooi, C. Plant, and Y. Yan, “Efficiently processing continuous k-nn queries on data streams,” in ICDE, April 2007.
[13] M. Sharifzadeh and C. Shahabi, “Vor-tree: R-trees with voronoi diagrams for efficient processing of spatial nearest neighbor queries,” in Proceedings of the VLDB Endowment, 2010.
[14] Y. Tao, K. Yi, C. Sheng, and P. Kalnis, “Quality and efficiency in high dimensional nearest neighbor search,” in ACM SIGMOD, June 2009.
[15] A. Gionis, P. Indyk, and R. Motwani, “Similarity search in high dimensions via hashing,” in VLDB, pp. 518 – 529, September 1999.
[16] X. Zhang, L. Shou, K.-L. Tan, and G. Chen, “idisque: Tuning high-dimensional similarity queries in dht networks,” in DASFAA, April 2010.
[17] G. Strang, inear Algebra and its Applications. Thomson Learning, 1980.
[18] B. Cui, B. C. Ooi, J. Su, and K.-L. Tan, “Contorting high dimensional data for efficient main memory knn processing,” in ACM SIGMOD, pp. 479–490, June 2003.
[19] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, “Locality-sensitive hashing scheme based on p-stable distributions,” in ACM SCG, pp. 253 – 262, 2004.
[20] N. Koudas, B. C. Ooi, K.-L. Tan, and R. Zhang, “Approximate nn queries on streams with guaranteed error/performance bounds,” in Proceedings of the 30th
VLDB Conference, pp. 804–815, August 2004.
[21] K. Mouratidis, M. Hadjieleftheriou, and D. Papadias, “Conceptual partitioning:an efficient method for continuous nearest neighbor monitoring,” in ACM SIGMOD, pp. 634–645, June 2005.
[22] M. Hasan, M. Aamir, W. Qu, and X. Lin, “Efficient algorithms to monitor continuous constrained k nearest neighbor queries,” in DASFAA, April 2010.
[23] M. F. Mokbel, X. Xiong, and W. G. Aref, “Sina: Scalable incremental processing of continuous queries in spatio-temporal databases,” in ACM SIGMOD, pp. 623–634, June 2004.
[24] M. F. Mokbel and W. G. Aref, “Sole: Scalable on-line execution of continuous queries on spatio-temporal data streams,” VLDB, vol. 17, pp. 971–995, 2008.
[25] X. Yu, K. Q. Pu, and N. Koudas, “Monitoring k-nearest neighbor queries over moving objects,” in ICDE, April 2005.
[26] I. T. Jolliffe, “Principal component analysis,” in Springer-Verlag, 1996.
[27] G.-H. Cha, X. Zhu, D. Petkovic, and C.-W. Chung, “An efficient indexing method for nearest neighbor searches in high-dimensional image databases,” in IEEE
TRANSACTIONS ON MULTIMEDIA, pp. 76–87, MARCH 2002.
[28] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” in ACM SIGMOD, June 1984.
[29] N. R. Timos Sellis and C. Faloutsos, “The r+-tree: A dynamic index for multidimensional objects,” in VLDB, September 1987.
[30] R. S. Norbert Beckmann, Hans-Peter Knegel and B. Seeger, “The r*-tree: An efficient and robust access method for points and rectangles,” in ACM SIGMOD, June 1990.
[31] W.-C. P. Tao-Yang Fu and W.-C. Lee, “Parallelizing itinerary-based knn query processing in wireless sensor networks,” in TKDE, August 2010.
[32] X. T. Yuxia Yao and E.-P. Lim, “Localized monitoring of knn queries in wireless sensor networks,” in VLDB Journal, August 2009.
[33] http://www.rtreeportal.org/.

校內：2016-07-11公開
校外：不公開電子論文尚未授權公開，紙本請查館藏目錄

簡易檢索 / 詳目顯示

相關論文