
Graduate Student: Lin, Fang-Chun (林芳君)
Thesis Title: Fast Landmark Indexing Using Connectivity between Visual Words
Advisor: Tai, Shen-Chuan (戴顯權)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Computer & Communication Engineering
Year of Publication: 2016
Academic Year of Graduation: 104
Language: English
Number of Pages: 46
Keywords: Bag-of-Visual-Words, object retrieval, geometric verification, Delaunay triangulations, TF-IDF
Access counts: 103 views, 0 downloads
Abstract (translated from Chinese):
The bag-of-visual-words model is widely used in multimedia and computer-vision research, including image classification, object recognition, and the landmark-retrieval system of this thesis. SIFT descriptors extracted from images are clustered to form visual words, and each image is described by the occurrence frequencies of these visual words. Because the bag-of-visual-words representation lacks spatial information, object-retrieval performance drops sharply. To remedy this shortcoming, this thesis proposes a novel geometric-consistency analysis and verification that considers the relative positions of visual words, using Delaunay triangulation as the geometric model of a landmark object, effectively improving the traditional bag-of-visual-words model. For better results, a modified TF-IDF formula is used to compute the importance of each visual word, yielding a more precise weighted image feature vector.
We evaluate the proposed system on the Oxford Buildings 5K and Paris 6K image datasets. Experimental results show that, with efficient post-processing steps, our method offers better discriminative power than other geometric models.
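The thesis's modified TF-IDF formula is not reproduced in this record, but the weighting idea it builds on can be illustrated. A minimal sketch of standard TF-IDF weighting over visual-word histograms (pure Python, hypothetical toy data; the actual system would apply its modified formula to quantized SIFT descriptors):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each image's visual-word histogram by standard TF-IDF.

    docs: list of visual-word ID sequences, one per image.
    Returns one {word_id: weight} dict per image.
    """
    n = len(docs)
    # Document frequency: in how many images each visual word appears.
    df = Counter()
    for d in docs:
        df.update(set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        total = len(d)
        # TF-IDF: frequent-in-image but rare-across-images words score high.
        vectors.append({w: (c / total) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors

# Toy example: three "images" over a tiny visual vocabulary.
vecs = tfidf_vectors([[0, 0, 1], [0, 2], [2, 2, 3]])
```

In the first image, word 1 (which occurs in only one image) outweighs word 0 (which occurs in two) despite appearing fewer times, which is the discriminative effect the weighting scheme exploits.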

Abstract (English):
The bag-of-visual-words (BoVW) model has been widely adopted in the multimedia and computer-vision communities, e.g., for image classification, object recognition, and landmark retrieval. In the BoVW framework, visual words are built by clustering SIFT descriptors extracted from images, and each image is represented as a histogram of visual-word counts. Because this representation ignores spatial context, object-retrieval performance degrades dramatically. To address this, this thesis proposes a novel geometric verification that captures the relative spatial information of visual words and constructs a structural representation of a landmark, improving on the standard BoVW model. Moreover, a modified TF-IDF technique is used to calculate the significance of visual words, yielding a more accurate weighted feature vector for each image.
In the experiments, two image datasets are used for evaluation: Oxford Buildings 5K and Paris 6K. The results show that, with efficient post-processing modules, the proposed approach outperforms other methods based on spatial models.
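The geometric verification described above rests on comparing the connectivity of matched visual words. As a rough illustration (the thesis's exact scoring is not specified in this record): triangulate the keypoint positions of matched visual words in each image, label triangulation vertices by their visual words, and measure how much of the edge structure the two images share. A pure-Python sketch that takes precomputed triangulations as lists of labeled triangles (in practice the triangulation itself could come from scipy.spatial.Delaunay):

```python
def edge_set(triangles):
    """Collect undirected edges from labeled triangles.

    triangles: list of 3-tuples of visual-word labels, one label
    per triangulation vertex.
    """
    edges = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (a, c)):
            edges.add((min(u, v), max(u, v)))
    return edges

def geometric_consistency(tri_query, tri_db):
    """Jaccard overlap of two images' triangulation edge sets,
    with edges identified by the visual-word labels of their
    endpoints. 1.0 means identical connectivity."""
    eq, ed = edge_set(tri_query), edge_set(tri_db)
    if not eq and not ed:
        return 0.0
    return len(eq & ed) / len(eq | ed)

# Toy example: same landmark layout vs. unrelated layout.
score_same = geometric_consistency([(1, 2, 3), (2, 3, 4)],
                                   [(1, 2, 3), (2, 3, 4)])
score_diff = geometric_consistency([(1, 2, 3)], [(4, 5, 6)])
```

A score near 1 suggests the matched words share a consistent spatial layout; candidate images can then be re-ranked by this score, in the spirit of the spatial re-ranking step listed in the table of contents.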

Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Background and Related Works
  2.1 Content-based Image Retrieval
  2.2 Feature Representation
    2.2.1 Features
    2.2.2 Bag-of-words Model
  2.3 Related Works
    2.3.1 Spatial Pyramid Matching
    2.3.2 RANSAC
    2.3.3 Geometry-preserving Visual Phrases
Chapter 3 The Proposed Algorithm
  3.1 Weighting Scheme
  3.2 Geometric Verification
  3.3 Spatial Re-ranking
Chapter 4 Experimental Results
  4.1 Dataset and Evaluation Protocol
  4.2 Experimental Setting and Baseline
  4.3 Effect of Weighting Scheme
  4.4 Performance Evaluation
  4.5 Time Costs
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Reference


Full-text availability: on campus, open access from 2026-01-29; off campus, not available.
The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.