
Student: Wang, Yu-Chih (王煜智)
Title: Multiple View Techniques for Depth Estimation and 3D Reconstruction (應用多視角技術於深度估測與3D重建)
Advisor: Chung, Pau-Choo (詹寶珠)
Degree: Doctoral
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Publication year: 2013
Academic year of graduation: 102 (ROC calendar)
Language: English
Pages: 82
Keywords: multiple view, stereo view, stereo matching, disparity estimation, stereoscopic 3D localization, 3D reconstruction, piecewise 3D planar model

    This thesis explores multiple view computer vision techniques for depth estimation and 3D reconstruction, including dense depth map estimation, sparse world point localization, and multiple view 3D reconstruction of piecewise planar models.

    In realizing 3D display systems, efficient and accurate estimation of the disparity map between the stereo images is essential. The disparity estimation problem is commonly solved using graph cut (GC) methods, in which the disparity assignment problem is transformed into one of minimizing a global energy function. Although such an approach yields an accurate disparity map, the computational cost is relatively high. Accordingly, we propose a Hierarchical Bilateral Disparity Structure (HBDS) algorithm in which the efficiency of the GC-based method is improved without any loss of disparity accuracy by dividing all the disparity levels hierarchically into a series of bilateral disparity structures of increasing fineness. To address the well-known "foreground fattening" effect, a disparity refinement process is proposed comprising a fattening foreground region detection procedure followed by a disparity recovery process. The efficiency and accuracy of the proposed algorithm are verified and compared with several conventional methods using benchmark stereo images selected from the Middlebury dataset.
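    To illustrate why a hierarchical bilateral structure cuts the search cost, the following is a much-simplified, per-pixel sketch: at each step the remaining disparity range is split into a two-sided (bilateral) pair of sub-ranges, and the half whose representative disparity matches better is kept, so only O(log d_max) cost evaluations are needed instead of d_max + 1. This is an illustrative assumption, not the thesis method, which optimizes each bilateral structure globally with graph cuts rather than per pixel; the window cost and synthetic images are likewise hypothetical.

    ```python
    import numpy as np

    def match_cost(left, right, y, x, d, win=1):
        """SAD matching cost over a (2*win+1)^2 window at disparity d."""
        h, w = left.shape
        if x - d < win or y < win or y + win >= h or x + win >= w:
            return float("inf")  # window would leave the image
        a = left[y - win:y + win + 1, x - win:x + win + 1]
        b = right[y - win:y + win + 1, x - d - win:x - d + win + 1]
        return float(np.abs(a - b).sum())

    def bisect_disparity(left, right, y, x, d_max):
        """Halve the disparity range repeatedly: each step forms a bilateral
        pair of candidate sub-ranges and keeps the one whose representative
        disparity matches better."""
        lo, hi = 0, d_max
        while lo < hi:
            mid = (lo + hi) // 2
            c_low = match_cost(left, right, y, x, (lo + mid) // 2)
            c_high = match_cost(left, right, y, x, (mid + 1 + hi) // 2)
            if c_low <= c_high:
                hi = mid
            else:
                lo = mid + 1
        return lo
    ```

    The bisection is only reliable when the cost is well behaved over the disparity range; the thesis instead minimizes a global energy within each bilateral structure, which is robust to such local ambiguity.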

    Due to the limitations of conventional correspondence detection methods, locating points on the texture-less human back using stereoscopic 3D localization techniques is impractical. To cope with this issue, the present study proposes a novel correspondence detection scheme designated Correspondences from Epipolar geometry and Contours via Triangle barycentric coordinates (CECT). In the proposed approach, reliable correspondences are extracted from the edge contours of the human back by applying epipolar geometry and are then regarded as foundations for computing the correspondences within the edge contour based on the triangle barycentric coordinate system. The accuracy and robustness of the estimated correspondences are further ensured by applying three geometric constraints. The performance of the proposed approach is demonstrated by means of a series of experiments involving 28 subjects and three different testing conditions.
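    The barycentric transfer step can be sketched as follows: given three foundational correspondences forming a triangle in each image, an interior point's barycentric coordinates in the left-image triangle are reused to predict its position in the right-image triangle. This is a minimal sketch with hypothetical helper names; it is exact only under an affine mapping between the triangles, which is why the thesis applies three additional geometric constraints to the transferred points.

    ```python
    import numpy as np

    def barycentric(p, a, b, c):
        """Barycentric coordinates (l1, l2, l3) of point p in triangle (a, b, c)."""
        a, b, c, p = (np.asarray(v, dtype=float) for v in (a, b, c, p))
        T = np.column_stack((a - c, b - c))  # 2x2 system: T @ [l1, l2] = p - c
        l1, l2 = np.linalg.solve(T, p - c)
        return l1, l2, 1.0 - l1 - l2

    def transfer(p_left, tri_left, tri_right):
        """Predict the right-image correspondence of p_left by reusing its
        barycentric coordinates in the right-image triangle."""
        l1, l2, l3 = barycentric(p_left, *tri_left)
        a, b, c = (np.asarray(v, dtype=float) for v in tri_right)
        return l1 * a + l2 * b + l3 * c
    ```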

    An automatic 3D reconstruction method utilizing the property of inter-image homography and the concept of "half-planes" is proposed to produce a realistic 3D model of a real-world scene portrayed in a set of images. The proposed modeling method starts by extracting the corresponding feature points and lines from the images of the world scene. Then, the extracted corresponding points and lines are filtered in accordance with region and coplanar constraints and are used to identify the correct half-planes of real-world planes. Finally, a complete 3D planar model is constructed by enlarging the half-planes to their full extent and then merging all the extended half-planes that belong to the same world plane. The feasibility of the proposed approach is demonstrated by reconstructing 3D planar models of two real-world scenes containing objects with multiple planar facets.
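    The coplanar constraint rests on a standard property: correspondences whose pre-images lie on a single world plane are related by the plane-induced homography H, so pairs with a large transfer error under H can be rejected. A minimal sketch of that filter is below, with hypothetical function names and an assumed pixel tolerance; the thesis additionally applies a region constraint and uses line correspondences.

    ```python
    import numpy as np

    def transfer_error(H, x, x_prime):
        """Reprojection error of a point pair under a 3x3 homography H:
        the distance between the projected point pi(H x) and x'."""
        p = H @ np.array([x[0], x[1], 1.0])
        return float(np.linalg.norm(p[:2] / p[2] - np.asarray(x_prime, dtype=float)))

    def coplanar_inliers(H, correspondences, tol=1.0):
        """Coplanar constraint: keep only the point pairs consistent with H,
        i.e. pairs that can come from the world plane inducing H."""
        return [(x, xp) for x, xp in correspondences
                if transfer_error(H, x, xp) < tol]
    ```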

    Abstract (Chinese)
    Abstract
    Acknowledgements
    List of Tables
    List of Figures
    1. Introduction
    2. Disparity Estimation Using the HBDS-based GC Algorithm with a Foreground Boundary Refinement Mechanism
       2.1 Disparity Estimation Using the GC Method
           2.1.1 Energy Function and Graph
           2.1.2 Efficiency of the GC Procedure
       2.2 HBDS Algorithm
           2.2.1 Proposed Disparity Structure
           2.2.2 Specialized Energy Function for HBDS
           2.2.3 Determining the Optimal Break Point
       2.3 Disparity Refinement with Fattening Foreground Region Detection
           2.3.1 Detection of Fattening Foreground Regions
           2.3.2 Disparity Recovery
       2.4 Experimental Results
           2.4.1 Efficiency Evaluation
           2.4.2 Accuracy Evaluation
           2.4.3 Disparity Results for More Stereo Pairs
           2.4.4 Disparity Results for 3D Video Sequences
    3. Stereo-based 3D Localization on the Human Back Utilizing the CECT Algorithm
       3.1 System Overview
       3.2 CECT Algorithm
           3.2.1 Foundational Correspondences
           3.2.2 Correspondences within the Back Region
       3.3 Experimental Results
           3.3.1 Ideal Conditions
           3.3.2 Different Light Directions
           3.3.3 Different Camera Positions
    4. 3D Reconstruction of Piecewise Planar Models from Multiple Views
       4.1 System Overview
       4.2 Identification of Half-Planes
           4.2.1 Region Constraint
           4.2.2 Coplanar Constraint
           4.2.3 Verifying Half-Plane Regions
       4.3 Reconstruction of the Planar Model
           4.3.1 Defining the Boundaries for the Half-Plane Extension Process
           4.3.2 Region Extension Based on Feature Points
           4.3.3 Grouping of Extended Planes and Rendering of the Planar Model
       4.4 Experimental Results
    5. Conclusions
    References
    Vita


    Full text released 2015-01-10 (on campus and off campus).