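| Graduate student: | 呂立宇 Lu, Li-Yu |
|---|---|
| Thesis title: | 基於人工智慧平面偵測之精準物件置入 AI-based Plane Detection for Precision Object Insertion |
| Advisor: | 楊家輝 Yang, Jar-Ferr |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 (ROC calendar, 2020-2021 academic year) |
| Language: | English |
| Number of pages: | 60 |
| Chinese keywords: | 平面偵測 (plane detection), 深度學習 (deep learning), 單一影像輸入 (single image input), 消失點偵測 (vanishing point detection), 卷積類神經網路 (convolutional neural networks) |
| English keywords: | plane detection, deep learning, single image input, vanishing points, convolutional neural networks |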
With the development of AI technology, many tasks that once required manual work can now be automated. Traditional object insertion requires a person to adjust the inserted object so that its angle and orientation look plausible in the scene. Later approaches use a laser diode to emit laser pulses at the target; the target scatters the reflected light in all directions, and the part that returns to the sensor's receiver is used to measure the spatial parameters of the scene as a reference for object insertion. Most current AI plane detection techniques likewise process the returned laser data to obtain more accurate spatial detection. However, these methods apply only to devices equipped with a laser sensor, such as mobile phones and cameras, and cannot be used to insert objects into existing images for which no laser information is available. To solve this problem, this thesis proposes a deep-learning-based plane detection and object insertion system that takes a single RGB image as input. A computed edge map first reinforces the boundary information given to the network, and a convolutional neural network then predicts the image's depth, plane segmentation, and plane normals. The same edge map also feeds a vanishing point correction system, which converts the edge map into a line map and detects the vanishing points of the planes with the random sample consensus (RANSAC) algorithm. The detected vanishing points are then used to correct the rotation of the inserted object so that it aligns with the plane boundaries. Experimental results show that the proposed system predicts plane segmentation and normals from a single image more accurately, and that the vanishing points can be used to correct the inserted object.
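The vanishing point step described in the abstract (line map, then RANSAC) can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `ransac_vanishing_point`, the angular inlier test, the 2-degree tolerance, and the final least-squares refit are all assumptions chosen for illustration. It assumes line segments have already been extracted from the edge map, each with nonzero length.

```python
import numpy as np


def ransac_vanishing_point(segments, iters=500, tol_deg=2.0, seed=0):
    """Estimate one vanishing point from 2D line segments with RANSAC.

    segments: array-like of shape (N, 4), one row (x1, y1, x2, y2) per
    segment; segments are assumed to have nonzero length.
    Returns (vp, inliers): a homogeneous 3-vector and a boolean mask.
    Hypothetical helper for illustration, not the thesis code.
    """
    rng = np.random.default_rng(seed)
    segs = np.asarray(segments, dtype=float)
    n = len(segs)
    ones = np.ones((n, 1))
    p = np.hstack([segs[:, :2], ones])   # first endpoints, homogeneous
    q = np.hstack([segs[:, 2:], ones])   # second endpoints, homogeneous
    lines = np.cross(p, q)               # homogeneous line through each segment
    lines /= np.linalg.norm(lines, axis=1, keepdims=True)
    mids = (segs[:, :2] + segs[:, 2:]) / 2.0
    dirs = segs[:, 2:] - segs[:, :2]
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    cos_tol = np.cos(np.deg2rad(tol_deg))

    def consistent(vp):
        # Direction from each midpoint toward vp; this form also works
        # when vp lies at infinity (vp[2] == 0).
        to_vp = vp[:2] - vp[2] * mids
        norms = np.linalg.norm(to_vp, axis=1)
        cos = np.abs(np.sum(to_vp * dirs, axis=1)) / np.maximum(norms, 1e-12)
        return cos >= cos_tol

    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        vp = np.cross(lines[i], lines[j])  # intersection of the sampled lines
        if np.linalg.norm(vp) < 1e-12:
            continue  # both segments lie on the same line
        mask = consistent(vp)
        if mask.sum() > best.sum():
            best = mask
    if best.sum() < 2:
        raise ValueError("no consistent vanishing point found")
    # Least-squares refit: the v minimizing sum_k (l_k . v)^2 over inlier lines.
    _, _, vt = np.linalg.svd(lines[best])
    return vt[-1], best
```

Dividing the returned vector by its third component gives pixel coordinates when the vanishing point is finite. The angular test is used here instead of a point-to-line distance so that vanishing points far outside the image, or at infinity, are handled uniformly; a scene with several plane orientations would need the function run repeatedly on the remaining outliers, which this sketch does not do.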