
Graduate Student: 呂冠儒 (Lu, Kuan-Ju)
Thesis Title: 使用多階深度卷積神經網路估算二維影像之物體三維位置
3D Object Point Estimation by 2D Image Using Multi-Stage Deep Convolutional Neural Networks
Advisor: 連震杰 (Lien, Jenn-Jier James)
Co-Advisor: 郭淑美 (Guo, Shu-Mei)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Graduation Academic Year: 106 (ROC calendar, 2017–2018)
Language: English
Number of Pages: 94
Keywords: Robot arm, Deep convolutional neural networks, Multi-stage deep convolutional neural networks, 3D object point estimation
Abstract:
    In a camera-based vision system, there are many applications for recovering the three-dimensional position of real-world objects from images. A convolutional neural network (CNN) is a deep learning technique that can estimate an object's 3D position from a single 2D image: it requires only one camera, and inference runs in real time at 18 milliseconds. This thesis proposes a multi-stage deep convolutional neural network architecture with two key features: 1. a crop layer, which removes the need to down-sample the image to fit a fixed input size, so the network preserves the original image resolution; 2. multi-stage 3D point refinement, which chains multiple deep CNN stages so that each stage estimates the object's 3D position and refines the previous stage's estimate more accurately. The architecture is applied to a vision-guided robot arm, and the robot arm is used to automatically collect 2D images of an object together with the corresponding 3D positions. In the experiments, the multi-stage network reduces the estimation error from 2.162 mm (single deep CNN) to 1.569 mm, i.e. (2.162 − 1.569) / 2.162 ≈ 27%.
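    To make the multi-stage refinement idea concrete, below is a minimal PyTorch-style sketch of the estimation chain. It is an illustrative reconstruction, not the thesis's actual network: the layer sizes, stage count, input resolution, and the choice to regress per-stage corrections are all assumptions, and the crop layer described above is not modeled.

    # Illustrative sketch only: a chain of CNN stages in which each stage
    # regresses a correction to the running 3D point estimate. All layer
    # sizes and the 224x224 input are assumptions, not the thesis design.
    import torch
    import torch.nn as nn

    class StageCNN(nn.Module):
        """One estimation stage: image features -> a 3D (x, y, z) update."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),          # global pooling -> (N, 32, 1, 1)
            )
            self.head = nn.Linear(32, 3)          # regress an (x, y, z) vector

        def forward(self, image):
            return self.head(self.features(image).flatten(1))

    class MultiStageCNN(nn.Module):
        """Chain stages so that each one refines the previous 3D estimate."""
        def __init__(self, num_stages=3):
            super().__init__()
            self.stages = nn.ModuleList([StageCNN() for _ in range(num_stages)])

        def forward(self, image):
            point = torch.zeros(image.size(0), 3, device=image.device)
            for stage in self.stages:
                point = point + stage(image)      # stage-by-stage correction
            return point

    # One 2D image in, one estimated 3D point (in mm) out.
    model = MultiStageCNN(num_stages=3)
    pred = model(torch.randn(1, 3, 224, 224))     # -> tensor of shape (1, 3)

    Formulating each stage as an additive correction rather than a fresh estimate is one common way to realize an "estimate, then refine" chain like the one the abstract describes; later stages then only need to learn increasingly small adjustments.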

    Table of Contents:
    Abstract (in Chinese) I
    Abstract II
    Acknowledgements III
    Table of Contents V
    List of Figures VII
    List of Tables X
    Chapter 1 Introduction 1
      1.1 Motivation 1
      1.2 Related Works 3
      1.3 Contribution 6
    Chapter 2 Visual-Based Robot Arm System 8
      2.1 System Setup and Framework 9
      2.2 Convolutional Neural Networks-Based Visual Sub-System 12
      2.3 Robot Arm Sub-System 18
      2.4 Self-Calibration between Robot Arm Sub-System and Visual Sub-System 24
      2.5 Data Collection 27
    Chapter 3 CNN-Based 3D Point Estimation Using One 2D Camera 34
      3.1 CNN Architecture 36
      3.2 Training Process Using Gradient Descent Optimization 45
      3.3 Test Process for 3D Point Estimation 52
    Chapter 4 Multi-Stage CNN-Based 3D Point Refinement Using One 2D Camera 55
      4.1 Multi-Stage CNN Architecture 58
      4.2 Training Process Using Gradient Descent Optimization 65
      4.3 Test Process for 3D Point Refinement 69
    Chapter 5 Experimental Results 72
      5.1 CNN-Based 3D Point Estimation 74
      5.2 Multi-Stage CNN-Based 3D Point Refinement 83
    Chapter 6 Conclusion and Future Works 90
    Reference 92

    Full-Text Availability: On campus: publicly available from 2023-07-17. Off campus: publicly available from 2023-07-17.