| Author: | 呂冠儒 Lu, Kuan-Ju |
|---|---|
| Thesis Title: | 使用多階深度卷積神經網路估算二維影像之物體三維位置 (3D Object Point Estimation by 2D Image Using Multi-Stage Deep Convolutional Neural Networks) |
| Advisor: | 連震杰 Lien, Jenn-Jier James |
| Co-Advisor: | 郭淑美 Guo, Shu-Mei |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2018 |
| Academic Year: | 106 (2017-2018) |
| Language: | English |
| Pages: | 94 |
| Keywords: | Robot arm, Deep convolutional neural networks, Multi-stage deep convolutional neural networks, 3D object point estimation |
In camera-based vision systems, recovering the real-world 3D position of an object from images has many applications. A convolutional neural network (CNN), a deep learning technique, can estimate an object's 3D position from a 2D image using only a single camera, and the estimation takes 18 ms, fast enough for real-time operation. This thesis proposes a multi-stage deep convolutional neural network architecture with two key features: (1) a crop layer, which removes the need to down-sample the input image to a fixed resolution before it enters the network; and (2) multi-stage 3D point refinement, which cascades several deep CNN stages so that each stage estimates the object's 3D position and refines the previous stage's estimate. The architecture is applied to a vision-guided robot arm, and the robot arm is controlled to automatically collect 2D images of the object together with their corresponding 3D positions. In the experiments, the multi-stage network reduces the 3D position estimation error from 2.162 mm (single-stage deep CNN) to 1.569 mm, a reduction of about 27%.
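The abstract gives no implementation details, so as a rough illustration of the multi-stage refinement idea, here is a minimal PyTorch-style sketch in which each stage takes shared image features plus the previous stage's 3D estimate and outputs a residual correction. Every module name, layer size, and the stage count below are illustrative assumptions, not the thesis's actual network.

```python
# Minimal sketch (illustrative assumptions, not the thesis's actual network)
# of multi-stage 3D point refinement: each stage re-estimates the object's
# 3D position by adding a correction to the previous stage's estimate.
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Shared convolutional feature extractor (layer sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> fixed-size vector
        )

    def forward(self, x):
        return self.features(x).flatten(1)  # (B, 64)

class RefinementStage(nn.Module):
    """One stage: image features + previous 3D estimate -> refined estimate."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, 3),  # predicts a correction to (x, y, z)
        )

    def forward(self, feat, prev_point):
        delta = self.head(torch.cat([feat, prev_point], dim=1))
        return prev_point + delta  # residual-style refinement

class MultiStage3DPointNet(nn.Module):
    def __init__(self, num_stages=3):
        super().__init__()
        self.backbone = Backbone()
        self.stages = nn.ModuleList(RefinementStage() for _ in range(num_stages))

    def forward(self, image):
        feat = self.backbone(image)
        point = feat.new_zeros(image.size(0), 3)  # initial estimate at origin
        estimates = []
        for stage in self.stages:
            point = stage(feat, point)
            estimates.append(point)  # each stage's output can be supervised
        return estimates

# Usage: a batch of 2D images -> per-stage 3D point estimates.
net = MultiStage3DPointNet()
imgs = torch.randn(4, 3, 128, 128)
points = net(imgs)
print(points[-1].shape)  # torch.Size([4, 3]) -- final refined estimates
```

Supervising every stage's output against the ground-truth 3D position (rather than only the last stage's) is the usual way such cascades are trained, and the residual formulation lets later stages learn only the remaining error.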