| Field | Value |
|---|---|
| Author | Lin, Chao (林超) |
| Thesis Title | Object Classification and Pick Angle Estimation Using Deep Convolutional Neural Networks for Robot Arm Operation (藉由深度卷積神經網路來分類物體和估計夾取角度於機器手臂操作) |
| Advisor | Lien, Jenn-Jier James (連震杰) |
| Co-Advisor | Guo, Shu-Mei (郭淑美) |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2018 |
| Academic Year of Graduation | 106 |
| Language | English |
| Number of Pages | 87 |
| Keywords (Chinese) | robot manipulator, manipulator operation, deep convolutional neural network, cascaded deep convolutional neural network, visual positioning, visual classification |
| Keywords (English) | Manipulator, Manipulator operation, Deep convolution neural network, Cascade deep convolution neural network, Visual positioning |
Abstract: In the field of robotics, object manipulation is a classic problem. Correctly manipulating an object requires determining its position, a grasp angle suitable for picking it up, and the class it belongs to. Traditional computer-vision algorithms suffer from poor generality, low robustness, and high computational cost, and cannot meet the demands of manipulating diverse, complex objects. To address these problems, this thesis proposes a general pick-and-place framework based on deep convolutional neural networks, consisting of three parts: automatic data collection, deep convolutional neural network training, and deep convolutional neural network testing, which completes object-manipulation tasks efficiently. Data collection for a single object takes about 2.5 hours, and network training takes about 6 hours. In testing, the trained model achieves an overall classification accuracy of 100% and a grasping accuracy of 94.8%, and these can be further improved as the amount of automatically collected data increases. In addition, for network testing, a cascaded network architecture is proposed to accelerate computation, reducing the effective processing time per operation from 1.53 seconds with a single network to 0.65 seconds without degrading grasping or classification accuracy.
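The thesis itself is not reproduced in this record, so the following is only a minimal sketch of how a cascaded inference pipeline of this kind could be organized: a lightweight network first screens candidate grasp patches, and a deeper network predicts the object class and a discretized grasp angle only for the survivors. The module names (CoarseNet, FineNet), the angle-bin discretization, and the keep_threshold parameter are illustrative assumptions, not the author's architecture.

```python
# Hypothetical sketch of a cascaded CNN inference pipeline (not the thesis code).
# A cheap "coarse" net scores every candidate patch; the deeper "fine" net runs
# only on high-scoring patches, which is one common way a cascade cuts latency.
import torch
import torch.nn as nn


class CoarseNet(nn.Module):
    """Small CNN that scores whether a patch is worth refining (assumed design)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.score = nn.Linear(32, 1)  # graspability score

    def forward(self, x):
        f = self.features(x).flatten(1)
        return torch.sigmoid(self.score(f))


class FineNet(nn.Module):
    """Deeper CNN that predicts object class and a grasp-angle bin (assumed design)."""
    def __init__(self, num_classes=5, num_angle_bins=18):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.cls_head = nn.Linear(64, num_classes)
        self.angle_head = nn.Linear(64, num_angle_bins)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.cls_head(f), self.angle_head(f)


@torch.no_grad()
def cascaded_inference(patches, coarse, fine, keep_threshold=0.5):
    """Run the cheap coarse net on all patches, the fine net only on survivors."""
    scores = coarse(patches).squeeze(1)             # (N,) graspability scores
    keep = scores > keep_threshold                  # mask of patches to refine
    if keep.sum() == 0:
        return None                                 # nothing promising in view
    cls_logits, angle_logits = fine(patches[keep])  # refine only kept patches
    kept_idx = keep.nonzero(as_tuple=True)[0]
    return kept_idx, cls_logits.argmax(1), angle_logits.argmax(1)
```

Skipping the deeper network for low-scoring patches is what lets a cascade reduce per-frame latency without retraining the more accurate model, which is consistent with the speed-up reported in the abstract; the exact split of work between the two stages in the thesis may differ.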