| Graduate Student: | 林宜謙 Lin, Yi-Chian |
|---|---|
| Thesis Title: | 使用Depth-Based Mask R-CNN實現堆疊物體面分割及法向量估計之夾取系統 (Facet Segmentation and Normal Direction Estimation for a Pile of Objects Grasping System Using Depth-Based Mask R-CNN) |
| Advisor: | 郭淑美 Guo, Shu-Mei |
| Co-Advisor: | 連震杰 Lien, Jenn-Jier James |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Publication Year: | 2019 |
| Academic Year of Graduation: | 107 |
| Language: | English |
| Pages: | 75 |
| Keywords (Chinese): | 深度影像、深度學習、機器手臂、堆疊物體、物體切割、遮罩 |
| Keywords (English): | Depth Image, Deep Learning, Robot Arm, Piles, Object Segmentation, Mask |
Object segmentation and grasping is a classic problem in robotics. To grasp an object precisely, it must first be segmented from all the other objects; only then can its position and grasping angle be computed so that the robot arm can be commanded to grasp it. Methods proposed in recent years segment each object individually and then compute the grasping angle, but this approach may close the gripper on regions of a pile of objects that are difficult to grasp. To address this problem, this thesis uses a deep convolutional neural network (CNN) to segment each facet of the stacked objects, ensuring that the gripper closes on a facet of an object rather than on a tip or another hard-to-grasp region. Principal Component Analysis (PCA) is then applied to find the center point, principal axis, secondary axis, and normal vector of each facet, so that the robot arm can grasp along the facet's normal direction. To train the network, roughly 50 hours were spent collecting and labeling the object facets for each dataset, and training itself took about 27 hours. Depth images are used as the network input: trained on depth data, the network learns that a facet should exhibit continuous, smooth changes in depth, whereas depth discontinuities or abrupt changes in depth likely indicate the background or another facet. With this approach, the system achieves highly accurate object facet segmentation.
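The abstract describes feeding depth images directly to a Mask R-CNN-style network. As a minimal sketch of what that plumbing can look like, assuming PyTorch/torchvision (the thesis does not specify its framework), the single-channel depth map is normalized and replicated across three channels so an RGB-shaped backbone can consume it. The file name, score threshold, and COCO-pretrained weights below are placeholders; in practice the model would be fine-tuned on the labeled facet dataset described above.

```python
import cv2
import numpy as np
import torch
import torchvision

# Load a 16-bit depth image (e.g., millimeters from a depth camera).
# "pile_depth.png" is a hypothetical file name.
depth = cv2.imread("pile_depth.png", cv2.IMREAD_UNCHANGED).astype(np.float32)

# Normalize to [0, 1] and replicate across 3 channels so an RGB-shaped
# backbone can consume the depth map (one common adaptation for depth input;
# the thesis's exact input encoding may differ).
depth = (depth - depth.min()) / max(float(depth.max() - depth.min()), 1e-6)
x = torch.from_numpy(np.stack([depth] * 3, axis=0))  # shape (3, H, W)

# Off-the-shelf Mask R-CNN with COCO weights; the thesis fine-tunes on its
# own labeled facet dataset, which this sketch does not reproduce.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

with torch.no_grad():
    out = model([x])[0]  # dict with 'boxes', 'labels', 'scores', 'masks'

# Keep confident detections; each mask is a (1, H, W) soft mask in [0, 1].
keep = out["scores"] > 0.5
facet_masks = (out["masks"][keep, 0] > 0.5).cpu().numpy()  # (N, H, W) bool
```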
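The PCA step is stated more concretely: the facet's center point, principal axis, secondary axis, and normal vector come from the facet's 3D points. Below is a minimal sketch of that step, assuming a pinhole camera model with hypothetical intrinsics (fx, fy, cx, cy) and depth in meters; the eigenvector with the smallest eigenvalue of the point covariance serves as the facet normal.

```python
import numpy as np

def facet_pose(depth, mask, fx=615.0, fy=615.0, cx=320.0, cy=240.0):
    # Pixel coordinates of the facet and their depth values.
    v, u = np.nonzero(mask)
    z = depth[v, u]
    valid = z > 0                    # discard missing depth readings
    u, v, z = u[valid], v[valid], z[valid]

    # Pinhole back-projection: pixel (u, v) with depth z maps to the
    # camera-frame point (X, Y, Z).
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)

    # PCA: eigen-decompose the covariance of the facet points.
    center = pts.mean(axis=0)
    cov = np.cov((pts - center).T)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues ascending

    principal_axis = eigvecs[:, 2]   # direction of largest spread
    secondary_axis = eigvecs[:, 1]
    normal = eigvecs[:, 0]           # least spread: the facet normal

    # Orient the normal toward the camera (which looks along +Z), so the
    # gripper approaches the facet center along the negated normal.
    if normal[2] > 0:
        normal = -normal
    return center, principal_axis, secondary_axis, normal
```

With this pose, the gripper can be aligned with the principal and secondary axes and driven along the negated normal toward the facet center, matching the grasping motion described in the abstract.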