
Student: Li, Wan-Jen (李宛臻)
Thesis Title: A Pile of Objects Detection, Classification for Grasping System Using RGB-D Faster R-CNN (使用RGB-D Faster R-CNN實現堆疊物體偵測分類的夾取系統)
Advisor: Guo, Shu-Mei (郭淑美)
Co-advisor: Lien, Jenn-Jier (連震杰)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2019
Graduation Academic Year: 107 (2018-2019)
Language: English
Number of Pages: 71
Keywords: pile of objects, deep learning, RGB-D, robot arm
    To reduce the production costs caused by rising wages, the manufacturing industry has replaced traditional manpower with automated production equipment for material processing and quality inspection. However, feeding parts between machines remains a challenge. To replace the production-line steps in which workers place parts by hand, or in which a vibratory feeder shakes piled semi-finished products apart before they are sent into a machine, and thereby increase flexibility and reduce cost, we design a machine-vision-based robot-arm pick-and-place system for piled parts: an RGB-D camera, computer vision algorithms, and a robot arm are combined to classify, detect, and grasp objects in a pile. This thesis presents three algorithms. The first uses traditional computer vision methods to analyze color and depth images for position detection and object classification. The second, RGB-D Faster R-CNN, modifies the original Faster R-CNN so that it takes both a color image and a raw depth map as input and fuses the features of the two modalities, yielding more accurate classification and 2D detection results; the detected object position is then combined with depth for grasping. Third, we further extend the 2D RGB-D Faster R-CNN detection network into a 3D RGB-D Faster R-CNN that directly outputs the 3D position of each object.
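The RGB-D fusion described above can be sketched as a channel-wise concatenation of the feature maps produced by the color and depth branches. This is a minimal sketch only: the backbone, feature-map shapes, and the choice of concatenation as the fusion operator are illustrative assumptions, not taken from the thesis (which details its fusion scheme in Chapter 4.3).

```python
import numpy as np

# Illustrative channels-last feature maps from two hypothetical CNN branches;
# the shapes are assumptions for the sketch, not the thesis's actual dimensions.
rgb_feat = np.random.rand(38, 50, 512)    # features extracted from the RGB image
depth_feat = np.random.rand(38, 50, 512)  # features extracted from the raw depth map

# One common mid-level fusion scheme: concatenate along the channel axis,
# so the downstream region proposal network sees both modalities at once.
fused = np.concatenate([rgb_feat, depth_feat], axis=-1)
print(fused.shape)  # (38, 50, 1024)
```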

    In order to reduce the production cost caused by rising wages, the manufacturing industry replaces traditional manpower with automated production equipment to perform material processing or quality inspection. However, feeding among machines is still a challenge. The goal of this study is to use a robot arm system to replace the steps of loading materials manually, or of shaking and flattening the piled semi-finished products with a vibratory feeder, thereby increasing flexibility and decreasing cost. In this thesis, an RGB-D sensor, a robot arm, and computer vision algorithms are combined into a system for classifying, detecting, and grasping objects in a pile. Three methods are applied. The first method performs detection and classification by analyzing the color and depth images with traditional computer vision methods. Next, the original Faster R-CNN is extended to multi-modal inputs, namely an RGB image and a raw depth map; the resulting RGB-D Faster R-CNN model achieves precise detection results with the fused RGB-D features. Finally, the RGB-D Faster R-CNN is further modified into a 3D RGB-D Faster R-CNN, which outputs the 3D information of each object directly.
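The first method's depth-slicing idea can be sketched as follows: find the nearest valid depth in the map, keep every pixel within one slice thickness of it, and treat that topmost slice as the grasp region. This is a minimal sketch with an assumed unit convention and a hypothetical slice thickness; the thesis's actual procedure (Chapter 3) additionally uses contour and color analysis to detect and classify the screws.

```python
import numpy as np

def topmost_slice(depth, thickness=10.0):
    """Mask of pixels within `thickness` (same units as `depth`) of the
    nearest valid depth; zeros are treated as missing measurements."""
    valid = depth > 0
    d_min = depth[valid].min()
    return valid & (depth <= d_min + thickness)

def grasp_candidate(depth, thickness=10.0):
    """Centroid (row, col) of the topmost slice as a naive grasp point."""
    ys, xs = np.nonzero(topmost_slice(depth, thickness))
    return float(ys.mean()), float(xs.mean())

# Toy depth map: a flat bin at 100 mm with one object protruding up to 50 mm.
depth = np.full((10, 10), 100.0)
depth[2:4, 2:4] = 50.0
print(grasp_candidate(depth))  # (2.5, 2.5)
```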

    Abstract (Chinese) I
    Abstract II
    Acknowledgements III
    Content V
    Content of Figure VII
    Content of Table X
    Chapter 1 Introduction 1
        1.1 Motivation 1
        1.2 Related Work 6
        1.3 Contribution 9
        1.4 Organization of Thesis 11
    Chapter 2 System Specification and Function 12
        2.1 Hardware Specification 14
        2.2 Function 18
    Chapter 3 2.5D Pile of Objects Detection Using Slicing Depth Map 20
        3.1 Sliced Depth Map 21
        3.2 Screw Detection Using Contour and Color 22
        3.3 Grasping Target 24
    Chapter 4 2.5D Pile of Objects Detection Using RGB-D Faster R-CNN 25
        4.1 Framework of RGB-D Faster R-CNN 26
        4.2 Data Preparation 28
        4.3 Global Feature Extraction and Fusion 31
        4.4 Region Proposal 33
        4.5 BBox Proposal Classification and Refinement 39
        4.6 Loss Function and Topmost Object Detection 42
    Chapter 5 3D Object Detection Using 3D RGB-D Faster R-CNN 46
        5.1 Framework of 3D RGB-D Faster R-CNN 48
        5.2 3D Proposal Initialization 50
        5.3 3D BBox Proposal Classification and Regression 51
    Chapter 6 Experimental Results 53
        6.1 Data Collection and Evaluation Metrics 53
        6.2 Experimental Result of 2.5D Object Detection Using Slicing Depth Map 60
        6.3 Experimental Result of 2.5D Object Detection Using RGB-D Faster R-CNN 61
        6.4 Experimental Result of 3D Object Detection 66
    Chapter 7 Conclusion, Discussion and Future Works 68
    Reference 69

    D. Lowe, "Object recognition from local scale-invariant features," in ICCV, 1999.
    S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in NIPS, 2015.
    S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, "Learning rich features from RGB-D images for object detection and segmentation," in ECCV, 2014.
    K. S. Chahal and K. Dey, "A survey of modern object detection literature using deep learning," arXiv, 2018.
    R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in CVPR, 2014.
    J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, "Selective search for object recognition," International Journal of Computer Vision, pp. 154-171, 2013.
    R. Girshick, "Fast R-CNN," in ICCV, 2015.
    M. Schwarz, A. Milan, A. S. Periyasamy, and S. Behnke, "RGB-D object detection and semantic segmentation for autonomous manipulation in clutter," The International Journal of Robotics Research, pp. 437-451, 2018.
    J. Johnson, A. Karpathy, and L. Fei-Fei, "DenseCap: Fully convolutional localization networks for dense captioning," in CVPR, 2016.
    P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "OverFeat: Integrated recognition, localization and detection using convolutional networks," arXiv, 2013.
    F. Husain, H. Schulz, B. Dellen, C. Torras, and S. Behnke, "Combining semantic and geometric features for object class segmentation of indoor scenes," IEEE Robotics and Automation Letters, pp. 49-55, 2016.
    S. Gupta, J. Hoffman, and J. Malik, "Cross modal distillation for supervision transfer," in CVPR, 2016.
    Z. Deng and L. J. Latecki, "Amodal detection of 3D objects: Inferring 3D bounding boxes from 2D ones in RGB-depth images," in CVPR, 2017.
    M. M. Rahman, Y. Tan, J. Xue, L. Shao, and K. Lu, "3D object detection: Learning 3D bounding boxes from scaled down 2D bounding boxes in RGB-D images," Information Sciences, pp. 147-158, 2019.
    YASKAWA, FS100 Instructions, 2012.
    YASKAWA, FS100 Operator's Manual, 2014.
    YASKAWA, FS100 Options Instructions, 2014.
    YASKAWA, Motoman MH5LF Robot, 2013.
    T. Ophoff, K. Van Beeck, and T. Goedemé, "Exploring RGB+Depth fusion for real-time object detection," Sensors, 2019.
    N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," in ECCV, 2012.
    J. Fredebon, "The role of instructions and familiar size in absolute judgments of size and distance," Perception & Psychophysics, pp. 344-354, 1992.
    A. Kar, S. Tulsiani, J. Carreira, and J. Malik, "Amodal completion and size constancy in natural scenes," in ICCV, 2015.

    Full text available: on campus from 2021-07-04; off campus from 2021-07-04.