
Graduate Student: Chen, Ming-Hsin (陳明心)
Thesis Title: A YOLOv7-Based Method for Detecting Buttons in Service Robots during Autonomous Elevator-Taking Tasks (基於YOLOv7按鈕偵測法及其實現於服務型機器人之自主搭乘電梯任務)
Advisor: Li, Tzuu-Hseng S. (李祖聖)
Degree: Master
Department: MS Degree Program on Intelligent Technology Systems, Miin Wu School of Computing
Publication Year: 2023
Graduation Academic Year: 112 (ROC calendar)
Language: English
Number of Pages: 64
Keywords (Chinese): 自主搭乘電梯 (autonomous elevator-taking), 電梯按鈕偵測 (elevator button detection), 服務型機器人 (service robot), YOLOv7
Keywords (English): Autonomous elevator-taking task, elevator button detection, service robot, YOLOv7
Abstract (Chinese): This thesis proposes a YOLOv7-based elevator button detection algorithm that improves the accuracy of elevator button recognition and the stability of service robots when taking elevators autonomously. In service robot applications, riding an elevator autonomously has long been a difficult problem. Traditional solutions typically rely on image-processing techniques such as template matching, or on wireless communication protocols to communicate with the elevator; however, these approaches either lack robustness or add equipment costs. To resolve this dilemma, this thesis builds on the YOLOv7 object detection network to propose a method that improves elevator button recognition accuracy, paired with a physical robotic arm that presses the buttons as the medium of interaction between the robot and the elevator. In the experiments, multiple cameras capture environmental information, a single-stage deep-learning object detector identifies the elevator buttons, and ROS commands issued from the detection results drive the robotic arm to press the buttons. The proposed method lets a service robot take elevators automatically while ensuring the safety and reliability of the process. Experimental results show that the proposed algorithm delivers excellent button recognition across multiple experimental scenarios and satisfies the functional requirements of autonomous elevator-taking for service robots. Future research can further optimize the algorithm to improve the efficiency and safety of autonomous elevator-taking and extend it to more scenarios.

    Abstract (English): The aim of this thesis is to propose a visual algorithm based on YOLOv7 that enhances the robustness of elevator button detection for service robots that take elevators autonomously. Traditional solutions for elevator interaction, such as image-processing methods, feature selection, or wireless communication protocols, suffer from limited robustness, communication security issues, or additional equipment costs. To address these issues, our method combines a physical robotic arm with the YOLOv7 object detection neural network to improve elevator button detection accuracy and robot-elevator interaction. To identify elevator buttons, we employ a single-stage object detector together with multiple cameras that capture environmental information during the experiments. Based on the detection results, the algorithm sends ROS instructions that control the robotic arm to press the target elevator button. The proposed method ensures the safety and reliability of the automated elevator-taking process for service robots. Experimental results indicate that our algorithm detects buttons effectively in a variety of testing scenarios, making it a practical solution for service robots that must take elevators autonomously.
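    The pipeline the abstract outlines (camera image, then YOLOv7 button detection, then a ROS command to the arm) can be illustrated with a minimal ROS 1 (rospy) node. This is a sketch under stated assumptions, not the thesis's implementation: the topic names (/camera/color/image_raw, /arm/press_target), the target floor label, the 0.5 confidence threshold, and the detect_buttons stub standing in for the trained YOLOv7 model are all hypothetical.

```python
#!/usr/bin/env python3
# Minimal sketch of the detect-then-press pipeline described in the abstract.
# NOT the thesis's code: topic names, message choices, and the
# detect_buttons() stub are assumptions for illustration only.
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import PointStamped
from cv_bridge import CvBridge


def detect_buttons(frame):
    """Placeholder for the YOLOv7 detector.

    A real implementation would run the trained YOLOv7 model on `frame`
    and return a list of (label, confidence, (cx, cy)) tuples, one per
    detected elevator button, with pixel-space box centers.
    """
    return []


class ButtonPressNode:
    def __init__(self, target_floor="3"):
        self.target_floor = target_floor
        self.bridge = CvBridge()
        # Assumed topic names; a real robot would remap these.
        self.target_pub = rospy.Publisher(
            "/arm/press_target", PointStamped, queue_size=1)
        rospy.Subscriber("/camera/color/image_raw", Image,
                         self.on_image, queue_size=1)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        detections = detect_buttons(frame)
        # Keep only confident detections of the requested floor's button.
        hits = [d for d in detections
                if d[0] == self.target_floor and d[1] > 0.5]
        if not hits:
            return
        label, conf, (cx, cy) = max(hits, key=lambda d: d[1])
        # Publish the button's image-space center; downstream code would
        # convert it to a 3-D arm target, e.g. using a depth camera.
        target = PointStamped()
        target.header = msg.header
        target.point.x, target.point.y = float(cx), float(cy)
        self.target_pub.publish(target)


if __name__ == "__main__":
    rospy.init_node("elevator_button_presser")
    ButtonPressNode()
    rospy.spin()
```

    In the thesis, detections come from multiple cameras and pass through a fusion and post-processing stage (Chapters 2 and 3 in the table of contents) before the arm is commanded; the sketch collapses that into a single subscriber for brevity.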

    Table of Contents:
    Abstract I
    Contents III
    List of Tables V
    List of Figures VI
    Chapter 1 1
      1.1 Background 1
      1.2 Related Work 3
        1.2.1 Methods of Automatic Elevator-Taking on Service Robots 3
        1.2.2 Methods of Elevator Button Detection and Recognition 4
        1.2.3 Wireless Control on Elevator-Taking Tasks 6
      1.3 Thesis Organization 7
    Chapter 2 9
      2.1 Introduction 9
      2.2 Different Types of Cameras in Service Robot 10
      2.3 Proposed Multi-Camera Fusion Algorithm 12
    Chapter 3 16
      3.1 Introduction 16
      3.2 The Comparison of Different Object Detection Models 17
      3.3 An Efficient Post-Processing Method 21
      3.4 Proposed Elevator Button Detection Algorithm 27
        3.4.1 Datasets and Model Training 28
        3.4.2 Data Augmentation 29
    Chapter 4 32
      4.1 Introduction 32
      4.2 Experiment Results 33
        4.2.1 Ablation Study 33
        4.2.2 Model Performance Evaluation 43
        4.2.3 Real-world Application 47
    Chapter 5 54
      5.1 Introduction 54
      5.2 Future Work 56
    References 58


    Full-text availability: on campus from 2028-09-01; off campus from 2028-09-01.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.