| Graduate Student: | 徐家駒 Hsu, Jia-Jiu |
|---|---|
| Thesis Title: | 整合深度物件偵測、規則字元辨識及場景辨識之無建圖室內導航系統 (Indoor navigation system without mapping integrated with depth object detection, regular character recognition, and scene recognition) |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 |
| Language: | English |
| Pages: | 61 |
| Keywords (Chinese): | 視覺導航、機器人、物件偵測、光學字元辨識、場景識別 |
| Keywords (English): | Visual Navigation, Robotics, Object Detection, OCR, Scene Recognition |
Robots have gradually entered our daily lives with the rapid development of science and technology. Over the past decade, the e-commerce giant Amazon has deployed more than 500,000 robots in its fulfillment centers, a major success in logistics automation that has prompted other companies to introduce robots to handle high order volumes. The labor shortage brought on by COVID-19 has further pushed enterprises to deploy robots in the service industry rather than only in factories, where they must cope with far more variable and complex environments.

In current robotic systems, SLAM (Simultaneous Localization and Mapping) is commonly used for navigation. SLAM requires a map to navigate correctly, but building, updating, and maintaining that map is very time-consuming; when the operating area is large, scanning it section by section is impractical; and navigation based on SLAM alone suffers from many visual blind spots.

This thesis therefore applies image-based AI techniques to overcome this bottleneck in indoor navigation. A key part of the problem is enabling robots to interpret the main images of human daily life and integrate that understanding into robot services. The proposed system comprises a depth object detection module, an optical character recognition module, and a scene recognition module; combined with prior knowledge of the environment (indoor maps, GPS, etc.), it resolves SLAM's limitation of being unable to navigate without a pre-built map and annotated landmarks.
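The fusion of the three perception modules with environmental prior knowledge could be sketched as a simple landmark-scoring loop. This is a minimal illustration only: the names (`Observation`, `landmark_score`, `SCENE_PRIOR`) and the scoring weights are hypothetical assumptions, not the thesis's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One fused perception frame from the three modules."""
    objects: list   # labels from the object detection module, e.g. ["door", "exit sign"]
    texts: list     # strings read by the OCR module, e.g. ["Room 301"]
    scene: str      # label from the scene recognition module, e.g. "corridor"

# Hypothetical prior knowledge: the scene each goal landmark is expected in.
SCENE_PRIOR = {"room 301": "corridor", "elevator": "lobby"}

def landmark_score(obs: Observation, goal: str) -> int:
    """Score how strongly the current frame supports the goal landmark.

    Signage text is treated as the strongest cue (+2), a matching detected
    object as a weaker cue (+1), and agreement with the scene prior adds +1.
    """
    goal = goal.lower()
    score = 0
    if goal in (t.lower() for t in obs.texts):
        score += 2
    if goal in (o.lower() for o in obs.objects):
        score += 1
    if SCENE_PRIOR.get(goal) == obs.scene:
        score += 1
    return score

frame = Observation(objects=["door"], texts=["Room 301"], scene="corridor")
print(landmark_score(frame, "Room 301"))  # -> 3
```

In such a scheme, the robot would steer toward the highest-scoring frame instead of localizing against a pre-built SLAM map, which is the mapless behavior the abstract describes.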