| Graduate Student: | 黃韋晧 Huang, Wei-Hao |
|---|---|
| Thesis Title: | 高效繪圖式影像分割與追蹤 Efficient Scribble-based Interactive Segmentation and Tracking |
| Advisor: | 許志仲 Hsu, Chih-Chung |
| Degree: | Master |
| Department: | 管理學院 - 統計學系 College of Management, Department of Statistics |
| Year of Publication: | 2024 |
| Graduation Academic Year: | 112 |
| Language: | English |
| Number of Pages: | 62 |
| Keywords (Chinese): | 自駕車、圖片標記、追蹤、影像分割、互動式影像分割 |
| Keywords (English): | Autonomous vehicles, Annotation, Tracking, Segmentation, Interactive Segmentation |
Object segmentation is one of the most common and important tasks in deep learning. In fields such as medical imaging and autonomous driving, segmentation is often required to identify the location or extent of a target object. Traditionally, a key requirement for training an effective supervised segmentation model is a large quantity of annotated images. Acquiring such annotations relies on manual labeling of every image, which costs considerable time and labor.
To address this problem, interactive image segmentation methods have been proposed and can effectively reduce the heavy cost of annotation. This thesis proposes a scribble-assisted annotation model, Efficient Scribble-based Interactive Segmentation and Tracking (ESST), which effectively mitigates labeling overhead; in particular, it yields a marked improvement in annotation efficiency over other methods on autonomous-driving datasets.
For video, since consecutive frames differ only slightly, this thesis also proposes a point-tracking approach to mask tracking, further accelerating video annotation. Experimental results show that ESST achieves state-of-the-art performance. A convenient annotation tool is also provided for practical use.
Image segmentation is one of the most prevalent and crucial tasks in deep learning. It is particularly essential in fields such as medical imaging and autonomous driving, where segmentation is often required to identify the location or extent of target objects. Traditionally, training an effective supervised segmentation model necessitates a large number of annotated images. Obtaining such a large labeled dataset relies on manual annotation, which is time-consuming and labor-intensive.
To address this issue, interactive image segmentation methods have been proposed. This thesis presents Efficient Scribble-based Interactive Segmentation and Tracking (ESST), a model that effectively mitigates the high cost of labeling and notably improves annotation efficiency in the context of autonomous driving.
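The abstract does not describe how ESST encodes scribbles, but a common approach in scribble-based interactive segmentation is to rasterize the user's foreground and background strokes into guidance maps and stack them with the image as extra input channels. The sketch below illustrates that idea only; the function names, the circular-brush rasterization, and the 5-channel layout are assumptions, not the thesis's actual design:

```python
import numpy as np

def rasterize_scribbles(shape, fg_points, bg_points, radius=3):
    """Rasterize foreground/background scribble points into two binary
    guidance maps of the given (H, W) shape.  Each point is drawn as a
    small disc, mimicking a brush stroke."""
    h, w = shape
    fg = np.zeros((h, w), dtype=np.float32)
    bg = np.zeros((h, w), dtype=np.float32)
    yy, xx = np.mgrid[0:h, 0:w]
    for pts, canvas in ((fg_points, fg), (bg_points, bg)):
        for (y, x) in pts:
            canvas[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 1.0
    return fg, bg

def build_network_input(image, fg_points, bg_points):
    """Stack the (H, W, 3) image with the two scribble maps into an
    (H, W, 5) tensor that a segmentation network could consume."""
    fg, bg = rasterize_scribbles(image.shape[:2], fg_points, bg_points)
    return np.concatenate([image, fg[..., None], bg[..., None]], axis=-1)
```

The appeal of this encoding is that a few coarse strokes, rather than a precise polygon, are enough to condition the network, which is where the annotation-time savings come from.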
For video data, because consecutive frames differ only slightly, this thesis also proposes a point-tracking approach to mask tracking, further accelerating the annotation process. Experimental results demonstrate that ESST achieves state-of-the-art performance. Additionally, a user-friendly annotation application is provided.
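The abstract states that mask tracking is driven by point tracking but does not give the propagation rule. As a minimal stand-in, the sketch below shifts a mask to the next frame by the mean motion of points reported by a point tracker (e.g., a tracker in the spirit of TAPIR); the translation-only model and the function signature are illustrative assumptions, not the thesis's method:

```python
import numpy as np

def propagate_mask(mask, tracked_src, tracked_dst):
    """Propagate a binary mask to the next frame given (y, x) point
    correspondences from a point tracker.  The motion is summarized as
    a single integer translation (mean point displacement), then the
    mask pixels are shifted by it, discarding pixels that leave frame."""
    shift = np.round(tracked_dst.mean(axis=0) - tracked_src.mean(axis=0)).astype(int)
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    ys2, xs2 = ys + shift[0], xs + shift[1]
    keep = (ys2 >= 0) & (ys2 < mask.shape[0]) & (xs2 >= 0) & (xs2 < mask.shape[1])
    out[ys2[keep], xs2[keep]] = 1
    return out
```

Because adjacent frames change little, even a crude motion model like this keeps the propagated mask close to the object, so the annotator only touches up residual errors instead of re-drawing every frame.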