| Graduate Student: | 戴宏倫 Tai, Hung-Lun |
|---|---|
| Thesis Title: | 基於師生學習網路架構下之無監督式前景分割方法 A Teacher-Student Model for Unsupervised Foreground Segmentation |
| Advisor: | 胡敏君 Hu, Min-Chun |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Publication Year: | 2019 |
| Graduation Academic Year: | 107 |
| Language: | English |
| Pages: | 46 |
| Chinese Keywords: | 前景分割, 無監督式學習, 注意力模型, 自動標記 |
| English Keywords: | Foreground Segmentation, Unsupervised Learning, Attention Model, Auto Labeling |
In deep learning, semantic segmentation [2][3][4] and instance segmentation [5][6][7] are pixel-wise classification techniques: they assign a class to every pixel in an image and can therefore delineate object contours precisely. Training such models inevitably involves data preprocessing and data annotation, and for segmentation the annotation work requires tracing the contour of every object, which makes labeling extremely expensive.
This study starts from that problem and aims to reduce the annotation cost of semantic and instance segmentation without any prior knowledge. To this end we design an end-to-end, unsupervised neural network architecture trained in a teacher-student fashion. The teacher network, a Dual Encoder Attention Network (DEANet), trains without any human-labeled data: from a large set of images it learns the shared structure of foreground objects compressed by its encoders, reconstructs the foreground and background of each image through separate decoders, and generates an object mask in the process. A multi-scale student network, the Hierarchical Network, then learns from the teacher's preliminary results and produces masks with more accurate edge details. On the University of Oxford Flower Dataset and the Stanford Cars Dataset this method achieves Jaccard similarity of 0.83 and 0.94, respectively; example results are shown in Figure 1.
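To make the two-stage design concrete, the following is a minimal PyTorch sketch of the training loop. The module names DEANet and HierarchicalNet follow the abstract, but every layer size, the attention head, and the reconstruction/distillation losses are illustrative assumptions, not the thesis's actual implementation; in particular, the student here is a plain convolutional net standing in for the multi-scale Hierarchical Network.

```python
# Minimal sketch of the teacher-student pipeline described above.
# All layer sizes, heads, and losses are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

def deconv_block(cin, cout):
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class DEANet(nn.Module):
    """Teacher: dual encoders plus an attention head that emits a soft
    foreground mask; foreground and background are decoded separately."""
    def __init__(self):
        super().__init__()
        self.enc_fg = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.enc_bg = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.dec_fg = nn.Sequential(deconv_block(64, 32),
                                    nn.ConvTranspose2d(32, 3, 4, 2, 1))
        self.dec_bg = nn.Sequential(deconv_block(64, 32),
                                    nn.ConvTranspose2d(32, 3, 4, 2, 1))
        self.att = nn.Sequential(deconv_block(64, 32),
                                 nn.ConvTranspose2d(32, 1, 4, 2, 1))

    def forward(self, x):
        f_fg, f_bg = self.enc_fg(x), self.enc_bg(x)
        mask = torch.sigmoid(self.att(f_fg))       # soft foreground mask
        fg, bg = self.dec_fg(f_fg), self.dec_bg(f_bg)
        recon = mask * fg + (1 - mask) * bg        # mask-blended composite
        return recon, mask

class HierarchicalNet(nn.Module):
    """Student: predicts a mask from the image alone, supervised by the teacher."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(3, 32), conv_block(32, 64),
                                  deconv_block(64, 32),
                                  nn.ConvTranspose2d(32, 1, 4, 2, 1))
    def forward(self, x):
        return torch.sigmoid(self.body(x))

teacher, student = DEANet(), HierarchicalNet()
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-4)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-4)
x = torch.rand(4, 3, 128, 128)                     # dummy unlabeled image batch

# Stage 1: the teacher trains unsupervised via reconstruction of the input.
recon, t_mask = teacher(x)
loss_t = F.mse_loss(recon, x)
opt_t.zero_grad(); loss_t.backward(); opt_t.step()

# Stage 2: the student distills the teacher's pseudo-masks.
s_mask = student(x)
loss_s = F.binary_cross_entropy(s_mask, t_mask.detach())
opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```

The design point mirrored here is that the teacher never sees a label: the mask is useful only insofar as it lets the foreground and background decoders jointly explain the input image, and the student then learns to predict that mask from the image alone.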
In recent years, deep learning has become popular in both research and applications. Semantic segmentation [2][3][4] and instance segmentation [5][6][7] are pixel-wise classification methods built on deep learning: they classify every pixel in an image and can therefore distinguish the contour of each object. Before a deep model can be trained, two costly steps must be carried out, data preprocessing and data annotation; annotation is especially expensive for semantic segmentation, where it means labeling the contours of objects.
To reduce the high cost of manual labeling, we present an end-to-end teacher-student model for unsupervised foreground segmentation. We use a Dual Encoder Attention Network (DEANet) as the teacher network to explore the similarity of structure among object features produced by the encoder. DEANet reconstructs the foreground and background of an image separately and generates an object mask without any labeled data. We then train a Hierarchical Network as the student network to imitate the masks generated by the teacher, and we find that the student recovers more accurate details along object contours. In our experiments we obtain Jaccard similarity of 0.83 on the University of Oxford Flower Dataset and 0.94 on the Stanford Cars Dataset. Results are shown in Figure 2 below.
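The Jaccard similarity reported above is the intersection-over-union of the predicted and ground-truth foreground masks, J(A, B) = |A ∩ B| / |A ∪ B|. A small NumPy helper (a hypothetical illustration, not code from the thesis) shows the computation on binary masks:

```python
# Jaccard similarity (intersection over union) between two binary masks.
# Hypothetical evaluation helper; masks are assumed to be H x W arrays of 0/1.
import numpy as np

def jaccard(pred, gt):
    """|pred ∩ gt| / |pred ∪ gt| for 0/1 masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0  # both masks empty: treat as perfect

# Toy example: the two foregrounds share 4 pixels out of a 6-pixel union.
a = np.zeros((4, 4)); a[1:3, 1:3] = 1  # 4 foreground pixels
b = np.zeros((4, 4)); b[1:3, 1:4] = 1  # 6 foreground pixels
print(jaccard(a, b))  # 4 / 6 ≈ 0.667
```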
[1] P. Dube, B. Bhattacharjee. Automatic Labeling of Data for Transfer Learning. In CVPR, 2019.
[2] Kai Xu, Longyin Wen. Spatiotemporal CNN for Video Object Segmentation. In CVPR, 2019.
[3] Liang-Chieh Chen, Yi Yang, Jiang Wang, Wei Xu, and Alan L Yuille. Attention to scale: Scale-aware semantic image segmentation. In CVPR, 2016.
[4] Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. Context encoding for semantic segmentation. In CVPR, 2018.
[5] Ziyu Zhang, Sanja Fidler, and Raquel Urtasun. Instance-level segmentation for autonomous driving with deep densely connected MRFs. In CVPR, 2016.
[6] Yiding Liu, Siyu Yang, Bin Li, Wengang Zhou, Jizheng Xu, Houqiang Li, and Yan Lu. Affinity derivation and graph merge for instance segmentation. In ECCV, 2018.
[7] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path aggregation network for instance segmentation. In CVPR, 2018.
[8] E. Haller, M. Leordeanu. Unsupervised object segmentation in video by efficient selection of highly probable positive features. In ICCV, 2017.
[9] J. Shlens. A Tutorial on Principal Component Analysis. Technical report, Salk Institute for Biological Studies and University of California, San Diego, 2005.
[10] O. Stretcu and M. Leordeanu. Multiple frames matching for object discovery in video. In BMVC, 2015.
[11] A. Moore, S. Prince, J. Warrell, U. Mohammed, G. Jones. Superpixel Lattices. In CVPR, 2008.
[12] W. Zhu, S. Liang, Y. Wei, J. Sun, Saliency Optimization from Robust Background Detection. In CVPR, 2014.
[13] G. E. Hinton, O. Vinyals, J. Dean. Distilling the Knowledge in a Neural Network. In NIPS, 2014.
[14] I. Croitoru, S. V. Bogolin, M. Leordeanu. Unsupervised learning from video to detect foreground objects in single images. In ICCV, 2017.
[15] J. Zhang, T. Zhang, Y. Dai, M. Harandi, R. Hartley. Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective. In CVPR, 2018.
[16] J. Long, E. Shelhamer, T. Darrell. Fully Convolutional Networks for Semantic Segmentation. In CVPR, 2015.
[17] W. Zhu, S. Liang, Y. Wei, and J. Sun. Saliency optimization from robust background detection. In CVPR, pages 2814–2821, 2014.
[18] M.-M. Cheng, J. Warrell, W.-Y. Lin, S. Zheng, V. Vineet, and N. Crook. Efficient salient region detection with soft image abstraction. In ICCV, pages 1529–1536, 2013.
[19] M. Cheng, G. Zhang, N. Mitra, X. Huang, and S.-M. Hu. Global contrast based salient region detection. In CVPR, pages 409–416, 2011.
[20] S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware saliency detection. IEEE Trans. Pattern Anal. Mach. Intell., 34(10):1915–1926, 2012.
[21] Kuang-Jui Hsu, Yen-Yu Lin, Yung-Yu Chuang. Co-attention CNNs for Unsupervised Object Co-segmentation. In IJCAI, 2018.
[22] P. Krähenbühl, V. Koltun. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. In NIPS, 2011.
[23] K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556. 2014.
[24] A. Krizhevsky, I. Sutskever, G.E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS, 2012.
[25] G. Huang, Z. Liu, L.V.D. Maaten, K.Q. Weinberger. Densely Connected Convolutional Networks. In CVPR, 2017.
[26] Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang. Residual Attention Network for Image Classification. arXiv:1704.06904. 2017.
[27] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon. CBAM: Convolutional Block Attention Module. arXiv:1807.06521. In ECCV, 2018.
[28] Wei Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, Cheng-Yang Fu, A.C. Berg. SSD: Single Shot MultiBox Detector. arXiv:1512.02325. In ECCV, 2016.
[29] O. Ronneberger, P. Fischer, T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv:1505.04597. In MICCAI, 2015.
[30] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. PNAS, 1982.
[31] F. Rosenblatt. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65(6):386–408, 1958.
[32] M. C. Schatz, C. Trapnell, A. L. Delcher, A. Varshney. High-throughput sequence alignment using Graphics Processing Units. BMC Bioinformatics, 8:474, 2007. doi:10.1186/1471-2105-8-474.
[33] O. Russakovsky, J. Deng, et al. ImageNet Large Scale Visual Recognition Challenge. 2010.
[34] K. Simonyan, A. Vedaldi, A. Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv:1312.6034, 2013.
[35] M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. In ECCV, 2014.