| Student | Chang, Bin-Qian (張秉謙) |
|---|---|
| Thesis Title | Deep-Learned Image Compression with Enhancing Semantic Segmentation (具加強語意分割之深度學習影像壓縮) |
| Advisor | Yang, Jar-Ferr (楊家輝) |
| Degree | Master |
| Department | Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2024 |
| Academic Year of Graduation | 112 (ROC calendar) |
| Language | English |
| Pages | 48 |
| Keywords | Deep Learning, Image Compression, Semantic Segmentation, Transformer, Mask2Former |
Deep learning has become mainstream in many computer vision tasks such as image recognition and semantic segmentation. However, to save storage space and transmission bandwidth, images are usually compressed, and different compression algorithms and compression ratios affect the performance of semantic segmentation networks. Traditional image compression algorithms cannot be adapted to a specific segmentation network through backpropagation, so the segmentation network must instead be optimized for each particular algorithm and compression rate. In recent years, deep learning has also advanced image compression substantially, and learned compression networks can be updated through backpropagation. We therefore propose a neural-network-based system with an adjustable compression ratio that enables end-to-end training of the image compression and semantic segmentation modules. The proposed compression network is built on a Transformer and, following the prompt-tuning approach, accepts additional parameters that control the compression ratio. For the semantic segmentation module, whose input is the compressed image features, we modify the Mask2Former architecture so that segmentation can be performed without first reconstructing the image from those features. Experimental results show that the proposed method reduces the performance loss that image compression causes in semantic segmentation tasks.
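End-to-end training of the two modules, as described above, implies a single objective that balances bit rate, reconstruction distortion, and segmentation accuracy. The sketch below illustrates one common form of such a joint objective; the function name `joint_loss` and the weights `lam` and `beta` are illustrative assumptions, not names taken from the thesis.

```python
def joint_loss(rate_bpp: float, distortion_mse: float, seg_loss_ce: float,
               lam: float = 0.01, beta: float = 1.0) -> float:
    """Weighted sum commonly used to train a learned compression network
    jointly with a downstream task network:
      rate term        -> estimated bits per pixel from the entropy model
      distortion term  -> reconstruction error (e.g. MSE), weighted by lam
      task term        -> segmentation loss (e.g. cross-entropy), weighted by beta
    Varying lam trades compression ratio against quality, which is the
    knob a variable-rate (prompt-tuned) codec exposes."""
    return rate_bpp + lam * distortion_mse + beta * seg_loss_ce


# Example: 0.5 bpp, MSE of 40, segmentation cross-entropy of 1.2
total = joint_loss(0.5, 40.0, 1.2)
print(total)
```

In an actual training loop the three terms would be differentiable tensors, so gradients from the segmentation loss flow back through the compressed features into the codec, which is what makes the system end-to-end trainable.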