
Author: Lee, Yu-Ting (李於庭)
Thesis Title: A Symmetric Image Compression Network with Improved Normalization Attention Mechanism (改良型正規化與注意力之對稱性影像壓縮網路)
Advisor: Tai, Shen-Chuan (戴顯權)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Graduation Academic Year: 111 (2022–2023)
Language: English
Pages: 67
Keywords: image compression, normalization, attention mechanism

    Image compression plays a vital role in various applications, such as storage, transmission, and sharing of digital images. This thesis presents a novel approach to symmetric image compression that leverages improved normalization and attention mechanisms to achieve superior compression performance.
    The primary focus of this research is to address the limitations of conventional normalization techniques in handling diverse image characteristics effectively. To overcome this challenge, an adaptive normalization module is proposed, which dynamically adjusts normalization parameters based on the content of the input image. This adaptive approach ensures optimal data representation and contributes to improved compression efficiency.
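    The adaptive normalization described above is in the spirit of GDN (Generalized Divisive Normalization, covered in Section 2.3 of the thesis outline), where each channel is divided by a content-dependent pooling of the energy of all channels. Below is a minimal NumPy sketch of a GDN forward pass, assuming learned parameters `beta` and `gamma`; the thesis's improved module may differ in detail:

    ```python
    import numpy as np

    def gdn(x, beta, gamma):
        """Generalized Divisive Normalization (after Ballé et al.).

        x:     (C, H, W) feature map
        beta:  (C,) per-channel offsets, positive
        gamma: (C, C) non-negative cross-channel weights

        Each channel is divided by a pooled measure of all channels'
        squared activations, so the rescaling adapts to local content.
        """
        # Cross-channel pooled energy, shape (C, H, W)
        energy = np.tensordot(gamma, x ** 2, axes=([1], [0]))
        return x / np.sqrt(beta[:, None, None] + energy)

    # Illustrative parameters (not the thesis's trained values)
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8, 8))
    beta = np.full(4, 1e-2)
    gamma = np.full((4, 4), 0.1)
    y = gdn(x, beta, gamma)
    assert y.shape == x.shape
    ```

    Because the divisor depends on the squared activations themselves, strong correlated responses are rescaled more aggressively, which is what lets the normalization adapt to the content of the input image.
    
    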
    Furthermore, an advanced attention mechanism is integrated into the compression network to selectively focus on significant image regions while suppressing noise and redundancies. By effectively capturing and preserving important visual features, the proposed attention mechanism enhances the overall compression quality and reconstruction fidelity.
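    The thesis outline names this component "Windowed Attention" (Section 3.4). As a hedged sketch, the following applies plain single-head self-attention independently within non-overlapping windows, in the style of Swin Transformer; the actual module in the thesis may add learned projections, multiple heads, or shifted windows:

    ```python
    import numpy as np

    def window_attention(x, win=4):
        """Single-head self-attention restricted to non-overlapping
        win x win windows of a (H, W, C) feature map.

        H and W must be divisible by `win`.
        """
        H, W, C = x.shape
        # Partition into (num_windows, win*win, C) token groups.
        t = x.reshape(H // win, win, W // win, win, C)
        t = t.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)
        # Scaled dot-product attention within each window.
        scores = t @ t.transpose(0, 2, 1) / np.sqrt(C)
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
        out = attn @ t
        # Undo the window partition.
        out = out.reshape(H // win, W // win, win, win, C)
        return out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

    rng = np.random.default_rng(1)
    feat = rng.normal(size=(8, 8, 3))
    out = window_attention(feat)
    assert out.shape == feat.shape
    ```

    Restricting attention to local windows keeps the cost linear in the number of pixels rather than quadratic, which is why windowed variants are the common choice for full-resolution image compression networks.
    
    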
    To evaluate the performance of the proposed approach, extensive experiments are conducted on benchmark datasets. Comparative analysis with state-of-the-art methods demonstrates the superiority of the proposed symmetric image compression network in terms of compression ratio, reconstruction quality, and subjective visual perception. Moreover, the effects of different normalization and attention configurations on compression performance are thoroughly investigated and analyzed.
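    The objective metrics used in such comparisons, PSNR in particular (Section 2.4.1 of the outline), are straightforward to compute; a minimal helper, assuming 8-bit images:

    ```python
    import numpy as np

    def psnr(ref, test, max_val=255.0):
        """Peak signal-to-noise ratio in dB between two images."""
        mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

    # A uniform error of 10 gray levels gives about 28.13 dB.
    val = psnr(np.zeros((4, 4)), np.full((4, 4), 10.0))
    ```

    MS-SSIM and BD-rate (Sections 2.4.2 and 2.4.3) are more involved; in practice they are usually taken from an evaluation library rather than reimplemented.
    
    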
    The experimental results validate the effectiveness of the proposed symmetric image compression network with improved normalization and attention. The integration of adaptive normalization and advanced attention mechanisms not only enhances compression efficiency but also adapts to diverse image characteristics.
    This research contributes to the field of image compression by presenting a comprehensive framework that combines advanced normalization and attention techniques within a symmetric network architecture. The proposed approach offers promising advancements in efficient and high-quality image compression, paving the way for further research and development in the field.

    Contents
    Contents v
    List of Tables vii
    List of Figures viii
    Chapter 1 Introduction 1
    1.1 Overview 1
    2.1 Learned Image Compression 4
    2.1.1 Variational Autoencoder (VAE) 4
    2.1.2 Uniform Noise 7
    2.1.3 Hyperprior 10
    2.2 Attention Mechanism 12
    2.3 GDN (Generalized Divisive Normalization) 15
    2.4 Evaluation Settings 17
    2.4.1 PSNR 17
    2.4.2 MS-SSIM 18
    2.4.3 BD-rate and BD-PSNR 19
    Chapter 3 The Proposed Algorithm 21
    3.1 Problem Definition 21
    3.2 Proposed Network Architecture 23
    3.3 Improved Normalization Module 25
    3.4 Windowed Attention 30
    3.5 Loss Functions 33
    Chapter 4 Experiment Results 35
    4.1 Experimental Dataset 35
    4.1.1 Flickr Dataset 35
    4.1.2 DIV2K Dataset 36
    4.1.3 ImageNet Dataset 37
    4.1.4 Kodak Dataset 37
    4.1.5 CLIC Dataset 38
    4.2 Parameter and Experimental Setting 39
    4.3 Experimental Results 39
    4.4 Ablation Experimental Result 61
    Chapter 5 Conclusion and Future Work 62
    5.1 Conclusion 62
    5.2 Future Work 62
    References 64


    Full-text availability: on campus 2024-08-07; off campus 2024-08-07