| Graduate student: | 吳尚育 Wu, Shang-Yu |
|---|---|
| Thesis title: | 一個基於VAE的高壓縮性能無損影像壓縮架構 A High Compression Performance VAE-Based Framework for Lossless Image Compression |
| Advisor: | 戴顯權 Tai, Shen-Chuan |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2026 |
| Academic year of graduation: | 114 |
| Language: | English |
| Number of pages: | 72 |
| Chinese keywords: | 無損影像壓縮, 變分自動編碼器, 深度學習 |
| Keywords: | Lossless Image Compression, Variational Autoencoder, Deep Learning |
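Lossless image compression is critical in applications such as medical imaging, satellite remote sensing, industrial inspection, and digital archiving. These applications impose strict requirements on the integrity and reversibility of image content, since any loss of information may affect subsequent analysis and interpretation. How to further improve compression efficiency while fully preserving image detail has therefore remained an important topic in lossless image compression research. However, natural images exhibit highly complex structures and long-range dependencies, and traditional methods mostly rely on hand-crafted probability models that struggle to characterize the true data distribution. Deep learning has made significant progress in lossy compression, but achieving both accurate distribution modeling and a feasible decoding strategy in the lossless setting remains challenging.
To address these problems, this study combines a hyperprior model with an autoregressive residual modeling mechanism and proposes a high-performance lossless image compression framework based on a variational autoencoder. The system uses the variational architecture and the hyperprior model to estimate the per-pixel probability distribution of the main latent variables, and introduces a masked autoregressive residual model that fuses local context with the global prior. The residual distribution is modeled with a discretized logistic mixture model, balancing decodability with distribution-fitting capability.
Experimental results show that the proposed method is effective and competitive on lossless image compression, and that it achieves better compression performance than existing deep learning methods and traditional codecs on multiple standard datasets.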
Lossless image compression plays a crucial role in applications such as medical imaging, remote sensing, industrial inspection, and digital archiving, where every pixel value must be preserved without distortion. However, achieving high compression ratios remains challenging, as natural images exhibit complex structures and long-range dependencies that traditional hand-crafted probability models struggle to accurately capture. Although deep learning–based methods have made significant progress in lossy compression, constructing precise probabilistic models and feasible decoding strategies for lossless scenarios is still a demanding task.
This work proposes a high-performance VAE-based framework for lossless image compression that integrates hyperprior modeling with autoregressive residual learning. The system first uses a variational architecture with a hyperprior to estimate pixel-wise probability distributions of the main latent representations. A masked autoregressive residual model then refines the pixel-level probability estimates by combining local context with the global prior, and a discretized logistic mixture model is used to model the residual distribution. This design captures the complex statistics of the residual components while keeping decoding practical.
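To make the role of the logistic mixture model more concrete, the following is a minimal sketch of how a discretized logistic mixture assigns probabilities to integer residuals, and how those probabilities translate into an ideal code length for an entropy coder. It is not the thesis's implementation: the function names, the residual range of [-255, 255], and the fixed mixture parameters are all assumptions made for illustration. In a learned codec the weights, means, and scales would be predicted per pixel by the network.

```python
import numpy as np


def _sigmoid(x):
    # Numerically stable logistic CDF, using the identity
    # sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def discretized_logistic_mixture_pmf(weights, means, scales, lo=-255, hi=255):
    """Return P(r = k) for every integer k in [lo, hi].

    Hypothetical helper: `weights` must sum to 1 and `scales` must be > 0.
    Each integer k receives the mixture mass of the bin [k - 0.5, k + 0.5];
    the two edge bins absorb the tails so the PMF sums to 1.
    """
    weights = np.asarray(weights, dtype=np.float64)
    means = np.asarray(means, dtype=np.float64)[:, None]
    scales = np.asarray(scales, dtype=np.float64)[:, None]
    k = np.arange(lo, hi + 1, dtype=np.float64)[None, :]

    # Per-component logistic CDF at the upper and lower edge of each bin.
    cdf_upper = _sigmoid((k + 0.5 - means) / scales)
    cdf_lower = _sigmoid((k - 0.5 - means) / scales)
    cdf_upper[:, -1] = 1.0  # last bin takes the right tail
    cdf_lower[:, 0] = 0.0   # first bin takes the left tail

    # Weighted sum over mixture components, shape (hi - lo + 1,).
    return weights @ (cdf_upper - cdf_lower)


def ideal_code_length_bits(pmf, residual, lo=-255):
    """Bits an ideal entropy coder would spend on `residual` under `pmf`."""
    return float(-np.log2(pmf[residual - lo]))


# Example with two assumed components: a sharp one near 0 and a broad one.
pmf = discretized_logistic_mixture_pmf(
    weights=[0.7, 0.3], means=[0.0, 4.0], scales=[1.5, 6.0]
)
print("total probability:", pmf.sum())
print("bits for residual 0:", ideal_code_length_bits(pmf, 0))
print("bits for residual 40:", ideal_code_length_bits(pmf, 40))
```

Residuals close to the predicted modes cost few bits and unlikely residuals cost many, which is why a better-fitting distribution lowers the bitrate. Bins very far from every component can underflow to zero probability in this sketch, so a practical coder would typically clamp probabilities to a small positive floor before coding.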
Experimental results demonstrate that the proposed method achieves superior compression ratios across multiple benchmark datasets, outperforming both traditional codecs and recent deep learning–based approaches. These results highlight the effectiveness of the proposed VAE-based framework in advancing the state of the art in lossless image compression.