Graduate student: | 王景新 Wang, Jing-Xin |
---|---|
Thesis title: | 平行化H.264/AVC編碼器於分散式共用記憶體系統 Parallel H.264/AVC Rate-Distortion Optimization Baseline Profile Encoder on Distributed Shared Memory System |
Advisor: | 蘇文鈺 Su, W. Y. Alvin |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2010 |
Academic year of graduation (ROC calendar): | 98 |
Language: | English |
Number of pages: | 83 |
Chinese keywords: | H.264/AVC編碼器 、位元率失真度最佳化 、共用記憶體系統 、平行切片演算法 |
English keywords: | H.264/AVC encoder, rate-distortion optimization, distributed shared memory system, parallel slice scheme |
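The H.264/AVC compression standard provides many new coding tools that improve video compression quality, but these tools also increase the computational load. An H.264/AVC encoder that supports rate-distortion optimization requires even more computation to achieve higher compression quality. Parallelizing the H.264/AVC encoder is one option for increasing encoding speed. However, an H.264/AVC encoder needs a large amount of reference data from adjacent macroblocks, so it is not well suited to parallel implementation, especially on a distributed shared memory system. In a cluster computer system, a distributed shared memory system provides a virtual shared memory mechanism that lets programmers write parallel programs easily, but the size of the shared memory and the number of shared-data transfers strongly affect parallel performance. To improve parallel performance, we propose a parallel H.264/AVC encoder architecture that reduces the number of reference-data transfers, and under this architecture we propose three parallel H.264/AVC encoding schemes: the Parallel Slice Scheme (PSS), the Parallel Multiple Reference Frames Scheme (PMRFS), and the Parallel Block Mode Scheme (PBM). All three schemes are implemented on a distributed shared memory system, and PSS delivers the highest speedup of the three. However, this parallel encoder architecture and PSS reduce the video compression ratio. Based on the characteristics of PSS, we then propose a modified parallel H.264/AVC encoder architecture and a modified parallel slice scheme (PSS_M) to improve the compression ratio, and implement PSS_M on a distributed shared memory system. The distributed shared memory system used in the experiments consists of 5 computers, each with two dual-core processors. In terms of compression ratio, the impact of PSS_M is very small for low-motion sequences such as Akiyo. In terms of speedup, the maximum speedup is 4.22 at n=5/p=1 (5 computers, each using only one processor). Finally, the parallel slice scheme combined with the wavefront order scheme (PSS_MW) is also implemented on the distributed shared memory system; with 5 computers each using 4 processors (n=5/p=4), PSS_MW raises the encoding speed by a further factor of 2.61. This thesis reports the speedup and video quality results of the three schemes. Although PSS_M offers the best performance, the three schemes are independent of one another. On a parallel computing platform with more computers, combining the three methods may achieve better results, and this thesis can serve as a reference for such a combination.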
The H.264/AVC video coding standard incorporates many coding tools to improve compression performance. In an H.264/AVC rate-distortion optimization (RDO) encoder, computation time is spent primarily on calculating the rate-distortion (RD) cost used to choose the best coding mode. Parallel computation is one way to speed up the encoder. However, calculating the RD cost requires a large amount of reference data from previously coded adjacent macroblocks. This dependency is unfavorable for any parallel computing strategy, especially on a distributed shared memory (DSM) system. In a cluster computing system, DSM provides a virtual shared memory that makes parallel programs easier to write, but the amount and frequency of data transferred between computers affect the speedup. To gain more speedup, this thesis proposes a parallel H.264/AVC RDO encoder architecture that reduces the frequency of reference-data transfers. Based on this architecture, three parallel computing schemes are proposed: the Parallel Slice Scheme (PSS), the Parallel Multiple Reference Frames Scheme (PMRFS), and the Parallel Block Mode Scheme (PBM). PSS outperforms the other two schemes on a DSM system. However, video quality decreases under the proposed parallel architecture with PSS. To recover video quality, this thesis also proposes a modified parallel architecture and a modified PSS (PSS_M). PSS_M is run on a DSM system consisting of five PCs (one master node and four slave processing nodes), each with two dual-core processors. The difference in the PSNR curves between PSS_M and the H.264/AVC RDO encoder without parallelism is slight for slow-motion sequences such as Akiyo. The maximum speedup of PSS_M is 4.22 at n=5/p=1 (five computers, each using only one core). In addition, PSS_M combined with the wavefront order scheme (PSS_MW) was executed at n=5/p=4. The maximum improvement in speedup at p=4 is 2.61. The video quality and speedup of the three proposed schemes are reported in this thesis. Although PSS_M achieves better coding efficiency than the other schemes, the three schemes could be combined to obtain better video quality and speedup when more computers are available. This thesis provides a reference for implementing such a combined scheme.
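The wavefront ordering mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the dependency pattern commonly used for H.264 macroblocks, where a macroblock needs its left, upper-left, upper, and upper-right neighbours to be coded first. The abstract does not describe how PSS_MW actually schedules macroblocks, so the function names and the 11x9 frame size below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks into waves that could be encoded concurrently.

    Assumption (not taken from the thesis): macroblock (x, y) depends on its
    left, upper-left, upper, and upper-right neighbours. Under that pattern,
    all macroblocks sharing the same value of x + 2*y are mutually independent.
    """
    waves = defaultdict(list)
    for y in range(mb_rows):
        for x in range(mb_cols):
            waves[x + 2 * y].append((x, y))
    return [waves[k] for k in sorted(waves)]


def dependencies_satisfied(schedule, mb_cols, mb_rows):
    """Check that every in-frame neighbour is coded in an earlier wave."""
    neighbours = ((-1, 0), (-1, -1), (0, -1), (1, -1))
    done = set()
    for wave in schedule:
        for x, y in wave:
            for dx, dy in neighbours:
                nx, ny = x + dx, y + dy
                in_frame = 0 <= nx < mb_cols and 0 <= ny < mb_rows
                if in_frame and (nx, ny) not in done:
                    return False
        # Macroblocks of the current wave become available only afterwards.
        done.update(wave)
    return True


if __name__ == "__main__":
    schedule = wavefront_schedule(11, 9)  # 11x9 macroblocks, illustrative only
    print(len(schedule), max(len(wave) for wave in schedule))
```

For this 11x9 example the schedule has 27 waves and no wave holds more than 6 macroblocks, which shows why wavefront parallelism within a single frame is limited and why it is typically combined with a coarser scheme such as slice-level parallelism.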