| Graduate student: | 王泓民 Wang, Hung-Ming |
|---|---|
| Thesis title: | 基於H.264/AVC核心之快速演算法與立體視訊合成之研究 Researches on Fast Algorithms and 3D Contents Generation Based on H.264/AVC Kernels |
| Advisor: | 楊家輝 Yang, Jar-Ferr |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2009 |
| Academic year of graduation: | 97 |
| Language: | English |
| Pages: | 115 |
| Keywords (Chinese): | 立體視訊合成、H.264/AVC、框內預測模式、框外預測模式 |
| Keywords (English): | stereo video generation, H.264/AVC, intra mode decision, inter mode decision |
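This dissertation addresses two topics: fast algorithms for H.264/AVC and stereoscopic video generation. For the H.264/AVC fast algorithms, we first propose a fast algorithm for computing the sum of absolute Hadamard-transformed differences (SATD) in intra mode decision. To further improve coding efficiency, we also propose a new rate-distortion cost function (RD-cost function), together with a fast algorithm for the sum of absolute integer-transformed differences (SAITD) that it uses. Beyond the intra-mode improvements, we propose a fast search algorithm for inter mode decision. To quickly select the correct block mode among the seven block modes (16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4), we first search the macroblock-level modes (16×16, 16×8, 8×16, 8×8). The search begins by checking the homogeneity of the 16×16 block in the spatial domain and the residual domain. If the block is detected as homogeneous, the block search terminates early and the 16×16 mode is selected directly as the best prediction mode. If the 16×16 block does not satisfy the homogeneous condition, we perform 8×8 motion estimation; by analyzing the distortion costs of the 16×16 and 8×8 blocks, we then decide whether to continue examining the 8×16 and 16×8 modes or to skip them to shorten the search time. For the P8×8 mode, a search procedure similar to the macroblock-level one determines whether to continue toward smaller block modes. Whenever the termination mechanism is triggered during the search, the mode with the lowest distortion cost among all modes searched so far is chosen as the best prediction mode. Experimental results show that the proposed methods improve both compression quality and compression speed.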
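For stereoscopic video generation, this dissertation develops a method that converts static video data compressed by an H.264/AVC encoder into stereoscopic video. The method first uses the motion information produced at the decoder to select an appropriate matching frame, and then applies disparity correction to the selected image pair to synthesize the stereoscopic video. To further improve the quality of the generated stereoscopic video, we also study how to select an appropriate matching frame and propose a method that quickly estimates the camera's baseline distance between different capture times, so that the selected matching frame suits human binocular vision. In addition, we propose a calibration procedure for mismatched image pairs that corrects mismatches in horizontal and vertical disparity between the two images. Experiments show that the stereoscopic image pairs and videos produced by the proposed methods better match human binocular vision and offer a better viewing experience.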
In this dissertation, we investigate and develop algorithms for improving H.264/AVC video coding and for simplifying stereoscopic content generation. In the H.264/AVC part, we first propose a fast algorithm that exploits the linearity of the transform and the fixed spatial relationship of the predicted pixels in intra modes to compute the sum of absolute Hadamard-transformed differences (SATD) in H.264/AVC intra 4×4 mode decision. To improve coding efficiency, we further propose an enhanced rate-distortion cost function for H.264/AVC intra 4×4 mode decision, which combines the sum of absolute integer-transformed differences (SAITD) with a rate predictor. As with the fast SATD computation, we also develop a fast algorithm for SAITD to reduce the computation required by the proposed cost function. Besides the improvement of intra prediction, we propose a successive termination and elimination (STE) method to achieve fast inter mode decision. For each 16×16 macroblock, termination detection starts with residual homogeneity detection and then performs spatial homogeneity detection. If the macroblock is homogeneous in either the residual or the spatial sense, we terminate the inter prediction directly and choose the 16×16 mode as the best inter mode. For non-homogeneous macroblocks, we carry out 8×8 subblock motion estimation. Based on a cost analysis of the 8×8 and 16×16 modes, an elimination detection method then removes the unlikely 8×16 and 16×8 modes. Similarly, the STE method can be applied to each 8×8 block to decide whether the inter prediction needs to examine smaller subblocks. Once the algorithm reaches the termination stage, the best inter mode is the one with the least cost among all searched modes. Experimental results show that the proposed algorithms improve coding efficiency and reduce coding time.
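To make the SATD quantity concrete, the following is a minimal sketch of the direct (non-accelerated) computation for a single 4×4 block. It is not the fast algorithm proposed in the dissertation. The function names `hadamard4x4` and `satd4x4` are hypothetical, and the result is left unscaled, whereas some encoders scale the sum (for example, by halving it). The last function illustrates the linearity property mentioned above: the transform of a difference equals the difference of the transforms, so the transform of the original block can be computed once and reused across candidate predictions. How the dissertation exploits this property in detail is not stated here.

```python
# 4x4 Hadamard matrix with +1/-1 entries (this ordering is symmetric).
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def _matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def _transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def hadamard4x4(block):
    """Two-dimensional Hadamard transform H * block * H^T of a 4x4 block."""
    return _matmul4(_matmul4(H4, block), _transpose(H4))


def satd4x4(orig, pred):
    """Unscaled SATD: sum of |H * (orig - pred) * H^T| over all 16 coefficients."""
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    return sum(abs(v) for row in hadamard4x4(diff) for v in row)


def satd4x4_from_transforms(t_orig, t_pred):
    """Same value as satd4x4, computed from already-transformed blocks.

    Relies only on linearity: H(A - B)H^T = H A H^T - H B H^T.
    """
    return sum(abs(t_orig[i][j] - t_pred[i][j])
               for i in range(4) for j in range(4))
```

For example, `satd4x4(orig, pred)` and `satd4x4_from_transforms(hadamard4x4(orig), hadamard4x4(pred))` return the same value for any pair of 4×4 integer blocks.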
For stereoscopic content generation, we first propose a stereoscopic video generation method for static monoscopic H.264/AVC compressed video, which uses the motion information already contained in the compressed video to select a proper matching frame. We then rectify the selected frame pair by frame shifting, transformation, and reshaping to minimize the vertical parallax and generate the stereoscopic video. To improve the quality of the generated stereoscopic video, we further propose another matching frame selection method based on baseline distance estimation. In addition, we propose an automatic calibration procedure that adjusts mismatched image pairs to achieve better stereo visualization. The calibration procedure consists of six steps: feature point extraction, bidirectional feature point matching, relative distance checking, image transformation, hole-filling, and reshaping. Experimental results show that the proposed methods generate stereoscopic images and videos that properly exhibit stereo scenes on stereoscopic LCD display systems.
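One step of the calibration procedure, bidirectional feature point matching, can be illustrated with a mutual nearest-neighbour check. This is a sketch under assumptions, not the dissertation's implementation: the abstract does not specify the descriptors, the distance measure, or the exact acceptance rule, so the squared Euclidean distance and the function names below are hypothetical.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)),
               key=lambda k: _sq_dist(query, candidates[k]))


def bidirectional_matches(left_desc, right_desc):
    """Keep (i, j) only if j is the best match for i AND i is the best match for j.

    Assumed rule: a left-to-right match is accepted only when the
    right-to-left search returns the same pair.
    """
    if not left_desc or not right_desc:
        return []
    forward = [_nearest(d, right_desc) for d in left_desc]
    backward = [_nearest(d, left_desc) for d in right_desc]
    return [(i, j) for i, j in enumerate(forward) if backward[j] == i]
```

With `left = [(0, 0), (10, 10), (5, 5)]` and `right = [(10, 11), (0, 1)]`, the third left point also prefers `right[1]`, but `right[1]` prefers `left[0]`, so that one-directional match is rejected.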