Graduate student: 張維城 (Chang, Wei-Chen)
Thesis title: On the Use of Bit-plane Audio Coding in MDCT Domain (位元層音訊壓縮法於改良式離散餘弦轉換之研究)
Advisor: 蘇文鈺 (Su, Wen-Yu Alvin)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2008
Academic year of graduation: 96
Language: English
Number of pages: 83
Keywords: Zero-tree Assumption, Scalable Audio Compression, Modified Discrete Cosine Transform, Context-based Bit-plane Coding, Bandwidth Extension
    This dissertation studies how the characteristics of a coder relate to the audio it compresses, and uses that relationship to develop new algorithms for bit-plane coding in the Modified Discrete Cosine Transform (MDCT) domain. Bit-plane coding has been investigated extensively in image compression and is standardized in JPEG2000. It combines the hierarchical structure of the wavelet transform with a progressive sequence of scanning and coding operations to achieve sub-optimal performance in both quality scalability and transmission scalability. Researchers who tried to apply the same strategy to scalable audio compression met two difficulties. First, human auditory perception differs markedly from visual perception, so new criteria are needed to rank the importance of transform coefficients. Most audio perception experiments have been carried out in the Fourier domain rather than the wavelet domain, which makes it hard to design a wavelet-domain coding strategy around known perceptual phenomena. The MDCT, by contrast, is a Fourier-related transform that has been used in audio coding for more than a decade and has proven to be an excellent tool in practice. Second, unlike the wavelet coefficients of an image, the MDCT coefficients of an audio signal lack a hierarchical self-similar structure, and without that structure a progressive coding strategy may not reach its best coding efficiency.
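    The progressive idea behind bit-plane coding can be illustrated with a minimal Python sketch. It is not the coder described in the dissertation: the coefficient values, the four-plane depth, and the function names `bit_planes` and `reconstruct` are assumptions made for illustration, and no entropy or context coding is applied. It only shows that sending the most significant planes first lets a decoder stop early and still obtain a coarse approximation.

```python
def bit_planes(coeffs, num_planes):
    """Split integer magnitudes into bit-planes, most significant plane first."""
    mags = [abs(c) for c in coeffs]
    return [[(m >> p) & 1 for m in mags] for p in range(num_planes - 1, -1, -1)]


def reconstruct(planes, signs, num_planes):
    """Rebuild coefficients from however many leading bit-planes were received."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i          # bit position carried by this plane
        for k, bit in enumerate(plane):
            mags[k] |= bit << shift
    return [s * m for s, m in zip(signs, mags)]


# Hypothetical quantized MDCT coefficients (illustrative values only).
coeffs = [13, -6, 3, 0, -1, 9]
signs = [-1 if c < 0 else 1 for c in coeffs]

planes = bit_planes(coeffs, 4)
coarse = reconstruct(planes[:2], signs, 4)  # decoder stops after two planes
full = reconstruct(planes, signs, 4)        # decoder receives all four planes
```

    Truncating the stream after any plane still yields a valid, lower-precision set of coefficients, which is the property that makes this family of coders scalable.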
    Starting from the zero-tree assumption that underlies context-based bit-plane coding, the author analyzes the coding efficiency of different coefficient ordering schemes in order to compensate for the lack of a self-similar structure in the MDCT representation. The dissertation makes two original contributions. The first is a systematic analysis of the relationship between the ordering scheme and the coding efficiency of zero-tree-based bit-plane coders, which provides groundwork for follow-up research. The second is a novel context-based bit-plane coding algorithm together with a corresponding bandwidth extension algorithm. This integrated design demonstrates a progressive methodology that moves from low-level data representation and compression to high-level parametric synthesis.
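    Why coefficient ordering matters for a zero-tree style coder can be seen in a toy Python sketch. This is an assumed simplification, not the analysis or the algorithm from the dissertation: the function `significance_bits`, the recursive halving of a one-dimensional list, the threshold, and the sample magnitudes are all hypothetical. The function counts one flag per tested set, and a single flag is enough to declare an entire set insignificant.

```python
def significance_bits(mags, threshold):
    """Count flags needed to code which magnitudes reach `threshold`
    when sets are tested as a whole and split in half only if significant."""
    if not any(m >= threshold for m in mags):
        return 1            # one flag declares the whole set insignificant
    if len(mags) == 1:
        return 1            # one flag marks this single coefficient significant
    mid = len(mags) // 2
    return (1               # flag saying "this set contains something significant"
            + significance_bits(mags[:mid], threshold)
            + significance_bits(mags[mid:], threshold))


# The same eight hypothetical magnitudes in two different orders.
scattered = [9, 0, 1, 8, 0, 2, 10, 1]
clustered = [10, 9, 8, 2, 1, 1, 0, 0]

bits_scattered = significance_bits(scattered, 8)
bits_clustered = significance_bits(clustered, 8)
```

    With these values the clustered order needs 9 flags and the scattered order needs 13, because grouping small coefficients together lets one flag dismiss them all. The sketch ignores the cost of telling the decoder which ordering was used, which a real coder would have to account for.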

    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
      1.1 MOTIVATION
      1.2 CONTRIBUTION
      1.3 OUTLINE
    CHAPTER 2 BIT-PLANE CODING
      2.1 CONTEXT-BASED BIT-PLANE CODING
      2.2 SET PARTITIONING IN HIERARCHICAL TREES ALGORITHM
      2.3 CONCURRENT ENCODING IN HIERARCHICAL TREES ALGORITHM
    CHAPTER 3 HARMONIC STRUCTURE QUAD TREE
      3.1 PRINCIPLES
      3.2 HARMONIC STRUCTURE QUAD TREE CONSTRUCTION
      3.3 CODING PROCEDURE
        3.3.1 Encoding Scheme
        3.3.2 Decoding Scheme
      3.4 BIT-STREAM FORMAT
      3.5 COMPARISON WITH DIFFERENT SIGNIFICANT THRESHOLDS
        3.5.1 Music Preparation
        3.5.2 Objective Performance
        3.5.3 Subjective Performance
    CHAPTER 4 HARMONIC STRUCTURE RECONSTRUCTION AND RESIDUAL SUPPORT
      4.1 MAGNITUDE ESTIMATION
      4.2 PHASE ADJUSTMENT
      4.3 EVALUATION
        4.3.1 Music Preparation
        4.3.2 Spectral View
        4.3.3 Objective Test
        4.3.4 Subjective Test
    CHAPTER 5 ANALYSIS OF CODING EFFICIENCIES OF VARIOUS COEFFICIENT ORDERING SCHEMES
      5.1 COMBINED SIGNIFICANCE TREE QUANTIZATION
      5.2 CODING EFFICIENCY OF ORDERING SCHEME
      5.3 CODING EFFICIENCY OF HIERARCHICAL STRUCTURE
    CHAPTER 6 CONCLUSIONS AND FUTURE WORK
      6.1 F0 CONTOUR EXTRACTION
      6.2 FROM LOSSY TO LOSSLESS
      6.3 INTER-FRAME PREDICTION
    APPENDIX
      LOUDNESS CONVERSION
    REFERENCES


    Full-text availability: on campus from 2009-08-26; off campus from 2009-08-26