
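Author: Lee, Hsi-Fu (李溪福)
Thesis title: Low Complexity Scalable Audio Coding Based on Psychoacoustic Model (以心理聲學模型為基礎之低複雜度可調性音訊編碼)
Advisor: Lei, Sheau-Fang (雷曉方)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 105
Keywords (Chinese): psychoacoustic model (心理聲學模型), wavelet packet transform (小波封包轉換), scalable audio coding (可調式音訊編碼)
Keywords (English): Psychoacoustic model, Wavelet packet transform, Scalable audio coding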
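     In recent years, with advances in multimedia technology and the rapid growth of communication networks, audio signals need to be compressed appropriately so that multimedia audio information can be stored and transmitted efficiently. Most current audio coding systems use a psychoacoustic model to estimate the masking effects of human hearing and thereby reduce the amount of coded data. However, typical psychoacoustic models perform their analysis with the Fast Fourier Transform (FFT), which requires a large amount of numerical computation.
     Different multimedia applications place different demands on the network, such as transmission bandwidth and real-time delivery. To achieve real-time, efficient transmission for multimedia applications, this study proposes a low-complexity scalable audio coding system. It uses discrete wavelet packet subband decomposition to model the nonlinear critical bands of the human auditory response. This improves the analysis of nonstationary sounds and, because the masking curve is computed in the wavelet domain, effectively reduces the computation required by the psychoacoustic model. The proposed system uses zero-tree coding to build a three-layer scalable codec, so that the receiver can select the bit rate appropriate to its bandwidth. Unlike the traditional zero-tree coding algorithm, this study also uses the psychoacoustic model to remove inaudible signal components, so that sound quality remains good at low bit rates. Experimental results show that, compared with MPEG-1 Layer I and the traditional zero-tree coding algorithm, the proposed method achieves better sound quality and lower computational complexity.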

     In recent years, multimedia technology and communication networks have developed rapidly. To store multimedia data efficiently and transmit it over channels of varying capacity (e.g. the Internet), the audio signal must be compressed appropriately. To reduce the amount of coded data, most modern audio codecs exploit a psychoacoustic model to remove perceptually irrelevant components without audible loss of quality. However, psychoacoustic models usually rely on the Fast Fourier Transform (FFT) to analyze the frequency components, which incurs high computational complexity.
     Scalable coding is desirable for heterogeneous networks with varying bandwidths. In this research, we propose a low-complexity scalable audio coding system. Wavelet packet decomposition is used to closely mimic the nonlinear critical bands of human hearing. Because the coder calculates the masking thresholds directly in the wavelet domain, it both improves the analysis of nonstationary audio signals and greatly reduces the complexity of the psychoacoustic model. We build a three-layer scalable audio coding system based on a zero-tree coding algorithm, so that the decoder can choose the bit rate that suits its available bandwidth. The main difference between our algorithm and a traditional zero-tree coding algorithm is that ours takes input from the psychoacoustic model to remove inaudible signal components, so that audio quality is preserved at low bit rates. The experimental results indicate that the proposed algorithm achieves better audio quality than MPEG-1 Layer I and the traditional zero-tree algorithm, and that it requires less computation than a general psychoacoustic model.
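
     The idea of shaping a wavelet packet tree so that its subbands approximate the critical bands can be illustrated with a small sketch. The tree below is hypothetical and is not the filter bank designed in the thesis: the leaf depths, the 44.1 kHz sampling rate, and the function names are assumptions chosen only for illustration. The Bark conversion uses the well-known Zwicker and Terhardt approximation.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the critical-band rate in Bark."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def packet_band_edges(sample_rate, leaf_depths):
    """Return (low, high) edges in Hz for the leaves of a wavelet packet tree.

    leaf_depths lists the depth of every leaf in order of increasing
    frequency. An ideal leaf at depth d spans (sample_rate / 2) / 2**d Hz.
    """
    nyquist = sample_rate / 2.0
    edges = []
    low = 0.0
    for depth in leaf_depths:
        width = nyquist / 2 ** depth
        # A leaf of a binary tree must start on a multiple of its own width.
        if (low / width) % 1 != 0:
            raise ValueError("leaf does not start on a dyadic boundary")
        edges.append((low, low + width))
        low += width
    if low != nyquist:
        raise ValueError("leaves do not cover 0..Nyquist exactly")
    return edges


# HYPOTHETICAL tree (an assumption, not taken from the thesis):
# deep splits at low frequencies, shallow splits at high frequencies.
EXAMPLE_DEPTHS = [8] * 8 + [7] * 4 + [6] * 4 + [5] * 4 + [4] * 4 + [2] * 2

if __name__ == "__main__":
    for low, high in packet_band_edges(44100, EXAMPLE_DEPTHS):
        width_bark = hz_to_bark(high) - hz_to_bark(low)
        print(f"{low:8.1f} - {high:8.1f} Hz  width {width_bark:.2f} Bark")
```

     Running the script prints the width of each subband in Bark, which gives a quick check of how closely a candidate tree follows the critical-band scale. Real wavelet filters overlap in frequency, so these ideal band edges are only a first approximation.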

    Abstract (in Chinese)
    Abstract (in English)
    Acknowledgments
    Table of Contents
    List of Table
    List of Figures
    Chapter 1 Introduction
      1.1 Context and Motivation
      1.2 Related Researches
      1.3 Thesis Organization
    Chapter 2 Overview of Wavelet Transform
      2.1 What is Wavelets
        2.1.1 Wavelets History
        2.1.2 Why Wavelets
      2.2 Theory of Wavelet
      2.3 Wavelet Transform for Multi-resolution Decomposition
        2.3.1 Multi-resolution Decomposition
        2.3.2 The Scaling Function
        2.3.3 The Wavelet Function
        2.3.4 Direct Sum Decomposition
      2.4 Discrete Wavelet Transform Packet
      2.5 Boundary Extension
    Chapter 3 Fundamentals of Audio Compression
      3.1 Overview of Audio Coding Technologies
        3.1.1 Historical Development
        3.1.2 Basic Audio Coding Tools
      3.2 Psychoacoustic Model
        3.2.1 General Psychoacoustic Coding Principles
        3.2.2 Absolute Threshold of Hearing
        3.2.3 Critical Bands
        3.2.4 Frequency Masking
        3.2.5 Temporal Masking
        3.2.6 The Spread of Masking
        3.2.7 Recommended Model from MPEG
        3.2.8 Example Codec Perceptual Model
      3.3 MPEG audio codecs
        3.3.1 MPEG-1 Layer I audio coding
        3.3.2 MPEG-1 Layer II audio coding
        3.3.3 MPEG-1 Layer II audio coding
      3.4 Audio Quality Measurements
    Chapter 4 Embedded Zero-Tree Coding Review
      4.1 Introduction
      4.2 Shapiro’s Zero-tree Algorithm
    Chapter 5 Low complexity Scalable Audio Codec
      5.1 System Overview
      5.2 Filter Bank Design
      5.3 The Simplified Psychoacoustic Model
        5.3.1 The disadvantage of psychoacoustic model from MPEG Standard
        5.3.2 The Johnston model
          5.3.2.1 Window and Critical Band Analysis
          5.3.2.2 Spreading Function
          5.3.2.3 Coefficient of Tonality
          5.3.2.4 Spread Threshold Estimate
          5.3.2.5 Re-normalization of The Threshold Estimate
          5.3.2.6 Accounting for Absolute Thresholds
      5.4 Scalability Encoding algorithm
        5.4.1 The Pre-Processing Module
          5.4.1.1 Building the Zero-Tree Structure
          5.4.1.2 The Application of the Masking Threshold
        5.4.2 The Modifying Zero-Tree Coding
        5.4.3 Entropy Coding and Formatting
      5.5 Scalability Decoding algorithm
    Chapter 6 Experimental Results
      6.1 Objective Quality Measurements
        6.1.1 PEAQ
        6.1.2 NMR
      6.2 Subjective Assessment
      6.3 Computational Efficiency
    Chapter 7 Concepts of Proposed Hardware Design
      7.1 Numerical Analysis
      7.2 Hardware Design for Proposed Simplified Psychoacoustic Model
    Chapter 8 Conclusions and Future Works
      8.1 Conclusions
      8.2 Future Works
    References
    Curriculum Vita


    Full text available on campus from 2008-09-05; off campus from 2008-09-05.