Graduate student: 路健華
Lu, Chain-Hua
Thesis title: 具備可調適性,使用音響心理學並架構於諧波結構四元樹之音訊壓縮演算法
PHSQT - A Scalable Audio Compression Method Based on Harmonic Structure Quad Tree with Psychoacoustic Model
Advisor: 蘇文鈺
Su, Wen-Yu
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
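Year of publication: 2006
Academic year of graduation: 94 (ROC calendar; 2005-2006)
Language: Chinese
Number of pages: 83
Keywords (Chinese, translated): scalability, audio compression, harmonic structure, psychoacoustics
Keywords (English): psychoacoustic model, audio codec, harmonic structure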
  •   The Harmonic Structure Quad Tree (HSQT) audio coder rearranges frequency-domain coefficients according to their harmonic relationships and then encodes them into a scalable bitstream with Set Partitioning In Hierarchical Trees (SPIHT). Even without the assistance of a psychoacoustic model, HSQT delivers acceptable audio quality.

      Using a psychoacoustic model is nevertheless one of the main reasons that perceptual audio coders achieve a high compression ratio and high audio quality at the same time. By calculating the error that the human ear can tolerate (the masking threshold), a perceptual audio coder allocates bits among the critical bands so that the distortion caused by compression is kept to a minimum. This calculation is usually iterative, so several passes are needed to find a good bit allocation.

      This thesis proposes a one-pass mechanism that integrates a modified MPEG Psychoacoustic Model 2 into HSQT while fully retaining the advantages of HSQT. The experimental results show that, at high data rates, the mechanism substantially improves both coding efficiency and the perceptual quality of the compressed audio.
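
      The iterative bit allocation that the abstract contrasts with the one-pass mechanism can be pictured with a minimal sketch. This is not the thesis's method or the MPEG reference procedure. It is a generic greedy loop that assumes each extra bit lowers a band's quantization noise by about 6.02 dB, and the function name, inputs, and numbers are hypothetical.

```python
import heapq

DB_PER_BIT = 6.02  # approximate SNR gain per extra bit for a uniform quantizer


def greedy_bit_allocation(signal_db, mask_db, total_bits):
    """Give one bit at a time to the band whose noise most exceeds its mask.

    signal_db[i]: signal level of critical band i in dB (hypothetical input)
    mask_db[i]:   masking threshold of band i in dB (hypothetical input)
    Returns the number of bits assigned to each band.
    """
    bits = [0] * len(signal_db)
    # Assumption: with 0 bits the quantization noise equals the signal level,
    # so the initial noise-to-mask ratio (NMR) is signal - mask.
    # heapq is a min-heap, so store the negated NMR to pop the largest first.
    heap = [(-(s - m), i) for i, (s, m) in enumerate(zip(signal_db, mask_db))]
    heapq.heapify(heap)
    for _ in range(total_bits):
        neg_nmr, band = heapq.heappop(heap)
        if -neg_nmr <= 0:
            # Noise in every band is already at or below its masking threshold.
            heapq.heappush(heap, (neg_nmr, band))
            break
        bits[band] += 1
        # One more bit lowers this band's NMR by about DB_PER_BIT.
        heapq.heappush(heap, (neg_nmr + DB_PER_BIT, band))
    return bits


if __name__ == "__main__":
    print(greedy_bit_allocation([60, 40, 20], [30, 35, 25], 8))
```

      With signal levels [60, 40, 20] dB, thresholds [30, 35, 25] dB, and a budget of 8 bits, the loop assigns [5, 1, 0] and stops early once the noise in every band is below its threshold. Each pass through the loop re-examines all bands, which is the kind of repeated computation a one-pass scheme avoids.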

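    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background and Motivation
        1.2 Chapter Overview
    Chapter 2 Set Partitioning In Hierarchical Trees
        2.1 Encoding and Decoding Algorithm
            2.1.1 Spatial Orientation Tree
            2.1.2 Algorithm Flow
        2.2 Data Characteristics Favorable to SPIHT Coding
        2.3 Concurrent Encoding In Hierarchical Trees
    Chapter 3 Harmonic Structure Quad Tree Audio Compression Algorithm
        3.1 Introduction
        3.2 Rules for Rearranging Frequency-Domain Coefficients
            3.2.1 Harmonic Theory
            3.2.2 Construction of the Harmonic Structure Quad Tree
        3.3 Algorithm and Flow
            3.3.1 Encoder Architecture
            3.3.2 Decoder Architecture
            3.3.3 Stereo Processing
            3.3.4 Scalability
        3.4 Reconstructing the Harmonic Structure
    Chapter 4 Psychoacoustic Model
        4.1 Basic Principles of Human Auditory Perception
            4.1.1 Hearing Threshold
            4.1.2 Hearing Range
            4.1.3 Temporal Masking
            4.1.4 Frequency Masking
            4.1.5 Critical Band
        4.2 Perceptual Audio Compression Algorithms
        4.3 Calculating the Masking Threshold Curve
        4.4 Calculation Steps of the Psychoacoustic Model
    Chapter 5 Psychoacoustic Harmonic Structure Quad Tree
        5.1 Compact Tree Process
            5.1.1 Masking Threshold Conversion
            5.1.2 Definition and Formula of Distortion
            5.1.3 Combining the Compact Tree Process with SPIHT
            5.1.4 Deleting Nodes from the SPIHT LIP, LIS, and LSP
        5.2 PHSQT Encoding and Decoding Flow
        5.3 Experimental Results and Comparison
            5.3.1 Test Music
            5.3.2 Objective Tests
    Chapter 6 Conclusions and Future Research Directions
        6.1 Conclusions
        6.2 Future Research Directions
    References
    Appendix A Psychoacoustic Model Parameter Tables
    Appendix B Comparison Charts of HSQT, PHSQT, HSQT+HSR, PHSQT+HSR, MP3, and MP3pro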

    On campus: publicly available from 2008-08-16
    Off campus: publicly available from 2008-08-16