Graduate student: 王木良 (Wang, Mu-Liang)
Thesis title: Implementation and Complexity Reduction for Scalable Speech Coders (可調性語音編碼器之複雜度簡化與實現)
Advisor: 楊家輝 (Yang, Jar-Ferr)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 138
Keywords: classified LPC quantization, spectral estimation, spectral envelope, stochastic codebook search, speech coding

      In this dissertation, we propose several algorithms that improve the performance of scalable speech coders and reduce their computational complexity. First, a fast search algorithm is proposed to reduce the computational complexity of CELP speech coders. Second, two spectral estimation schemes and two quantization schemes are proposed to estimate and to quantize, respectively, the spectral envelope of a parametric speech coder. Finally, a classified LPC quantization scheme is proposed to quantize the line spectral frequency (LSF) vector and to achieve transparent quantization of classified LPC parameters.

      The stochastic codebook search of a CELP speech coder, which is based on the analysis-by-synthesis (AbS) mechanism, requires a huge computational effort. To reduce this complexity, we propose a generalized candidate (GC) scheme. Theoretical analyses and experimental results demonstrate that the GC scheme, combined with the multi-pulse maximum likelihood quantization (MP-MLQ) scheme of the MPEG-4 CELP coder, reduces the computational load by more than 50%. Combined with the depth-first tree search (DFTS) scheme in the 3GPP narrowband adaptive multi-rate speech coder (AMR-NB), it reduces the number of search loops in the ACELP codebook search by a factor of about 4. In both cases, the degradation of the reconstructed speech quality is imperceptible.
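
The effect of restricting a pulse codebook search to a few candidate positions can be illustrated with a minimal sketch. This is not the GC scheme of the dissertation, whose details the abstract does not give. It assumes a generic ACELP-style criterion (maximize the squared correlation over the energy of the filtered codeword), pulse signs preset to the sign of the backward-filtered target, and a hypothetical candidate rule that keeps the positions with the largest correlation magnitude. The subframe length, pulse count, candidate count, impulse response and target are all made-up toy values.

```python
from itertools import combinations
import math


def backward_filter(target, h):
    """d[i] = sum_n target[n] * h[n - i]: correlation of the target with a pulse at i."""
    n = len(target)
    return [sum(target[k] * h[k - i] for k in range(i, n)) for i in range(n)]


def impulse_autocorrelation(h, n):
    """phi[i][j] = sum_k h[k - i] * h[k - j]: energy terms of the filtered pulses."""
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            phi[i][j] = phi[j][i] = sum(h[k - i] * h[k - j] for k in range(j, n))
    return phi


def pulse_search(d, phi, positions, num_pulses):
    """Try every combination of `num_pulses` positions drawn from `positions`.

    Pulse signs are preset to sign(d[i]) (an assumption of this sketch).
    Maximizes (d.c)^2 / (c^T phi c) and returns
    (best score, best positions, number of search loops).
    """
    sign = {i: 1.0 if d[i] >= 0.0 else -1.0 for i in positions}
    best_score, best_combo, loops = -1.0, None, 0
    for combo in combinations(positions, num_pulses):
        loops += 1
        corr = sum(sign[i] * d[i] for i in combo)
        energy = sum(sign[i] * sign[j] * phi[i][j] for i in combo for j in combo)
        score = corr * corr / energy
        if score > best_score:
            best_score, best_combo = score, combo
    return best_score, best_combo, loops


# Toy subframe: 16 samples, 3 pulses, 8 candidates (all values hypothetical).
N, PULSES, CANDIDATES = 16, 3, 8
h = [0.8 ** k for k in range(N)]  # toy impulse response of the weighted synthesis filter
target = [math.sin(0.9 * n) + 0.5 * math.cos(2.3 * n) for n in range(N)]

d = backward_filter(target, h)
phi = impulse_autocorrelation(h, N)

# Exhaustive search over all positions.
full_score, full_pos, full_loops = pulse_search(d, phi, range(N), PULSES)

# Candidate-limited search: keep only the positions with the largest |d[i]|.
ranked = sorted(range(N), key=lambda i: abs(d[i]), reverse=True)
candidates = sorted(ranked[:CANDIDATES])
cand_score, cand_pos, cand_loops = pulse_search(d, phi, candidates, PULSES)

print("search loops:", full_loops, "->", cand_loops)
print("score ratio:", cand_score / full_score)
```

In this toy setting the loop count falls from C(16,3) = 560 to C(8,3) = 56. The candidate search can never score higher than the exhaustive one, so the score ratio shows how much quality the shortcut costs for a given target.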

      Harmonic modeling has been widely adopted in low-rate parametric speech coders, so efficient encoding of the spectral envelope parameters is an essential issue in harmonic speech coders. We propose two spectral estimation algorithms that estimate the spectral amplitudes and refine the fractional pitch lag of the speech signal. The first, a precise spectral estimation approach based on a time-varying sinusoidal model (TSM), targets signals with time-varying characteristics and reduces the spectral distortion by approximately 1.69 dB. The second, a fast spectral estimation scheme designed for low complexity, reduces the computation involved in spectral estimation by more than 70%. Informal listening tests confirm that there is virtually no detectable quality difference between the original estimation scheme and the proposed fast scheme. To quantize the spectral envelope vector effectively, we propose a hearing-based spectral envelope vector quantization (HSEVQ) scheme, which exploits properties of human hearing and quantizes the spectral envelope vector according to the minimum Bark spectral distortion (MBSD) criterion. A simplified HSEVQ (SSEVQ) scheme is also developed to reduce the computational complexity. Theoretical analyses and simulation results reveal that the SSEVQ method reduces the computation of the traditional spectral envelope (SE) vector quantization scheme by a factor of nine while retaining the quality of the reconstructed speech signal.
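
What estimating harmonic amplitudes and refining a fractional pitch lag involves can be sketched generically. The code below is not the precise (TSM-based) or the fast estimator of the dissertation. It assumes a stationary harmonic signal, reads each harmonic amplitude from the windowed DTFT at a multiple of the candidate fundamental, and picks the fractional lag on a quarter-sample grid that maximizes the total harmonic energy. The frame length, harmonic count, grid step and test signal are hypothetical.

```python
import cmath
import math


def hann(n_samples):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_samples) for n in range(n_samples)]


def harmonic_amplitudes(frame, window, w0, num_harmonics):
    """Amplitude of each harmonic k*w0, read from the windowed DTFT at that frequency."""
    norm = sum(window)
    amps = []
    for k in range(1, num_harmonics + 1):
        acc = sum(w * s * cmath.exp(-1j * k * w0 * n)
                  for n, (w, s) in enumerate(zip(window, frame)))
        amps.append(2.0 * abs(acc) / norm)
    return amps


def refine_pitch(frame, coarse_lag, num_harmonics, step=0.25, span=1.0):
    """Grid search around an integer lag; keep the lag with the largest harmonic energy."""
    window = hann(len(frame))
    steps = int(round(span / step))
    best_energy, best_lag, best_amps = -1.0, None, None
    for i in range(-steps, steps + 1):
        lag = coarse_lag + i * step
        amps = harmonic_amplitudes(frame, window, 2.0 * math.pi / lag, num_harmonics)
        energy = sum(a * a for a in amps)
        if energy > best_energy:
            best_energy, best_lag, best_amps = energy, lag, amps
    return best_lag, best_amps


# Synthetic voiced frame with a true pitch lag of 40.5 samples (hypothetical test data).
FRAME_LEN, TRUE_LAG = 320, 40.5
TRUE_AMPS = [1.0, 0.9, 0.8, 0.7, 0.6, 0.6, 0.5, 0.5, 0.4, 0.4]
w0_true = 2.0 * math.pi / TRUE_LAG
frame = [sum(a * math.cos(k * w0_true * n + 0.7 * k)
             for k, a in enumerate(TRUE_AMPS, start=1))
         for n in range(FRAME_LEN)]

lag, amps = refine_pitch(frame, coarse_lag=40, num_harmonics=len(TRUE_AMPS))
print("refined lag:", lag)
print("amplitudes:", [round(a, 3) for a in amps])
```

The higher harmonics make the search sensitive: a lag error of a quarter sample shifts the tenth harmonic by roughly half a frequency bin in this setup, which visibly lowers the measured energy. The cost is one DTFT evaluation per harmonic per candidate lag, which is the kind of load a fast estimation scheme would aim to cut.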

      Finally, a classified LPC quantization (CLPQ) scheme is proposed to quantize the classified LSF vector at a minimum bit rate. Under an objective spectral distortion measure, the CLPQ scheme achieves transparent quantization of the spectral information with 10 bits for unvoiced speech and 21 bits for voiced speech in each 20 ms frame. The proposed CLPQ scheme encodes the LPC coefficients at a variable bit rate with scalable computational complexity.
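
The bit allocations above translate directly into bit rates, and the objective measure can be sketched as the usual root-mean-square log-spectral distance between two LPC spectra. The abstract does not spell out the exact measure used in the dissertation, so the all-pole model form, the frequency grid, the example coefficients and the voiced fraction below are assumptions for illustration only.

```python
import cmath
import math


def lpc_power_spectrum_db(a, num_points=256):
    """10*log10(1/|A(e^{jw})|^2) on a uniform grid over (0, pi).

    Assumes A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...
    """
    spectrum = []
    for m in range(num_points):
        w = math.pi * (m + 0.5) / num_points
        A = 1.0 + sum(c * cmath.exp(-1j * w * (k + 1)) for k, c in enumerate(a))
        spectrum.append(-20.0 * math.log10(abs(A)))
    return spectrum


def spectral_distortion_db(a_ref, a_quant, num_points=256):
    """Root-mean-square difference between the two log spectra, in dB."""
    ref = lpc_power_spectrum_db(a_ref, num_points)
    qnt = lpc_power_spectrum_db(a_quant, num_points)
    return math.sqrt(sum((r - q) ** 2 for r, q in zip(ref, qnt)) / num_points)


def lpc_bit_rate(bits_per_frame, frame_ms=20):
    """Bits per second spent on LPC parameters for a given frame length."""
    return bits_per_frame * 1000 / frame_ms


def average_lpc_bit_rate(voiced_fraction, voiced_bits=21, unvoiced_bits=10, frame_ms=20):
    """Mean LPC bit rate when a fraction of the frames is classified as voiced."""
    bits = voiced_fraction * voiced_bits + (1.0 - voiced_fraction) * unvoiced_bits
    return lpc_bit_rate(bits, frame_ms)


# Hypothetical second-order LPC filter and a slightly perturbed ("quantized") copy.
a_ref = [-1.2, 0.5]
a_quant = [-1.18, 0.52]
sd = spectral_distortion_db(a_ref, a_quant)

print("voiced LPC rate (bit/s):", lpc_bit_rate(21))
print("unvoiced LPC rate (bit/s):", lpc_bit_rate(10))
print("average at 50% voiced (bit/s):", average_lpc_bit_rate(0.5))
print("spectral distortion (dB):", round(sd, 3))
```

With 20 ms frames, 21 bits per voiced frame is 1050 bit/s and 10 bits per unvoiced frame is 500 bit/s, so the average LPC rate depends on how often the classifier declares a frame voiced.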

    1. Introduction
        1.1 Background
        1.2 Speech Coding Methods
        1.3 Speech Coding Standards
        1.4 Organization of Dissertation
    2. MPEG-4 Speech Coding Algorithms
        2.1 MPEG-4 HVXC Encoder
        2.2 MPEG-4 HVXC Decoder
        2.3 Analysis-by-Synthesis Coding - CELP Coder
        2.4 MPEG-4 CELP Encoder
            2.4.1 Pre-processing and LPC Analysis and Quantization
            2.4.2 Perceptual Weighting Filter
            2.4.3 Adaptive Codebook Search Algorithm
            2.4.4 Stochastic Codebook Search Algorithm
            2.4.5 MPEG-4 CELP-MPE Enhancement Layers
        2.5 MPEG-4 CELP Decoder
    3. Fast ACELP Searching Algorithms
        3.1 Overview of CELP Speech Coder
        3.2 Traditional Ternary Codebook Search
        3.3 Proposed Generalized Candidate Scheme
            3.3.1 Prediction of Excited Pulses
            3.3.2 Proposed MCP Scheme
            3.3.3 Sign Prediction Scheme
            3.3.4 Codebook Search for MPEG-4 CELP-MPE Enhancement Layers
        3.4 Computational Complexity and Experimental Results
        3.5 Fast Search Scheme for AMR Speech Coder
        3.6 Conclusions
    4. Shape VQ of Spectral Envelope with Perception Consideration
        4.1 Overview of MPEG-4 HVXC Speech Coder
        4.2 SEVQ Scheme in HVXC Coder
        4.3 SEVQ with Formal Auditory Model
        4.4 Simplified HSEVQ Scheme
        4.5 Experimental Results
        4.6 Conclusions
    5. Spectral Estimation for Harmonic Speech Coder
        5.1 Overview of Sinusoidal Modeling
        5.2 Spectral Representation for Harmonic Speech Coder
        5.3 The Precise Spectral Estimation Scheme
        5.4 Fast Spectral Estimation Scheme
            5.4.1 S-step Fine Pitch Search Algorithm
            5.4.2 Spectral Peak-picking Method
        5.5 Simulation Results
            5.5.1 Simulation for Precise Spectral Estimation Scheme
            5.5.2 Simulation for Fast Spectral Estimation Scheme
        5.6 Conclusions
    6. Classified VQ of LPC Parameters
        6.1 Overview of LPC Quantization
        6.2 Spectral Quantization of LPC Parameters
            6.2.1 Line Spectral Frequencies (LSF)
            6.2.2 Multi-stage Vector Quantization (MSVQ)
            6.2.3 M-L Tree Search Procedure
            6.2.4 Split Vector Quantization (SVQ)
            6.2.5 Predictive Vector Quantization (PVQ)
            6.2.6 Weighting Function of Squared Euclidean Distance
        6.3 Class-dependent VQ of LPC Parameters
            6.3.1 Class-specific LPC Quantization
            6.3.2 The Proposed Classified LPC Quantization Scheme
            6.3.3 Codebook Training Procedure
        6.4 Experimental Results
        6.5 Conclusions
    7. Conclusions
        7.1 Summaries of Research
        7.2 Future Works
    Bibliography


    Full-text availability: on campus, public from 2006-01-31
    off campus, public from 2006-01-31