| Graduate student: | 張宗賢 Chang, Chung-Hsien | 
|---|---|
| Thesis title: | 基於定點運算之低複雜度演算法與硬體架構設計應用於語音與音訊信號處理 Low-complexity Algorithms and Hardware Designs based on Fixed-point Arithmetic for Speech and Audio Processing | 
| Advisor: | 王駿發 Wang, Jhing-Fa | 
| Degree: | Doctor | 
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering | 
| Year of publication: | 2016 | 
| Academic year of graduation: | 104 | 
| Language: | English | 
| Number of pages: | 84 | 
| Keywords (Chinese): | 定點化運算、語音信號處理、音訊信號處理、線頻譜、數位IC設計 | 
| Keywords (English): | speech signal processing, audio signal processing, line spectral pairs, fixed-point arithmetic, digital IC design | 
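Speech and audio signal processing is widely used in consumer products, whose hardware specifications vary with the intended use. For these applications, algorithms developed with fixed-point arithmetic and deployed on low-end embedded platforms without a floating-point unit (FPU) can improve processing performance and reduce development cost compared with floating-point arithmetic. In addition, hardware designs based on fixed-point arithmetic can further reduce chip area and power consumption. This dissertation optimizes the algorithms and hardware for the computationally intensive parts of speech and audio processing, in the following three parts.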
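In the first part, this dissertation improves the algorithm and hardware design for computing line spectral pairs (LSPs), which are frequently used in speech compression. A hardware architecture based on cooperative computation is proposed for LSPs of different orders. The cooperative computation mainly comprises the Birge-Vieta method and the enhanced Tschirnhaus transform. This part analyzes the coefficient dependency of the Birge-Vieta method. It also removes many of the fractional operations in the original formulation and proposes an enhanced Tschirnhaus transform based on integer arithmetic, which lowers the computational complexity so that most multiplications and divisions can be replaced by shifts and additions. In the experimental results, at the same operating frequency, the proposed hardware architecture speeds up processing by about 40 times and reduces the logic gate count by about 1.16%.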
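In the second part, this dissertation proposes a low-complexity audio codec for low-end embedded systems without a floating-point unit. To reduce the computational load and the consumption of system resources, the proposed codec uses only the basic integer operations provided by the processor, avoiding the complex mathematical functions commonly used in general audio coders. The encoder mainly comprises a three-state encoding mode and uses corresponding data types to mark the zero-valued signals after transformation to the frequency domain, which improves the compression ratio. Experimental results on a low-end embedded development board show that, compared with other coding standards, the proposed encoder and decoder reduce code size by more than about 95%, run more than about 113 times faster, and provide comparable sound quality.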
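In the third part, this dissertation proposes a fixed-point algorithm for implementing the two-variable power function. Compared with the original floating-point version, this function runs more efficiently on low-end embedded development boards. In addition, in view of the operations and the many mathematical functions that speech processing frequently requires, a reconfigurable computing unit is designed to compute the transcendental functions commonly used in speech processing and to raise hardware utilization. Experimental results show that the proposed algorithm, running on a low-end embedded development board without a floating-point unit, is about 5 times faster than the original floating-point function. Furthermore, the designed hardware architecture operates at 200 MHz with a gate count of only 11.7K, and it should be able to handle most computations in speech signal processing.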
Speech signal processing is widely used in consumer products for various applications, and these applications may call for different hardware specifications depending on user needs. For the choice of number system, fixed-point arithmetic is more appropriate than floating-point arithmetic for low-end embedded systems, which usually have no floating-point unit (FPU). Fixed-point arithmetic can improve processing efficiency and reduce development costs. Moreover, fixed-point hardware designs use fewer gates and consume less power. This dissertation focuses on the high-complexity computations in speech and audio processing and designs algorithms and hardware architectures that accelerate them. Three topics are discussed.
In the first part of this dissertation, a low-complexity algorithm and the corresponding hardware design are proposed for solving line spectral pairs (LSPs). The proposed algorithm has two main parts: the enhanced Tschirnhaus transform (ETT) and the Birge-Vieta method (BVM). The ETT replaces fractional multiplications with a few additions and shift operations, so the redundant multiplications in the original transform are avoided. In addition, based on an analysis of the coefficient correlation of the BVM, a pipeline-recursive framework is implemented to save further computation. The experimental results show that the proposed ETT reduces computation by about 70.1%. Moreover, at the same operating frequency, the proposed hardware architecture improves performance by about 40 times and reduces the number of logic gates by about 1.16%.
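For readers unfamiliar with the Birge-Vieta method, the following is a minimal floating-point sketch of the textbook form: Newton-Raphson iteration in which the polynomial and its derivative are both obtained by synthetic division. It is not the dissertation's pipeline-recursive fixed-point design; the function names `birge_vieta_root` and `deflate`, the tolerance, and the iteration limit are assumptions chosen for illustration.

```python
def birge_vieta_root(coeffs, x0, tol=1e-12, max_iter=100):
    """Find one real root of a polynomial by the Birge-Vieta method.

    coeffs lists the coefficients from the highest degree down to the
    constant term; x0 is the starting guess (illustrative interface).
    """
    if len(coeffs) < 2:
        raise ValueError("need a polynomial of degree >= 1")
    x = float(x0)
    for _ in range(max_iter):
        b = coeffs[0]  # running value of the first synthetic division
        c = coeffs[0]  # running value of the second synthetic division
        for a in coeffs[1:-1]:
            b = a + x * b
            c = b + x * c
        p = coeffs[-1] + x * b  # p(x); c now holds p'(x)
        if c == 0:
            raise ZeroDivisionError("derivative vanished; choose another x0")
        step = p / c
        x -= step  # Newton-Raphson update
        if abs(step) < tol:
            break
    return x


def deflate(coeffs, root):
    """Divide the polynomial by (x - root), discarding the remainder."""
    out = [coeffs[0]]
    for a in coeffs[1:-1]:
        out.append(a + root * out[-1])
    return out


# Example: (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
cubic = [1, -6, 11, -6]
first = birge_vieta_root(cubic, 0.0)
second = birge_vieta_root(deflate(cubic, first), 0.0)
```

Starting from 0, the iteration approaches the smallest root of the example cubic; deflating by that root and repeating yields the next one. A fixed-point version would replace the division and multiplications with scaled integer operations.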
In the second part of the dissertation, a fixed-point algorithm and its hardware architecture are presented, based on a reconfigurable scheme, for commonly used transcendental functions and basic operations in speech signal processing. By analyzing the algorithms adopted for these operations, a simplified computing unit is designed. The unit combines six types of operation: by reconfiguring the data paths, the same multiply-add architecture is reused, which reduces the number of logic gates. The experimental results indicate that, at the algorithm level, the proposed method is about 5 times faster on a low-end embedded system. Furthermore, the proposed hardware design works at a 200 MHz clock rate with a gate count of only 11.9K, and the average errors of the logarithm and powering functions are 0.57% and 0.11%, respectively.
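As a rough illustration of how logarithm and powering functions can be computed with integer operations only, the sketch below uses a bit-by-bit binary logarithm (repeated squaring) and evaluates x^y as 2^(y·log2 x). This is a generic technique, not necessarily the algorithm adopted in the dissertation; the Q16.16 format, the names `log2_fixed`, `exp2_fixed`, and `pow_fixed`, and the lookup table are all assumptions made for this example.

```python
FRAC = 16          # assumed Q16.16 format: 16 fractional bits
ONE = 1 << FRAC    # fixed-point representation of 1.0

# Constants 2^(1/2), 2^(1/4), ... in Q16.16. Floats are used here only to
# build the table once; on a target without an FPU these would be
# precomputed integer constants.
_EXP2_TABLE = [round(2.0 ** (2.0 ** -(k + 1)) * ONE) for k in range(FRAC)]


def log2_fixed(x):
    """log2 of a positive Q16.16 value, returned in Q16.16."""
    if x <= 0:
        raise ValueError("log2 is undefined for non-positive input")
    result = 0
    while x < ONE:          # normalise the input into [1, 2)
        x <<= 1
        result -= ONE
    while x >= 2 * ONE:
        x >>= 1
        result += ONE
    bit = ONE >> 1
    for _ in range(FRAC):   # one fractional bit per squaring
        x = (x * x) >> FRAC
        if x >= 2 * ONE:
            x >>= 1
            result += bit
        bit >>= 1
    return result


def exp2_fixed(t):
    """2**t for a Q16.16 exponent t, returned in Q16.16."""
    int_part = t >> FRAC          # floor, also for negative t
    frac = t & (ONE - 1)
    result = ONE
    for k in range(FRAC):
        if frac & (ONE >> (k + 1)):
            result = (result * _EXP2_TABLE[k]) >> FRAC
    return result << int_part if int_part >= 0 else result >> -int_part


def pow_fixed(x, y):
    """x**y for Q16.16 operands with x > 0, using only integer operations."""
    return exp2_fixed((y * log2_fixed(x)) >> FRAC)
```

Both routines reduce to shifts, additions, and multiplications, so a single multiply-add data path could in principle serve either function. Truncation after each multiplication limits accuracy to a few least significant bits of the chosen format.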
The third part of this dissertation presents a low-complexity audio coder and decoder (CODEC) for low-end embedded systems with no FPU. To reduce computational complexity and system resource consumption, the proposed CODEC uses only basic fixed-point arithmetic instructions, avoiding the high-complexity mathematical functions of conventional audio CODECs. To further improve the compression rate, a tri-mode zeroes recording algorithm (TZRA) is proposed. The algorithm provides different encoding modes, each with a corresponding bit data structure for recording the locations of zero wavelet coefficients, and selects the encoding mode that needs the fewest bits. Compared with the baseline CODECs, the experimental results show that the proposed CODEC reduces the size of the binary file by about 95% and runs about 113 times faster, while maintaining comparable sound quality.
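The abstract does not specify the three TZRA modes, so the sketch below only illustrates the selection principle: compute the bit cost of several ways of recording zero locations and keep the cheapest. The three modes shown (a bitmap, a list of zero positions, and a list of nonzero positions), their bit layouts, and the names `mode_costs` and `choose_mode` are hypothetical and should not be read as the dissertation's actual data structures.

```python
def mode_costs(coeffs):
    """Bit cost of each hypothetical zero-recording mode for one frame."""
    n = len(coeffs)
    zeros = sum(1 for c in coeffs if c == 0)
    index_bits = max(1, (n - 1).bit_length())  # bits to address one position
    count_bits = n.bit_length()                # bits to store a count 0..n
    return {
        # one flag bit per coefficient
        "bitmap": n,
        # a count followed by the position of every zero
        "zero_list": count_bits + zeros * index_bits,
        # a count followed by the position of every nonzero value
        "nonzero_list": count_bits + (n - zeros) * index_bits,
    }


def choose_mode(coeffs):
    """Return (mode name, bit cost) for the cheapest hypothetical mode."""
    costs = mode_costs(coeffs)
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


# Three 16-coefficient frames with different zero densities
dense = [5, 0, 3, 7, 1, 2, 9, 4, 6, 8, 1, 3, 2, 5, 7, 9]
sparse = [0] * 15 + [4]
mixed = [1, 0] * 8
choices = [choose_mode(frame) for frame in (dense, sparse, mixed)]
```

In a real bitstream the selected mode would also have to be signalled to the decoder, for example with a two-bit flag per frame, and that overhead would enter the comparison.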