| 研究生: | 羅信富 Luo, Hsin-Fu | 
|---|---|
| 論文名稱: | 應用於記憶體式快速傅立葉轉換處理器設計之有效率記憶體管理方案 Efficient Memory Management Schemes for Memory-Based FFT Processor Design | 
| 指導教授: | 謝明得 Shieh, Ming-Der | 
| 學位類別: | 博士 Doctor | 
| 系所名稱: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering | 
| 論文出版年: | 2016 | 
| 畢業學年度: | 104 | 
| 語文別: | 英文 | 
| 論文頁數: | 102 | 
| 中文關鍵詞: | 快速傅立葉轉換 、記憶體式 、區塊合併記憶體 、快取記憶體式 、管線化-記憶體混合型 | 
| 外文關鍵詞: | fast Fourier transform, memory-based, merged-bank memory, cached-memory, pipelined shared-memory | 
| 相關次數: | 點閱:136 下載:5 | 
| 分享至: | 
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 | 
正交分頻多工(OFDM)技術,因為提高了頻寬的使用效率以及較佳的抗多重路徑干擾的特性,因此被廣泛的應用在許多的無線通訊系統中。在正交分頻多工系統中,快速傅立葉轉換(FFT)處理器為系統內不可或缺的關鍵元件。由於快速傅立葉轉換具高度的運算複雜度,因此在硬體實現上根據效能、耗電量等不同的系統需求而演衍生出許多不同的硬體架構,其中記憶體式(memory-based)快速傅立葉轉換處理器是一個具低面積的硬體架構,尤其在大點數的應用上更顯其優勢。
在記憶體式快速傅立葉轉換處理器中,70%以上的硬體面積由記憶體佔據,因此記憶體的硬體架構、記憶體的存取機制以及記憶體定址方法為主要的設計考量重點。本論文首先提出一種資料重置(data relocation)的機制,透過這樣的機制,區塊合併(merged-bank)記憶體將可以被採用,採用區塊合併記憶體相較於採用多重區塊(multi-bank)記憶體的處理器在硬體實現上具較低面積及低功率消耗的特性。進一步,我們提出一種有效率且無衝突(conflict-free)記憶體定址方法,不但可以導入單埠(single-port)記憶體於硬體實現中,並可以適用於高基數(high-radix)的快速傅立葉轉換應用。實驗結果顯示,採用單埠且區塊合併記憶體架構相較於雙埠(dual-port)且多重區塊記憶體架構,在記憶體面積上減低了超過30%以上的面積需求。
針對低功率消耗及高處理效能設計考量,記憶體式快速傅立葉轉換處理器分別衍生出快取記憶體式(cached-memory)及管線化-記憶體式(pipelined shared-memory)硬體架構。於快取記憶體式快速傅立葉轉換處理器設計上,我們提出了一個適用於多階層式(multi-level)快取記憶體架構之記憶體定址演算法,藉由此方法,任一階層記憶體皆能引用區塊合併記憶體架構來實現,完成低面積消耗的特性。跟據此方法,我們完成了一個適用於數位視訊廣播(DVB-T/H)之8192點快取記憶體式快速傅立葉轉換處理器,該處理器相較於多重區記憶體架構節省10.1-29.3%面積及9.6-67.9%的功率消耗。
最後,本論文針對兼具高處理效能的管線化-記憶體式快速傅立葉轉換處理器設計亦有所探討。文中提出了一個改良式的多路徑延遲換向器(multi-path delay commutator)電路架構及相對應的記憶體定址方法,此方案能夠使得資料重置機制得以在該架構下運行,進而導入區塊合併式記憶體於硬體實現中來減低整體的面積需求。另外,我們一併提出了一個能夠支持基數3 (radix-3)的多路徑延遲換向器電路架構,於相關應用中提供更有效率的記憶管理方案。
With higher bandwidth efficiency and better immunity to multipath interference, orthogonal frequency-division multiplexing (OFDM) technology has been widely adopted in many wireless communication systems. For OFDM systems, a fast Fourier transform (FFT) processor is an indispensable key component. Due to high computation complexity, the FFT design was inspired by various hardware architectures to leverage performance and power consumption. The memory-based FFT architecture is a low-area design which has significant advantages especially in large-point applications.
In memory-based FFT design, more than 70% area of the processor is comprised of the memory module, so the hardware structure, access mechanism, and addressing method of the memory are key design considerations. This dissertation proposes a data relocation mechanism using a merged-bank memory. Merged-bank memory uses lower area and power consumption compared to multi-bank memory processors in terms of the hardware implementation. Furthermore, we propose an efficient and conflict-free memory addressing method, based on single-port memory which can be adopted in the hardware implementation, and is also applicable to high-radix FFT applications. The experimental results show that the single-port and merged-block memory architecture occupies more than 30% less area as compared to dual-port and multi-block memory architecture.
For the design considerations of low power consumption and high performance, cached-memory architecture and pipelined shared-memory FFT architecture are respectively derived from memory-based FFT architecture design. For cached-memory FFT processor designs, we proposed a memory addressing algorithm applicable to multi-level cached-memory FFT architecture. With this algorithm, the memory at each level can be realized using merged-bank memory structure, so as to maintain the low-area design feature. Based on this scheme, an 8192-point cached-memory FFT processor for digital video broadcasting (DVB-T/H) applications is demonstrated, which uses 10.1-29.3% less area and 9.6-67.9% less power compared to a comparable multi-bank design.
Finally, this dissertation also explores high performance pipelined-memory FFT processor designs. A multi-path delay commutator (MDC) architecture and its corresponding addressing algorithm are developed. Using such a scheme, the data relocation mechanism can be implemented in a way to further integrate the merged-bank memory in the hardware implementation to reduce overall space requirements. In addition, we also proposed an MDC architecture that supports radix-3 operation which provides more efficient memory management in related applications.
[1] 	Y.-W. Lin, H.-Y. Liu, and C.-Y. Lee, “A 1-GS/s FFT/IFFT Processor for UWB Applications,” IEEE J. Solid-State Circuit, vol. 40, no. 8, pp. 1726-1735, Aug. 2005.
[2]	Y.-N. Chang and K. K. Parhi, “An Efficient Pipelined FFT Architecture,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 50, no. 6, pp. 322-325, June 2003.
[3]	Y.-T. Lin, P.-Y. Tsai, and T.-D. Chiueh, “Low-Power Variable-Length Fast Fourier Transform Processor,” IEE Proc. Comput. Digit. Tech., vol. 152, no. 4, pp. 499-506, July 2005.
[4]	Y.-N. Chang, “An Efficient VLSI Architecture for Normal I/O Order Pipeline FFT Design,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 12, pp. 1234-1238, Dec. 2008.
[5]	H.-Y. Lee and I.-C. Park, “Balanced Binary-Tree Decomposition for Area-Efficient Pipelined FFT Processing,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 4, pp. 889-900, April 2007.
[6]	H. Lee and M. Shin, “A High-Speed Low-Complexity Two-Parallel Radix-24 FFT/IFFT Processor for UWB Applications,” in Proc. IEEE Asia Solid-State Circuits Conf. (ASSCC’07), Nov. 2007, pp. 284-287.
[7]	S. He and M. Torkelson,  "Designing Pipeline FFT Processor for OFDM (de)Modulation," in  Proc. URSI Int. Symp. Signals, Syst., and Electron. (ISSSE’98), vol. 29, Oct. 1998, pp. 257-262.
[8]	S. He and M. Torkelson, “Design and Implementation of a 1024-Point Pipeline FFT Processor,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC’98), May 1998, pp. 131-134.
[9]	E. E. Swartzlander, W. K. W. Young, and S. J. Joseph, “A Radix 4 Delay Commutator for Fast Fourier Transform Processor Implementation,” IEEE J. Solid-State Circuits, vol. sc-19, no. 5, pp. 702-709, Oct. 1984.
[10]	L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. New Jersey: Prentie-Hall, 1975.
[11]	H. L. Groginsky and G. A. Works, “A Pipeline Fast Fourier Transform,” IEEE Trans. Comput., vol. c-19, no. 11, pp. 1015-1019, Nov. 1970.
[12]	E. H. Wold and A. M. Despain, “Pipeline and Parallel-Pipeline FFT Processors for VLSI Implementations,” IEEE Trans. Comput., vol. c-33, no. 5, pp. 414-426, May 1984.
[13]	S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “Improved Radix-4 and Radix-8 FFT Algorithm,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS’04), vol. 3, May 2004, pp. III-561-564.
[14]	A. Delaruelle, J. huisken, J. V. Loon, and F. Welten, “A Channel Demodulator IC for Digital Audio Broadcasting,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC’94), May 1994, pp. 47-50.
[15]	R. Radhouane, P. Liu, and C. Modin, “Minimizing the Memory Requirement for Continuous Flow FFT Implementation: Continuous Flow Mixed Mode FFT (CFMM-FFT),” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS’00), vol. 1, May 2000, pp.116-119.
[16]	C.-H. Chang, C.-L. Wang, and Y.-T. Chang, “A Novel Memory-Based FFT Processor for DMT/OFDM Applications,” in Proc. IEEE Int. Conf. Acoustic, Speech, and Signal Process. (ICASSP’99), vol. 4, March 1999, pp. 1921-1924.
[17]	C.-L. Wang and C.-H. Chang, “A New Memory-Based FFT Processor for VDSL Transceivers,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS’03), vol. 4, May 2001, pp. 670-673.
[18]	C.-K. Chang, C.-P. Hung, and S.-G. Chen, “An Efficient Memory-Based FFT Architecture,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS’03), vol. 2, May 2003, pp. 129-132.
[19]	Y. Ma, “An Effective Memory Addressing Scheme for FFT Processors,” IEEE Trans. Signal Process., vol. 47, pp. 907-911, Mar. 1999.
[20]	X. Xiao, E. Oruklu, and J. Saniie, “An Efficient FFT Engine with Reduced Addressing Logic,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 11, pp. 1149-1153, Nov. 2008.
[21]	J. H. Takala, T. S. Jarvinen, and H. T. Sorokin, “Conflict-Free Parallel Memory Access Scheme for FFT Processors,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS’03), vol. 4, May 2003, pp. 524-527.
[22]	H.-F. Lo, M.-D. Shieh, and C.-M. Wu, “Design of an Efficient FFT Processor for DAB System,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS’01), vol. 4, May 2001, pp. 654-657.
[23]	C.-M. Wu, M.-D. Shieh, H.-F. Lo, and M.-H. Hu, “Implementation of Channel Demodulator for DAB System,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS’03), vol. 2, May 2003, pp. 137-140.
[24]	L. G. Johnson, “Conflict Free Memory Addressing for Dedicated FFT Hardware,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 5, pp. 312-316, May 1992.
[25]	D. Reisis and N. Vlassopoulos, “Conflict-Free Parallel Memory Accessing Techniques for FFT Architectures,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 11, pp. 3438–3447, Dec. 2008.
[26]	C.-F. Hsiao, Y. Chen, and C.-Y. Lee, “A Generalized Mixed-Radix Algorithm for Memory-Based FFT Processors,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 1, pp. 26-30, Jan. 2010.
[27]	B. G. Jo and M. H. Sunwoo, “New Continuous-Flow Mixed-Radix (CFMR) FFT Processor Using Novel In-Place Strategy,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp.911-919, May 2005.
[28]	A. T. Jacobson, D. N. Truong, and B. M. Bass, “The Design of a Reconfigurable Continuous-Flow Mixed-Radix FFT Processor,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS’09), May 2009, pp.1133-1136.
[29]	P.-Y Tsai, and C.-Y. Lin, “A Generalized Conflict-Free Memory Addressing Scheme for Continuous-Flow Parallel-Processing FFT Processors with Rescheduling,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 12, pp. 2290-2302, Dec. 2011.
[30]	J.-Y. Yu and Y. Li, “An Efficient Conflict-Free Parallel Memory Access Scheme for Dual-Butterfly Constant Geometry Radix-2 FFT Processor,” in Proc. Int. Conf. Signal Process. (ICSP’08), Oct. 2008, pp.458-461.
[31]	W. Chao and H. Y. Peng, “Twin Butterfly High Throughput Parallel Architecture FFT Algorithm,” in Proc. Int. Symp. Inf. Sci. Eng. (ISISE’08), Dec. 2008, pp. 637-640.
[32]	M. Ayinala, Y. Lao, and K. K. Parhi, “An In-Place FFT Architecture for Real-Valued Signals,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 10, pp. 652-656, Oct. 2013.
[33]	C.-L. Wey, W.-C. Tang, and S.-Y. Lin, “Efficient Memory-Based FFT Architectures for Digital Video Broadcasting (DVB-T/H),” in Proc. Int. Symp. VLSI Design, Autom. Test (VLSI-DAT’07), Apr. 2007, pp. 1-4.
[34]	C.-L. Wey, S.-Y. Lin, and W.-C. Tang, “Efficient Memory-based FFT Architectures for OFDM Applications,” in Proc. IEEE Int. Conf. Electro/Inf. Technol. (EIT’07), May 2007, pp. 345-350.
[35]	H. Saleh and E.-E. Swartzlander, “A Contention-Free Radix-2 8k-Point Fast Fourier Transform Engine Using Single-Port SRAMs,” in Proc. IEEE Region 3 Technical, Professional, and Student Conf. (SoutheastCon’08), Apr. 2008, pp.497-502.
[36]	S.-Y. Lin, C.-L. Wey, and M.-D. Shieh, “Low-Cost FFT Processor for DVB-T2 Applications,” IEEE Trans. Consum. Electron., vol. 56, no. 4, pp. 2072-2079, Nov. 2010.
[37]	J.-A. Hidalgo, J. Lopez, F. Arguello, and E. L. Zapata, “Area-Efficient Architecture for Fast Fourier Transform,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 2, pp. 187-193, Feb. 1999.
[38]	E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, ”A Fast Single Chip Implementation of 8192 Complex Point FFT,” IEEE J. of Solid-state Circuits, vol. 30, pp.300-305, March 1995.
[39]	B. M. Bass, “An Energy-Efficient Single-Chip FFT Processor,” in Proc. Symp. on VLSI Circuits, June 1996, pp. 164-165.
[40]	B. M. Baas, “A Low-Power, High-Performance, 1024-Point FFT Processor,” IEEE J. of Solid-State Circuits, vol. 34, pp. 380-387, Mar. 1999.
[41]	Y. Chen, Y.-W. Lin, and C.-Y. Lee, “A Block Scaling FFT/IFFT Processor for WiMAX Applications.” In Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC’06), Nov. 2006, pp. 203-206.
[42]	Y.-J. Cho, C.-L. Yu, T.-H. Yu, C.-Z. Zhan, and A.-Y. Wu, “Efficient Fast Fourier Transform Processor Design for DVB-H System,”in Proc. VLSI Design/CAD Symp., Aug. 2007, pp. 65-68.
[43]	Y.-W. Lin, H.-Y. Liu, and C.-Y. Lee, “A Dynamic Scaling FFT Processor for DVB-T Applications,” IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 2005–2013, Nov. 2004.
[44]	L. Jia, Y. Gao, and H. Tenhunen, “A Pipelined Shared-Memory Architecture for FFT Processor,” in Proc. IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS’99), Aug. 1999, vol. 2, pp.804-807.
[45]	P.-Y. Tsai, T.-H. Lee, and T.-D. Chiueh, “Power-Efficient Continuous-Flow Memory-Based FFT Processor for WiMax OFDM Mode,” in Proc. Int. Symp. Intell. Signal Process. Commun. Syst. (ISPACS’06), Dec. 2006, pp. 622-625.
[46]	H. Xiao, A. Pan, Y. Chen, and X. Zeng, “Low-Cost Reconfigurable VLSI Architecture for Fast Fourier Transform,” IEEE Trans. Consum. Electron., vol. 54, no. 4, pp. 1617-1622, Nov. 2008.
[47]	K. Jung and H. Lee, “Low-Complexity Multi-Mode Memory-Based FFT Processor for DVB-T2 Applications,” IEICE Trans. Fundam., vol. E94-A, no. 11, pp. 2376-2383, Nov. 2011.
[48]	K. Jung and H. Lee, “Low-Cost Variable-Length FFT Processor for DVB-T/H Applications,” in Proc. IEEE Asia Pacific Conf. Circuits Syst. (APCCAS’10), Dec. 2010, pp. 752-755.
[49]	C.-M. Chen, C.-C. Hung, and Y.-H. Huang, “An Energy-Efficient Partial FFT Processor for OFDMA Communication System,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no.2, pp. 136-140, Feb. 2010.
[50]	S.-Y. Lee, C.-C. Chen, C.-C. Lee, and C.-J. Chen, “A Low-Power VLSI Architecture for Shared-Memory FFT Processor with a Mixed-Radix Algorithm and a Simple Memory Control Scheme,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS’06), May 2006, pp. 157-160.
[51]	3GPP TS 36.201 V8.30 LTE Physical Layer – General Description, E-UTRA, Mar. 2009. (3GPP-LTE)
[52]	A. M. Despain, “Very Fast Fourier Transform Algorithms Hardware for Implementation,” IEEE Trnas. Comput., vol. c-28, no. 5, May 1979.
[53]	I. Cho, T. Patyk, D. Guevorkian, J. Takala, and S. Bhattacharyya, “Pipelined FFT for Wireless Communications Supporting 128-2048/1536-Point Transforms,” in Proc. IEEE Global Conf. Signal and Inf. Process. (GlobalSIP’13), Dec. 2013, pp. 1242-1245.
[54]	C. Yu, and M.-H. Yen, “Area-Efficient 128- to 2048/1536-Point Pipeline FFT Processor for LTE and Mobile WiMAX System”, to be published in IEEE Trans. Very Large Scale Integration (VLSI) System.
[55]	G. Wang, B. yin, I. Cho, J. R. Cavallaro, S. Bhattacharyya, and J. Takala, “Efficient Architecture Mapping of FFT/IFFT for Cognitive Radio Networks,” in Proc. IEEE Int. Conf. Acoustic, Speech, and Signal Process. (ICASSP’14), May 2014, pp. 3933-3937.
[56]	T. Patyk, D. Guevorkian, T. Pitkȁnen, P. Jȁȁskelȁinen, and J. Takala, “Low-Power Application-Specific FFT Processor for LTE applications,” in Proc. Int. Conf. Embedded Comput. Syst.: Architectures, Modeling and Simulation (IC-SAMOS’13), July 2013, pp. 28-32.
[57]	1536-Point FFT for 3GPP Long Term Evolution, 1st ed., Altera Corp., San Jose, CA, Oct. 2007.
[58]	S.-Y. Peng, K.-T. Shr, C.-M. Chen and Y.-H. Huang, “Energy-Efficient 128~2048/1536-Point FFT Processor with Resource Block Mapping for 3GPP-LTE System,” in Proc. IEEE Int. Conf. Green Circuits Syst. (ICGCS’10), June 2010, pp. 14-17.