| Graduate student: | 林世祥 Lin, Shih-Hsiang |
|---|---|
| Thesis title: | 低功耗排序單元與中值濾波器之VLSI設計 The Low-power VLSI Design for Sorting Unit and Median Filter |
| Advisor: | 陳培殷 Chen, Pei-Yin |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2018 |
| Graduation academic year: | 106 |
| Language: | English |
| Number of pages: | 92 |
| Chinese keywords (translated): | logic optimization, low power, high performance, median filter, sorting circuit, hardware architecture |
| English keywords: | Logic optimization, low-power, high-efficiency, median filter, sorting network, VLSI architecture |
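This thesis proposes the design and implementation of hardware architectures for sorting algorithms and median filters, comprising a low-power sorting circuit, a low-power median filter, and a high-performance modular median filter architecture.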
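First, based on an analysis of the circuit's dynamic power, we design a sorting circuit that minimizes the number of data movements, saving power by reducing the state transitions of signals in the circuit. The expected reduction in state transitions is calculated using probability and expected values. Comparison with the experimental results shows that the number of data movements during actual operation agrees with the analysis of the mathematical model. The experimental data also show that, compared with existing architectures at the same operating frequency, the proposed sorting circuit reduces dynamic power by 30.45% on average and by up to 64.9% when the data width is very large.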
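In the low-power median filter design, this thesis further considers the individual factors that contribute to dynamic power. We design a multi-clock queue structure that not only minimizes the number of data movements but also turns off registers that are not in operation. The numbers of signal and state transitions in the queue structure are likewise obtained by statistical analysis. Experimental results show that this architecture does not affect the maximum operating frequency or the data throughput, and that it reduces dynamic power by 25.1% on average.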
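The high-performance modular median filter architecture converts word-level operations into bit-level operations. By thoroughly observing and analyzing the behavior of the circuit, we simplify all of its expressions as far as possible in the hardware implementation. The modular median filter can also be assembled in different combinations chosen by the user, which makes it suitable for either low-cost or high-performance scenarios. In the cost-optimized configuration, its circuit area is smaller than that of existing architectures and its maximum operating frequency is higher.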
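All circuits proposed in this thesis were designed in the Verilog hardware description language. Synthesis was performed with Synopsys Design Compiler and the TSMC 90-nm standard cell library, placement and routing with Synopsys IC Compiler, and power consumption was measured with Synopsys PrimeTime PX from post-layout simulation results. According to the synthesis results and power measurements, the first two proposed architectures are highly competitive in low power consumption, while the third meets users' needs for either high performance or low cost in different scenarios.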
This thesis presents the hardware design and implementation of sorting and median filter architectures. It consists of three main parts: the design of a low-power sorting unit, the design of a low-power median filter, and the modular design of a bit-level median filter.
The dynamic power dissipation of VLSI circuits is analyzed in designing the low-power sorting unit. Data migration between registers, and hence signal transitions, is minimized in order to reduce the total power consumption. The comparing modules move the indexes of samples instead of moving the input data directly. Statistical analysis is conducted to predict the reduction in switching activity, and simulation results confirm the reliability and accuracy of the prediction. Experimental results show that the proposed method dissipates less power than previous methods at the same clock rate: power consumption is reduced by up to 64.9%, and high throughput is also achieved.
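The idea of moving indexes instead of data can be illustrated with a small software model. The following Python sketch is not the thesis's circuit; it assumes a plain insertion sorter and uses the number of register bits written as a rough proxy for switching activity. The function names, the insertion-sort model, and the cost formula are illustrative assumptions.

```python
import math


def insertion_sort_moves(samples):
    """Insertion-sort `samples`; return (sorted values, number of entry moves).

    Only indexes into `samples` are reordered; the samples never move.
    """
    order = []   # indexes into `samples`, kept sorted by sample value
    moves = 0
    for idx, value in enumerate(samples):
        pos = len(order)
        # Every stored entry larger than the new value shifts by one position.
        while pos > 0 and samples[order[pos - 1]] > value:
            pos -= 1
        moves += len(order) - pos + 1   # shifted entries plus the new entry
        order.insert(pos, idx)
    return [samples[i] for i in order], moves


def expected_moves(n):
    """Expected moves for n distinct samples arriving in uniformly random order.

    A random permutation has n*(n-1)/4 inversions on average, and each sample
    is also written once when it is inserted.
    """
    return n * (n - 1) / 4 + n


def register_bits_written(n, data_width, moves):
    """Return (data-moving cost, index-moving cost) in register bits written."""
    index_width = max(1, math.ceil(math.log2(n)))
    data_moving = moves * data_width
    # Index-moving: each sample is stored once; only narrow indexes shift.
    index_moving = n * data_width + moves * index_width
    return data_moving, index_moving


if __name__ == "__main__":
    values, moves = insertion_sort_moves([3, 1, 2])
    print(values, moves, register_bits_written(3, 8, moves))
```

In this model the advantage of index moving grows with the data width, because the index width depends only on the number of samples. That is consistent in spirit with the larger savings reported for wide data, but the model makes no claim about the actual power figures.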
For the median filter, a novel FIFO structure and a mathematical model for controlling the clock signals of the circuit are presented, both derived by analyzing the behavior of the filter. The design keeps data stationary in the registers and reduces not only signal transitions but also switching activity, thereby reducing the total dynamic power consumption. Furthermore, the proposed architecture provides high-speed computation. Experimental results show that the proposed method is more energy efficient than existing designs: power consumption is reduced by 25.1% on average.
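As a rough software analogy for keeping data stationary, the sketch below models a 1-D median filter in which each new sample overwrites only the slot holding the oldest sample, so one data register is written per input instead of the whole window shifting. The class name, the zero-initialized window, and the rank computation are assumptions made for illustration; the thesis's actual FIFO and clock-control logic are not described in this abstract.

```python
class StationaryMedianFilter:
    """1-D median filter whose samples stay in fixed slots (illustrative model)."""

    def __init__(self, window):
        if window % 2 == 0:
            raise ValueError("window must be odd")
        self.window = window
        self.slots = [0] * window   # data registers, assumed to start at zero
        self.write_ptr = 0          # slot that holds the oldest sample
        self.data_writes = 0        # data-register writes (activity proxy)

    def push(self, sample):
        """Store one sample and return the median of the current window."""
        # Only the slot of the oldest sample is written; all others hold their
        # value, which is what would allow their clocks to be gated off.
        self.slots[self.write_ptr] = sample
        self.data_writes += 1
        self.write_ptr = (self.write_ptr + 1) % self.window
        # Rank of a slot = number of slots that precede it in sorted order.
        # Ties are broken by slot position, so the ranks form a permutation.
        for i, v in enumerate(self.slots):
            rank = sum(1 for j, u in enumerate(self.slots) if (u, j) < (v, i))
            if rank == self.window // 2:
                return v


if __name__ == "__main__":
    filt = StationaryMedianFilter(3)
    print([filt.push(x) for x in [5, 1, 9, 3, 7]], filt.data_writes)
```

A shift-register FIFO of the same length would write every slot on every input, that is, `window` data writes per sample compared with one in this model.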
The bit-level median filter is constructed hierarchically from a modular architecture. Different types of submodules can be combined to form a customized architecture that meets different constraints and requirements. Hardware-oriented optimization is performed to obtain the optimal configuration as the input size and data length change. Resource consumption is reduced by 23.29% compared with the state-of-the-art design. The experimental results show that the proposed cascaded architecture is superior to existing designs in terms of maximum operating speed and resource cost.
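To show what replacing word-level comparison with bit-level operations can look like, the sketch below uses the well-known MSB-first majority-voting formulation of the median. It is offered only as an example of a bit-level median computation; whether the thesis uses this particular scheme is an assumption, and the function and variable names are hypothetical.

```python
def bit_serial_median(samples, width):
    """Median of an odd number of unsigned `width`-bit integers, bit by bit."""
    n = len(samples)
    if n % 2 == 0:
        raise ValueError("number of samples must be odd")
    # forced[i] is None while sample i can still be the median. Once its bit
    # differs from the majority it is known to lie above (1) or below (0) the
    # median, and that bit is used for all of its remaining positions.
    forced = [None] * n
    median = 0
    for bit in range(width - 1, -1, -1):
        bits = [
            forced[i] if forced[i] is not None else (s >> bit) & 1
            for i, s in enumerate(samples)
        ]
        # The median bit is the majority bit of this column.
        majority = 1 if sum(bits) > n // 2 else 0
        median = (median << 1) | majority
        for i in range(n):
            if forced[i] is None and bits[i] != majority:
                forced[i] = bits[i]
    return median


if __name__ == "__main__":
    print(bit_serial_median([5, 1, 9, 3, 7], width=4))
```

Each bit position needs only a population count and a threshold test, which is the kind of logic that maps onto small, repeatable submodules.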
The VLSI architectures of the proposed designs were implemented in Verilog HDL and synthesized with Synopsys Design Compiler using the TSMC 90-nm cell library. Synopsys IC Compiler was used for automatic placement and routing (APR). Switching activity interchange format (SAIF) files from post-layout simulation were used to produce reliable measurements of power dissipation.