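| Graduate student: | 游世杰 Yu, Shih-Chieh |
|---|---|
| Thesis title: | 俱雙重精準度函數近似之特殊函數單元設計 Design of Special Function Unit with Dual-Precision Function Approximation |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 |
| Language: | Chinese |
| Number of pages: | 91 |
| Keywords (Chinese, translated): | Piecewise polynomial approximation, Taylor series, look-up table, special function unit, dual-precision computation, low power |
| Keywords (English): | IEEE 754, Special function unit, Taylor series, Piecewise polynomial approximation, dual-precision |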
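In special function unit design, the target function is approximated by a polynomial whose coefficients are precomputed and stored in look-up tables to speed up the hardware computation; the look-up tables, however, account for most of the overall area. This thesis proposes a piecewise polynomial approximation architecture that uses non-uniform segmentation to reduce the area required by the look-up tables. We examine the approximation quality and algorithmic characteristics of Taylor series expansion and minimax approximation. Because the Taylor series expansion is the easier of the two to analyze, we design the segmentation algorithm by computing the remainder of the Taylor polynomial. In hardware, a precision-select line reduces the polynomial order evaluated and outputs a low-precision approximation, giving a variable-precision, low-power architecture. The hardware architecture is written in Verilog HDL, synthesized with Design Compiler in a TSMC 40 nm process, and verified by simulation. The look-up table memory units are modeled with Memory Compiler, and PrimeTime is used to analyze the power consumption of the dual-precision approximation unit in its different modes.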
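The abstract does not give the segmentation algorithm itself, so the following is only a minimal sketch of the general idea of remainder-driven non-uniform segmentation. It assumes a second-order Taylor expansion about each segment's midpoint, a recursive halving strategy, the example function 1/x on [1, 2), and an error target of 2^-24; all of these choices, and the names `remainder_bound`, `segment`, and `recip_d3_max`, are hypothetical illustrations rather than the thesis's method.

```python
def remainder_bound(d3_max, width):
    """Lagrange bound on the 2nd-order Taylor remainder for a segment
    of the given width, expanded about the segment midpoint."""
    return d3_max * (width / 2) ** 3 / 6


def segment(a, b, d3_abs_max, tol):
    """Recursively halve [a, b) until the remainder bound is <= tol.

    d3_abs_max(a, b) must return an upper bound on |f'''| over [a, b).
    Halving keeps every boundary on a power-of-two grid (an assumption
    made here to keep table addressing simple).
    """
    if remainder_bound(d3_abs_max(a, b), b - a) <= tol:
        return [(a, b)]
    mid = (a + b) / 2
    return segment(a, mid, d3_abs_max, tol) + segment(mid, b, d3_abs_max, tol)


def recip_d3_max(a, b):
    """For f(x) = 1/x, |f'''(x)| = 6 / x**4 decreases with x,
    so its maximum over [a, b) is attained at a."""
    return 6.0 / a ** 4


TOL = 2.0 ** -24  # assumed error target, roughly single-precision
nonuniform = segment(1.0, 2.0, recip_d3_max, TOL)

# A uniform scheme must use the narrowest width everywhere.
narrowest = min(b - a for a, b in nonuniform)
uniform_count = round((2.0 - 1.0) / narrowest)

print("non-uniform segments:", len(nonuniform))
print("uniform segments:    ", uniform_count)
```

Because the third derivative of 1/x shrinks as x grows, the segments near 2 can be wider than those near 1, so the non-uniform table needs fewer entries than a uniform table built from the narrowest width.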
This thesis presents a non-uniform segmentation method for a special function unit based on Taylor series approximation. Compared with uniform segmentation, our method reduces the size of the look-up tables by decreasing the number of segments. By reducing the polynomial order that is evaluated and partitioning the multipliers, our architecture can output a low-precision result with lower power consumption.
The proposed design is synthesized with a TSMC 40-nm cell library, and the implementation results based on 2nd-order polynomial approximation show that our method reduces the total area by 15% in single-precision mode.
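As a rough software illustration of the dual-precision idea, the sketch below evaluates the same stored coefficients either as a second-order polynomial in Horner form or, when a precision-select flag is set, as a first-order polynomial that skips the quadratic term. It is only a behavioral model under stated assumptions: the function 1/x, the expansion point, and the names `taylor_coeffs_recip` and `evaluate` are hypothetical, and the multiplier partitioning and power behavior described in the abstract are not modeled.

```python
def taylor_coeffs_recip(x0):
    """Coefficients (c0, c1, c2) of the 2nd-order Taylor polynomial
    of f(x) = 1/x about x0: f(x0), f'(x0), f''(x0)/2."""
    return (1.0 / x0, -1.0 / x0 ** 2, 1.0 / x0 ** 3)


def evaluate(coeffs, dx, low_precision):
    """Evaluate the polynomial at offset dx = x - x0.

    low_precision=True models the precision-select line: the
    quadratic term is skipped, saving one multiply-add.
    """
    c0, c1, c2 = coeffs
    if low_precision:
        return c0 + c1 * dx           # first-order result
    return c0 + dx * (c1 + c2 * dx)   # Horner form, second-order result


x0 = 1.5                # assumed expansion point
dx = 2.0 ** -8          # assumed offset inside the segment
exact = 1.0 / (x0 + dx)
coeffs = taylor_coeffs_recip(x0)

err_high = abs(evaluate(coeffs, dx, low_precision=False) - exact)
err_low = abs(evaluate(coeffs, dx, low_precision=True) - exact)
print("2nd-order error:", err_high)
print("1st-order error:", err_low)
```

The first-order path trades accuracy for less arithmetic, which is the software analogue of the lower-power, low-precision mode.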