| Author: | 尤敬泓 Yu, Jing-Hong |
|---|---|
| Thesis title: | 應用於 GPU 內部之浮點數算數運算單元設計 Design of Floating-Point Arithmetic Process Unit for GPU |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2015 |
| Academic year of graduation: | 104 |
| Language: | Chinese |
| Number of pages: | 82 |
| Keywords (Chinese): | IEEE 754, 浮點數運算單元, 泰勒級數, 牛頓逼近法 |
| Keywords (English): | IEEE 754, Floating-point unit, Taylor series, Newton-Raphson method |
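Floating-point arithmetic is more complex than integer arithmetic and harder to implement in hardware; in particular, the choice of algorithm affects accuracy, area, speed, and power consumption. This thesis proposes a high-accuracy, small-area, fast floating-point arithmetic unit based on the IEEE 754 standard. We study the Newton-Raphson method and the Taylor series expansion method for computing floating-point arithmetic functions, and compare the accuracy and computation time achieved for different numbers of Newton-Raphson iterations and different orders of the Taylor series expansion. In addition, the algorithms use look-up tables (LUTs), and a Lloyd-Max quantizer reduces the table size while preserving the best accuracy. The floating-point arithmetic unit is written in Verilog HDL, synthesized in the TSMC 180 nm process, and verified by simulation.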
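As an illustration of the comparison described above, the following minimal Python sketch approximates the reciprocal 1/x for a mantissa x in [1, 2) with both methods, starting from the same small look-up table. The function names, the 4-bit table index, and the choice of the reciprocal as the target function are assumptions made for this example. The abstract does not state the thesis's actual functions, table sizes, or bit widths, and the sketch uses double-precision software arithmetic rather than hardware.

```python
def lut_seed(x, index_bits=4):
    """Return (x0, 1/x0) for the table segment containing x, with x in [1, 2).

    Hypothetical table layout: the top `index_bits` bits of the mantissa
    fraction select a segment, and the table stores 1/x0 at its midpoint x0.
    """
    segments = 1 << index_bits
    idx = int((x - 1.0) * segments)        # segment index from the leading fraction bits
    x0 = 1.0 + (idx + 0.5) / segments      # midpoint of that segment
    return x0, 1.0 / x0


def reciprocal_newton(x, iterations, index_bits=4):
    """Newton-Raphson refinement of 1/x: y <- y * (2 - x * y)."""
    _, y = lut_seed(x, index_bits)
    for _ in range(iterations):
        y = y * (2.0 - x * y)
    return y


def reciprocal_taylor(x, order, index_bits=4):
    """Taylor expansion of 1/x around x0: (1/x0) * sum_k (-(x - x0) / x0)**k."""
    x0, inv_x0 = lut_seed(x, index_bits)
    t = -(x - x0) * inv_x0
    total, term = 1.0, 1.0
    for _ in range(order):
        term *= t
        total += term
    return inv_x0 * total


def max_rel_error(fn, param, samples=1000):
    """Worst relative error |x * fn(x) - 1| over evenly spaced x in [1, 2)."""
    worst = 0.0
    for i in range(samples):
        x = 1.0 + i / samples
        worst = max(worst, abs(fn(x, param) * x - 1.0))
    return worst
```

Calling `max_rel_error(reciprocal_newton, n)` and `max_rel_error(reciprocal_taylor, n)` for a few values of `n` shows the trade-off. With a seed of 1/x0, each Newton-Raphson iteration squares the relative error, while each extra Taylor term multiplies it by |x - x0| / x0, so one iteration matches a first-order expansion and two iterations match a third-order expansion.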
This thesis proposes a high-speed, small-area design of an IEEE 754 Floating-Point Process Unit (FPU) for GPUs. We analyze the order of the Taylor series expansion and the Lookup Table (LUT) to reduce execution time. In addition, we apply a Lloyd-Max quantizer to the lookup tables, which keeps the table size to only 56.902 Kbytes while maintaining high accuracy in our algorithms. The hardware uses a pipelined architecture to improve throughput. It is modeled in Verilog HDL and, after verification, synthesized in 180 nm CMOS technology. Simulation results of the proposed FPU demonstrate that the frequency can reach 102 MHz.
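To make the role of the Lloyd-Max quantizer more concrete, the sketch below runs the generic Lloyd-Max iteration on a set of sample values: decision thresholds are placed halfway between adjacent reconstruction levels, and each level is then moved to the mean of the samples in its cell. The abstract does not say which table values are quantized or how many levels are used, so the function names, the iteration count, and the data passed in are all assumptions for illustration only.

```python
import bisect


def uniform_codebook(samples, levels):
    """Evenly spaced reconstruction levels over the sample range (baseline)."""
    lo, hi = min(samples), max(samples)
    return [lo + (hi - lo) * (i + 0.5) / levels for i in range(levels)]


def lloyd_max(samples, levels, iterations=50):
    """Lloyd-Max scalar quantizer design on empirical samples."""
    data = sorted(samples)
    codebook = uniform_codebook(data, levels)   # starting point (an assumption)
    for _ in range(iterations):
        # Decision thresholds sit halfway between adjacent levels.
        thresholds = [(a + b) / 2 for a, b in zip(codebook, codebook[1:])]
        cells = [[] for _ in range(levels)]
        for v in data:
            cells[bisect.bisect_right(thresholds, v)].append(v)
        # Each level moves to the centroid of its cell; empty cells keep their level.
        codebook = [sum(c) / len(c) if c else old
                    for c, old in zip(cells, codebook)]
    return codebook


def quantize(value, codebook):
    """Map a value to its nearest reconstruction level."""
    return min(codebook, key=lambda level: abs(level - value))


def mean_squared_error(samples, codebook):
    """Average squared quantization error over the samples."""
    return sum((v - quantize(v, codebook)) ** 2 for v in samples) / len(samples)
```

For example, the reciprocal values `[1.0 / (1.0 + i / 1024) for i in range(1024)]` are not evenly distributed, so `lloyd_max(samples, 16)` places more levels where the values cluster and gives a lower `mean_squared_error` than `uniform_codebook(samples, 16)` with the same number of levels. Storing the index of a level instead of the full value is one way such a codebook could shrink a table.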
On campus: publicly available from 2017-11-25