| Graduate student: | 林威宏 Lin, Wei-Hung |
|---|---|
| Thesis title: | 適用於RISC32之浮點數協同處理器 A Vector Floating-point Coprocessor for RISC32 |
| Advisor: | 陳中和 Chen, Chung-Ho |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering (on-the-job master's program) |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 45 |
| Keywords (Chinese): | 微處理器、管線、協同處理器 |
| Keywords (English): | Microprocessor, Pipeline, Coprocessor |
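This thesis builds a high-speed floating-point coprocessor at the register-transfer level (RTL), based on the ARM VFPv2 instruction set architecture. It is implemented as a four-stage pipeline with forwarding to handle hazards.
We connect the NCKU-RISC32 CPU, previously developed in our laboratory, to the floating-point coprocessor through the Coprocessor Interface. This gives the NCKU-RISC32 CPU hardware support for floating-point operations and thereby improves its floating-point performance.
The overall system architecture is planned around the three major instruction classes of the VFPv2 instruction set (data processing, load and store, and register transfer instructions). In a pipelined design, the handling of data hazards determines the correctness of operation and also has a large impact on performance, so we designed a forwarding unit circuit to resolve data hazards. We use the NC-Verilog Simulator to simulate the computation results and compare performance, and ARM RealView Debugger 3.1 to verify the results. Finally, the design is programmed onto an FPGA development board for functional testing. According to the experimental results, a CPU with the floating-point coprocessor executes 5.5 times faster or more on average.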
This thesis presents a high-speed Vector Floating-point (VFP) coprocessor designed at the register-transfer level (RTL) and based on the ARM VFPv2 instruction set. The coprocessor is implemented as a 4-stage pipeline with a forwarding unit.
We integrated the VFP coprocessor with the NCKU-RISC32 processor through the Coprocessor Interface, giving the NCKU-RISC32 processor hardware support for floating-point operations. The VFPv2 ISA has three types of instructions (data processing, load/store, and register transfer). In addition, the handling of data hazards in the pipeline affects not only the correctness of the results but also performance. We resolve data hazards with a forwarding unit.
We simulate the design and compare performance using the NC-Verilog Simulator, and verify the results using ARM RealView Debugger 3.1. Finally, we verify the processor's function on a Xilinx FPGA development board. According to the experimental results, the VFP coprocessor speeds up execution by about 5.5 times on average.
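As a rough illustration of the forwarding idea mentioned in the abstract, the Python sketch below models operand selection behaviorally. It is not the thesis's RTL: the names `InFlight` and `forward_operand`, the choice of two forwarding sources, and the priority order are assumptions made for illustration, since the abstract does not describe the pipeline stages or the actual forwarding paths.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InFlight:
    """Result of an earlier instruction that has not yet reached the
    register file (hypothetical model, not taken from the thesis)."""
    dest: Optional[int]  # destination register index, or None if none is written
    value: float         # computed result that can be forwarded


def forward_operand(src: int,
                    regfile: List[float],
                    newer: Optional[InFlight],
                    older: Optional[InFlight]) -> float:
    """Return the value to use for source register `src`.

    Assumed priority: the most recently issued producer wins, then the
    older producer, and only then the (possibly stale) register file.
    """
    if newer is not None and newer.dest == src:
        return newer.value
    if older is not None and older.dest == src:
        return older.value
    return regfile[src]


if __name__ == "__main__":
    # Illustrative sequence with a read-after-write dependency on s2:
    #   FADDS s2, s0, s1
    #   FMULS s3, s2, s2
    regs = [1.5, 2.0, 0.0, 0.0]
    add_result = InFlight(dest=2, value=regs[0] + regs[1])
    a = forward_operand(2, regs, add_result, None)
    b = forward_operand(2, regs, add_result, None)
    print(a * b)  # uses the forwarded sum rather than the stale regs[2]
```

Without forwarding, the dependent multiply would either read the stale value of `s2` or have to stall until the add result is written back, which is why the abstract ties hazard handling to both correctness and performance.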