
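Graduate student: Guo, Jun-Yu (郭君宇)
Thesis title: A FPGA Based Scalable Heterogeneous Many-Core HW Platform with Extension to OpenCL Framework Development (以FPGA為基礎之可擴充異質多核心硬體平台與擴展到OpenCL框架下之開發)
Advisor: Su, Wen-Yu (蘇文鈺)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar; 2012-2013)
Language: English
Number of pages: 38
Keywords (translated from Chinese): Heterogeneous Many-Core, Many-Core Platform
Keywords (English): FPGA, OpenRISC, OpenCL, Many-Core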
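  • Many-core systems are a very popular topic today and are widely used in embedded systems; smartphones and tablets, for example, usually carry two or more CPU cores. How to use a system's computing power effectively, and how to develop software for it, are therefore the issues that follow. OpenCL defines a standard framework for parallel programming aimed at cross-platform heterogeneous many-core systems, which may combine multiple CPUs, GPUs, DSPs, or other processors.
    We designed a many-core platform based on FPGAs and OpenRISC, which we call Multi-Jerry. On this platform a personal computer (PC) or an ARM core serves as the host CPU, and a set of Multi-Jerry cores is ported as co-processors. Multi-Jerry is written in the Verilog hardware description language; its compute cores use the open-source 32-bit OpenRISC IP, extended with control circuitry of our own design, and we also provide the necessary supporting system software. Through the application programming interfaces (APIs) we provide, users can access the memory space of each individual OpenRISC core, load programs onto it for execution, and collect the results. The system operates much like a CUDA program that uses GPUs for computation; the difference is that CUDA GPUs can only process different sets of data according to the same piece of code at any one time, whereas our system has no such restriction.
    The system has been run successfully on different FPGA boards, including Xilinx Virtex-5 and Spartan-6. Over a USB interface, the PC can load program code into the memory space of an OpenRISC core and read and write that memory. We also ported Multi-Jerry to the SCREAM Multi-ARM Platform as a hardware acceleration unit, and designed a source-to-source parser that analyzes source code and automatically generates the corresponding API calls. Users who want to develop parallel programs can therefore either write code following our API conventions or let the parser generate it automatically. Finally, we integrated the platform into the OpenCL framework: we can now compile OpenCL code and load OpenCL kernel functions onto the platform for execution, which amounts to a preliminary working interface between host and device. Conforming to the complete OpenCL framework still requires further effort, and that is our future work.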

    Many-core systems are now very popular and appear in many embedded devices, such as smartphones and tablet PCs with two or more CPUs. As a result, making full use of the available computing power and providing suitable software development tools have become pressing issues. OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of different processing units, such as CPUs, GPUs, DSPs, and other processors.
    In this work we develop a many-core hardware platform on FPGA using OpenRISC, which we call Multi-Jerry. Either a PC or an ARM processor serves as the host processing unit, and Multi-Jerry serves as the co-processing system. Multi-Jerry is written in Verilog, a hardware description language, around the open-source 32-bit OpenRISC core IP, with proprietary control hardware and the associated system software. Users can directly access each OpenRISC core's memory space and download a program to each core through the provided application programming interfaces (APIs). The platform works much like CUDA using GPUs for computing tasks. The difference is that a GPU can execute only one program on different sets of data at a time, whereas our platform can execute different programs at the same time.
    The platform has been successfully tested on Xilinx Virtex-5 and Spartan-6 FPGA boards. Through the USB interface, we can download programs and access each core's memory from the PC side using the APIs. Moreover, we ported Multi-Jerry to the SCREAM Multi-ARM Platform as a hardware accelerator and designed a source-to-source parser that parses the source program and automatically generates the corresponding API code. Users can therefore either write parallel code with our APIs themselves or use the parser to produce parallel programs that run on the platform. Finally, we integrated the platform into the OpenCL framework: we can compile OpenCL code and execute OpenCL kernel functions on the platform. That is, we have provided a preliminary host-device working model for OpenCL operations. Full compatibility with the OpenCL framework remains our major future work.
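The host and co-processor workflow described in the abstract (write each core's memory, load a program per core, run, read the results back) can be illustrated with a small software model. This is only a sketch under stated assumptions: the record does not list the actual Multi-Jerry API, so `SimPlatform`, `SimCore`, `write_mem`, `read_mem`, `load_program`, `run_all`, and the addresses used are all hypothetical names, and the cores are simulated as plain Python callables instead of OpenRISC hardware reached over USB.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

WORD_MASK = 0xFFFFFFFF  # the abstract describes 32-bit OpenRISC cores


@dataclass
class SimCore:
    """Software stand-in for one co-processor core with private memory."""
    mem: Dict[int, int] = field(default_factory=dict)
    program: Optional[Callable[["SimCore"], None]] = None

    def read(self, addr: int) -> int:
        # Unwritten memory reads as zero in this model.
        return self.mem.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        # Truncate to one 32-bit word.
        self.mem[addr] = value & WORD_MASK


class SimPlatform:
    """Hypothetical host-side handle; not the thesis's actual API."""

    def __init__(self, num_cores: int) -> None:
        self.cores = [SimCore() for _ in range(num_cores)]

    def write_mem(self, core_id: int, addr: int, words: List[int]) -> None:
        for i, word in enumerate(words):
            self.cores[core_id].write(addr + 4 * i, word)

    def read_mem(self, core_id: int, addr: int, count: int) -> List[int]:
        return [self.cores[core_id].read(addr + 4 * i) for i in range(count)]

    def load_program(self, core_id: int,
                     program: Callable[[SimCore], None]) -> None:
        self.cores[core_id].program = program

    def run_all(self) -> None:
        # Real hardware would run the cores concurrently; this model
        # runs them one after another, which gives the same results
        # because the cores share no memory here.
        for core in self.cores:
            if core.program is not None:
                core.program(core)


# Assumed memory layout for the demo (illustrative addresses only).
IN_ADDR, OUT_ADDR, N = 0x1000, 0x2000, 4


def sum_program(core: SimCore) -> None:
    """Program for one core: sum N input words."""
    total = sum(core.read(IN_ADDR + 4 * i) for i in range(N))
    core.write(OUT_ADDR, total)


def max_program(core: SimCore) -> None:
    """A different program for another core: maximum of N input words."""
    core.write(OUT_ADDR, max(core.read(IN_ADDR + 4 * i) for i in range(N)))


def demo() -> List[int]:
    platform = SimPlatform(num_cores=2)
    platform.write_mem(0, IN_ADDR, [1, 2, 3, 4])
    platform.write_mem(1, IN_ADDR, [7, 9, 2, 5])
    # Each core gets its own program, unlike a single shared kernel.
    platform.load_program(0, sum_program)
    platform.load_program(1, max_program)
    platform.run_all()
    return [platform.read_mem(c, OUT_ADDR, 1)[0] for c in range(2)]


if __name__ == "__main__":
    print(demo())
```

Loading `sum_program` on one core and `max_program` on the other mirrors the contrast the abstract draws with CUDA: each core runs its own program on its own data, and the host only moves data and code in and out.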

    CHAPTER 1 INTRODUCTION
        1.1 THE ORGANIZATION OF THIS THESIS
        1.2 MOTIVATION
        1.3 THIS WORK
    CHAPTER 2 BACKGROUND
        2.1 OPENRISC
        2.2 SCREAM MULTI-ARM PLATFORM
        2.3 CETUS
        2.4 OPENCL
    CHAPTER 3 HOST PLUS SINGLE OPENRISC OPERATION
        3.1 INTRODUCTION
        3.2 HARDWARE ARCHITECTURE
        3.3 COMMUNICATION BETWEEN HOST PC AND FPGA
        3.4 ARRANGING THE OBJECTS IN A PROGRAM'S ADDRESS SPACE
        3.5 EXAMPLE
        3.6 TIMING PERFORMANCE
    CHAPTER 4 MULTI-JERRY
        4.1 INTRODUCTION
        4.2 HARDWARE ARCHITECTURE
        4.3 MULTI-JERRY INTERFACE
        4.4 MULTI-JERRY API
        4.5 SOURCE TO SOURCE PARSER
    CHAPTER 5 PORTING MULTI-JERRY ON SCREAM MULTI-ARM PLATFORM
        5.1 INTRODUCTION
        5.2 SYSTEM ARCHITECTURE
        5.3 COMMUNICATION INTERFACE BETWEEN ARM AND FPGA
    CHAPTER 6 OUR OPENCL RUNTIME
        6.1 INTRODUCTION
        6.2 OPENCL APPLICATION OPERATION
        6.3 SUPPORTED RUNTIME APIS
    CHAPTER 7 CONCLUSIONS AND FUTURE WORKS
    REFERENCE


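    On campus: public from 2023-12-31
    Off campus: not public
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.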