
Author: Chien, Jung-Yin (簡榮胤)
Thesis title: A Development Environment of Dataflow Programming Model with Application to IBM Cell Broadband Engine
Advisor: Su, Alvin (蘇文鈺)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: Chinese
Number of pages: 57
Keywords: Dataflow, Development Environment
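  • Multicore processors offer abundant computing power, but they also make parallel programs more complex to write. One measure of parallel performance today is whether a program's execution time drops efficiently as the number of cores grows. In the traditional approach to parallel program design, a platform is chosen first, and considerable effort and time then go into debugging the program and tuning it for the best performance. When the platform changes, the whole design flow may have to be repeated, which wastes a great deal of time. A flexible design methodology is therefore necessary for programming multicore processors.
    In this thesis, we present a dataflow design methodology and implement it on the multicore Cell Broadband Engine platform. The dataflow model provides a high level of abstraction over the underlying hardware. The computation and communication parts of an application are represented as modules and channels, respectively. To demonstrate the proposed dataflow model, we use an MPEG-4 SP decoder as the example: we discuss how the MPEG-4 decoder can be parallelized and implement it in dataflow form. To balance the workload across cores, we profile the communication volume and computation load of the application's modules and implement a communication and synchronization mechanism between cores. Based on the profiling data, we propose an allocation and scheduling algorithm that distributes tasks across the cores as evenly as possible. We also propose an efficient synchronization method for the Cell platform and discuss the efficiency and speed that the dataflow approach delivers. The results show that efficiency improves as the number of cores increases.
    In addition, the underlying hardware platform or the software can be swapped out, either for any other multicore platform or for an application described in a different language, so that the software and compilation tools do not have to be changed after a hardware change. For example, the proposed model can be converted to SystemC to support system-level design.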
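    The separation of computation into modules and communication into channels can be illustrated with a minimal sketch. This is not the thesis's runtime library: the names `Channel`, `Module`, and `run_pipeline` are hypothetical, Python threads stand in for Cell cores, and a bounded queue stands in for whatever buffering the real channels use.

```python
import queue
import threading

_END = object()  # hypothetical end-of-stream marker passed down the pipeline


class Channel:
    """Bounded FIFO that carries tokens from one module to the next."""

    def __init__(self, capacity=2):
        self._q = queue.Queue(maxsize=capacity)

    def put(self, token):
        self._q.put(token)  # blocks while the channel is full

    def get(self):
        return self._q.get()  # blocks while the channel is empty


class Module(threading.Thread):
    """Computation node: applies fn to each input token and forwards the result."""

    def __init__(self, fn, inp, out):
        super().__init__()
        self.fn, self.inp, self.out = fn, inp, out

    def run(self):
        while True:
            token = self.inp.get()
            if token is _END:
                self.out.put(_END)  # propagate shutdown downstream
                return
            self.out.put(self.fn(token))


def run_pipeline(stages, tokens):
    """Connect the stage functions in a chain and collect the output tokens."""
    channels = [Channel() for _ in range(len(stages) + 1)]
    modules = [Module(fn, channels[i], channels[i + 1])
               for i, fn in enumerate(stages)]

    def feed():
        for t in tokens:
            channels[0].put(t)
        channels[0].put(_END)

    # A separate feeder thread keeps the bounded channels from deadlocking
    # while the main thread drains the last channel.
    workers = modules + [threading.Thread(target=feed)]
    for w in workers:
        w.start()
    results = []
    while True:
        token = channels[-1].get()
        if token is _END:
            break
        results.append(token)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([lambda x: x + 1, lambda x: x * 2], range(5)))
```

    Because each module only sees its input and output channels, the same module code could in principle be placed on a different core or platform by changing only the channel implementation.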

    Multicore processors provide large computation capability but also involve complicated parallel programming. One of the major considerations in parallel programming is performance. Traditional design methodologies, which start a design on a selected platform, usually spend a lot of effort and time on performance tuning and debugging. When the platform is changed, the entire design flow may have to be repeated, which is very time-consuming. Hence, a flexible design methodology is necessary.
    In this thesis, we present a dataflow design methodology and use it to program the Cell processor. The dataflow model provides a high-level abstraction of the underlying hardware. The computation and communication of the target application are separated and represented as modules and channels, respectively. To demonstrate the proposed programming model, an MPEG-4 SP decoder is used as an example. The parallelism in the MPEG-4 decoder is discussed and exposed with the dataflow model. To map the high-level dataflow model to the Cell processor, a mapping flow comprising offline profiling, task allocation, and runtime libraries is developed. Based on the profiled data, the allocation algorithm distributes tasks across the processor cores as evenly as possible. An efficient synchronization mechanism for the Cell processor is also proposed. We also discuss the impact of the models and the mapping flow on decoding speed. The results show that the proposed methodology achieves a considerable performance boost as the number of cores increases.
    It is possible to synthesize the model targeting either dedicated hardware or software on a multiprocessor once the original tool chain of the new platform is modified. For example, the proposed model can be translated into a SystemC model to facilitate a system-level design methodology.
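    The idea of allocating tasks from profiled data can be sketched as follows. The abstract does not specify the thesis's algorithm, so this sketch swaps in a generic greedy heuristic (largest cost first, onto the least-loaded core) and ignores communication cost between modules. The function `allocate`, the module names, and the cost values are all hypothetical.

```python
import heapq


def allocate(costs, num_cores):
    """Greedy balancing: assign each module, largest profiled cost first,
    to the core with the smallest accumulated load.

    costs: dict mapping module name -> profiled compute cost (any unit).
    Returns (assignment, loads): per-core module lists and per-core totals.
    """
    heap = [(0, core) for core in range(num_cores)]  # (load, core index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]
    # Sort by descending cost; break ties by name so the result is deterministic.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))
    loads = [0] * num_cores
    for load, core in heap:
        loads[core] = load
    return assignment, loads


if __name__ == "__main__":
    # Made-up profile numbers for illustration only; not taken from the thesis.
    profile = {"parser": 4, "idct": 7, "motion_comp": 6, "texture": 3}
    assignment, loads = allocate(profile, num_cores=2)
    print(assignment, loads)
```

    A profile-driven allocator for a real platform would also need to weigh the traffic on each channel, since placing two heavily communicating modules on different cores adds transfer and synchronization cost.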

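    1 Introduction
    2 Background
    2.1 Introduction to the Cell B.E. Hardware
    2.2 Cell Development Environment
    2.3 MPEG4 SP Decoder
    3 Design Methodology
    3.1 Dataflow Modeling
    3.2 MPEG4 Decoder Modeling using Dataflow
    3.3 Profiling
    3.4 Double Buffering
    3.5 Task Scheduling Algorithm
    4 System Implementation
    4.1 Development Environment
    4.2 System Components
    4.2.1 Eclipse Graphical Integrated Development Environment
    4.2.2 Simple Cell Shell
    4.3 Communication and Synchronization Mechanisms
    4.3.1 Synchronization Using Mailboxes and Signal Channels
    4.3.2 Synchronization Using Local Store Memory
    4.4 Code Generator
    5 Experimental Results
    6 Conclusion and Future Work
    7 References
    Appendix A: Operating the Integrated Development Environment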


    Full text on campus: public from 2011-07-29
    Full text off campus: public from 2011-07-29