| Graduate student: | 林鼎原 Lin, Ding-Yuan |
|---|---|
| Thesis title: | 基於不同平行化層級之多核處理器架構研究與分析 Study and Analysis of Multi-Processor Architecture for Various Levels Parallelism |
| Advisor: | 周哲民 Jou, Jer-Min |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2015 |
| Academic year of graduation: | 103 |
| Language: | Chinese |
| Number of pages: | 104 |
| Keywords (Chinese): | 亂序執行, 暫存器重命名, 平行處理 |
| Keywords (English): | Out-Of-Order Execution, Register Renaming, Parallelism |
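Traditional single processors rely on techniques such as out-of-order execution, superscalar issue, and speculative execution to improve performance, but once the clock frequency reaches a certain level, problems such as energy consumption and heat dissipation arise. Constrained by memory access latency and by the parallelism inherent in a program's instructions (Instruction Level Parallelism), today's processors adopt multithreading (Thread Level Parallelism), which runs multiple threads concurrently and exploits the potential parallelism between threads. With energy consumption and related concerns in mind, the multi-processor system-on-chip has become the mainstream trend for the new generation of designs.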
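This thesis studies and analyzes processor architecture and execution behavior at different levels of parallelism. Once the configuration of the gem5 simulator and its system simulation modes are understood, gem5 can be used to simulate the target platform quickly and completely. gem5 is a cycle-accurate simulator that can model the activity of the pipeline in every processor cycle. MiBench provides the test programs for the target platform. Using the simulator, we first evaluate the performance of three different processor architectures, analyze the likely causes of performance bottlenecks in the overall system, and make corresponding improvements and corrections for the problems identified.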
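We then study and analyze the hardware architecture and execution behavior of the control processor. At run time, the control processor dynamically analyzes and records the dependencies between tasks, extracts the tasks that can execute independently, and dynamically dispatches them to idle underlying processing units (PUs) for parallel execution. Finally, we compare the differences between instruction-level and task-level processor architectures.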
Traditional single-core processors use out-of-order execution, superscalar issue, speculative execution, and other techniques to improve performance. Raising the clock speed is not the answer, however, because of energy consumption and heat dissipation. Performance is also limited by memory access latency and by the inherent parallelism of the program's instructions, known as instruction-level parallelism. Modern processors use multithreading techniques, which allow concurrent processing and exploit the potential parallelism between threads (thread-level parallelism), even on a single-core processor. In consideration of energy efficiency, the trend in processor design is toward single-chip multiprocessors.
In this thesis, we study and analyze multi-processor architectures for various levels of parallelism. Using gem5, a cycle-accurate simulator that models the pipeline stages cycle by cycle, we can configure and simulate the target platform quickly. We show that MiBench applications run much faster on an out-of-order processor than on in-order and non-pipelined processors.
In addition, we study and analyze the hardware architecture and behavior of a task-level control processor. The control processor keeps tracking dependencies between tasks, automatically extracts parallelism among coarse-grain tasks, and schedules them for execution on the underlying processors. Finally, we compare instruction-level and task-level processor architectures.
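The behavior attributed to the control processor above (record dependencies between tasks, pick out the independent ones, hand them to idle processing units) can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the hardware design studied in the thesis: the names `Task`, `depends_on`, and `schedule`, the register-style `reads`/`writes` sets, and the per-task cycle counts are all hypothetical, and dependencies are handled conservatively (no renaming is modeled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical coarse-grain task: named operands and a run time in cycles."""
    name: str
    reads: frozenset
    writes: frozenset
    cycles: int


def depends_on(later, earlier):
    """True if `later` must wait for `earlier` (RAW, WAW, or WAR hazard)."""
    return bool(
        later.reads & earlier.writes      # read-after-write
        or later.writes & earlier.writes  # write-after-write
        or later.writes & earlier.reads   # write-after-read
    )


def schedule(tasks, num_pus):
    """Dispatch `tasks` (given in program order) onto `num_pus` processing units.

    Returns {task name: (start_cycle, pu_index)}.
    """
    if num_pus < 1:
        raise ValueError("need at least one processing unit")
    # Record dependencies: each task waits on every conflicting earlier task.
    waits_on = {
        t.name: {e.name for e in tasks[:i] if depends_on(t, e)}
        for i, t in enumerate(tasks)
    }
    pending = list(tasks)
    running = {}      # pu index -> (task, finish_cycle)
    finished = set()  # names of completed tasks
    placement = {}
    cycle = 0
    while pending or running:
        # Retire tasks whose PU has finished by this cycle.
        for pu, (task, finish) in list(running.items()):
            if finish <= cycle:
                finished.add(task.name)
                del running[pu]
        # Dispatch ready tasks to idle PUs, oldest task first.
        for task in list(pending):
            idle = [pu for pu in range(num_pus) if pu not in running]
            if not idle:
                break
            if waits_on[task.name] <= finished:
                pu = idle[0]
                running[pu] = (task, cycle + task.cycles)
                placement[task.name] = (cycle, pu)
                pending.remove(task)
        cycle += 1
    return placement


# Example: B needs A's result, D needs B's and C's; C is independent.
example = [
    Task("A", frozenset(), frozenset({"r1"}), 2),
    Task("B", frozenset({"r1"}), frozenset({"r2"}), 3),
    Task("C", frozenset(), frozenset({"r3"}), 1),
    Task("D", frozenset({"r2", "r3"}), frozenset({"r4"}), 1),
]
print(schedule(example, num_pus=2))
```

With two PUs, the independent task C starts alongside A instead of waiting behind B, which is the task-level analogue of out-of-order issue. With a single PU the same tasks simply run in program order.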