| Graduate student: | 劉冠成 Liu, Kuan-Cheng |
|---|---|
| Thesis title: | 多核心處理器平台即時系統之快取結構效能分析 The Performance Analysis of Cache Architecture for Real-Time System in Multicore Platform |
| Advisor: | 陳敬 Chen, Jing |
| Degree: | Master |
| Department: | 電機資訊學院 - 電腦與通信工程研究所 Institute of Computer & Communication Engineering |
| Year of publication: | 2009 |
| Academic year of graduation: | 97 |
| Language: | Chinese |
| Number of pages: | 102 |
| Keywords (Chinese): | 多核心處理器, 快取, 叢集, 轉移, 匯流排 |
| Keywords (English): | Multi-Core, Cache, Cluster, Migrate, Bus |
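This thesis studies and analyzes how the structural configuration of cache memory affects system performance when a multi-core processor is used in a real-time system. A multi-core processor contains two or more identical processor cores on a single chip and is a homogeneous multiprocessor architecture. One characteristic of multi-core processors is that the cores on the chip can share the same cache.
Cores on a multi-core processor that share the same L2 cache are called a cluster. When a task is forced to migrate to a core with the same cluster number, its data does not need to be reloaded into the L2 cache; conversely, when a task is forced to migrate to a core with a different cluster number, extra time must be spent reloading its data into the L2 cache of the new cluster. While the system runs, the periodic task set is released for execution repeatedly, so reducing the number of times cache blocks are reloaded is a key focus of cache-related research.
This thesis examines the impact of the cache structure of multi-core processors on system performance, using software simulation to model system operation and analyze the results. The simulation results show that the cache bus and the memory bus are the two factors that most affect system performance. To relieve the performance bottleneck caused by the buses, this thesis proposes and implements a memory bus priority allocation mechanism; the simulation results show that this mechanism effectively improves overall system performance and raises the probability of successful scheduling.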
This thesis studies and analyzes the performance impact of different cache memory configurations in real-time systems that use multi-core processors. A multi-core processor integrates two or more cores on a single chip and belongs to the homogeneous multiprocessor architecture. The feature that motivates this study is that different cores on the chip share the same L2 cache. While a shared cache may achieve better utilization and better system performance, cache reloading and access conflicts, in addition to cache misses, cause timing overhead and may affect the performance of individual tasks.
The cores that share the same L2 cache on a multi-core processor are grouped as a cluster. When a scheduling decision migrates a task between cores within the same cluster, the task's data does not necessarily have to be reloaded into the shared L2 cache. If the migration takes place between cores of different clusters, however, extra time is needed to reload the task's data into the L2 cache of the destination cluster. In real-time systems, periodic tasks are common and are released for execution repeatedly. Cache reloading incurs overhead that, in the worst case, results in missed deadlines. Therefore, reducing the time overhead of reloading cache blocks is a key issue in cache performance.
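The cluster rule described above can be illustrated with a minimal Python sketch. It is not the thesis's simulator: the `Platform` class, the uniform cluster size, the mapping of core numbers to clusters, and the constant `reload_cost` are all assumptions made for illustration, and a migration within one cluster is simplified to zero reload overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical model of a clustered multi-core chip (illustrative only)."""
    cores_per_cluster: int  # assumed: every cluster has the same number of cores
    reload_cost: int        # assumed: constant time units to refill a new L2 cache

    def cluster_of(self, core: int) -> int:
        # Assumed numbering: consecutive core IDs belong to the same cluster.
        return core // self.cores_per_cluster

    def migration_overhead(self, src_core: int, dst_core: int) -> int:
        # Same cluster: the shared L2 cache already holds the task's data.
        if self.cluster_of(src_core) == self.cluster_of(dst_core):
            return 0
        # Different cluster: the data must be reloaded into the new L2 cache.
        return self.reload_cost


platform = Platform(cores_per_cluster=2, reload_cost=50)
print(platform.migration_overhead(0, 1))  # cores 0 and 1 share a cluster
print(platform.migration_overhead(1, 2))  # cores 1 and 2 are in different clusters
```

With two cores per cluster, the first call reports no overhead and the second reports the full reload cost, which is the asymmetry a cluster-aware scheduler would try to exploit.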
This study focuses on the performance impact of cache structure in a multi-core processor system. System operation is modeled and the results are obtained by software simulation. The simulation shows that the cache bus and the memory bus are the two factors that most affect system performance. The simulation results also show that the proposed memory bus priority allocation mechanism relieves the bus performance bottleneck and thus improves the efficiency and schedulability of tasks in the system.
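The abstract does not say how the memory bus priority allocation mechanism assigns priorities, so the following Python sketch only illustrates the general idea of priority-based bus arbitration. The function `grant_order`, the static numeric priorities, and the assumption that all requests are pending at the same time are hypothetical and are not taken from the thesis.

```python
import heapq
from typing import List, Tuple


def grant_order(requests: List[Tuple[str, int]]) -> List[str]:
    """Return the order in which pending bus requests would be granted.

    requests: (task name, priority) pairs in arrival order.
    Assumption: a lower priority number is served first, and ties are
    broken by arrival order.
    """
    # The arrival sequence number is the tie-breaker inside the heap.
    heap = [(prio, seq, name) for seq, (name, prio) in enumerate(requests)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


pending = [("t_low", 3), ("t_high", 1), ("t_mid", 2), ("t_high2", 1)]
print(grant_order(pending))  # ['t_high', 't_high2', 't_mid', 't_low']
```

Compared with first-come, first-served arbitration, this ordering lets a more urgent task reach memory ahead of earlier but less urgent requests, which is one plausible way a bus bottleneck could be kept from delaying tasks close to their deadlines.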