
Graduate student: 郭倉碩 (Kuo, Tsang-Shuo)
Thesis title: The Implementation of Fault Tolerance Mechanisms for PAC Duo Heterogeneous Multi-core SoC (基於PAC Duo異質性多核心系統晶片之容錯機制實作)
Advisor: 楊中平 (Young, Chung-Ping)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar, i.e. 2010-2011)
Language: English
Number of pages: 108
Keywords (Chinese): 容錯, 多核心, 工作遷移, MicroC/OS-II, PAC
Keywords (English): Fault tolerance, Multi-core, Process migration, MicroC/OS-II, PAC
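    As information systems advance, the demand for computer reliability keeps rising, and how to repair or handle an error after it occurs has long been an important topic in computing. The main goal of this thesis is to implement fault tolerance mechanisms for a multi-core chip. The research topics include error handling, task recovery, and task migration, and the implementation is built on the real-time kernel MicroC/OS-II, which has already been ported to the multi-core SoC PAC Duo. The system provides a data backup application programming interface (Check Point API), a MicroC/OS-II task recovery point (Recovery Point), and a MicroC/OS-II task migration mechanism (Task Migration). The Check Point API lets users decide when data is backed up while developing applications on the system. The MicroC/OS-II recovery point lets the kernel restart a task from the last check point so that the work can continue. The task migration mechanism lets a task running on MicroC/OS-II move to another core and keep running when a hardware core suffers an unrecoverable fault. We also analyze the extra overhead that fault tolerance introduces, so that users can configure the system to best suit their needs. Finally, the mechanism supports fault tolerance for up to 64 cores, which matches the development trend of future multi-core SoCs.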

    High reliability and availability are becoming basic requirements of computer systems. In this thesis, fault tolerance mechanisms, including fault detection, error handling, and error recovery, are realized on the heterogeneous multi-core SoC platform PAC Duo. We developed a data replication API, a software-based self-test program for fault detection, and task resume and task migration mechanisms for recovery. When a component fails, task restarting and task migration ensure that the job continues to execute. Users set check point locations in their code for one-time data replication or periodic backup. The mechanism supports up to 64 cores, so it fits the future trend of multi-core SoCs. We also analyze the overhead of the fault tolerance mechanisms so that users can configure the system to best suit their needs.
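    The check point, recovery point, and migration ideas in the abstract can be illustrated with a small sketch in plain C. This is not the thesis's API and does not call any MicroC/OS-II function: the names `checkpoint_t`, `checkpoint_save`, `checkpoint_restore`, `pick_migration_target`, and `run_demo`, the 64-byte backup area, and the "lowest-numbered healthy core" policy are all assumptions made for illustration. Only the 64-core limit comes from the abstract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CORES    64   /* the abstract states support for up to 64 cores */
#define BACKUP_BYTES 64   /* hypothetical size of the backup area */

/* Hypothetical check point record: a copy of the task data plus a validity flag. */
typedef struct {
    uint8_t data[BACKUP_BYTES];
    size_t  len;
    int     valid;
} checkpoint_t;

/* Copy the task state into the backup area. Returns 0 on success, -1 if it does not fit. */
int checkpoint_save(checkpoint_t *cp, const void *state, size_t len)
{
    if (len > BACKUP_BYTES) return -1;
    memcpy(cp->data, state, len);
    cp->len = len;
    cp->valid = 1;
    return 0;
}

/* Copy the last saved state back. Returns 0 on success, -1 if no matching check point exists. */
int checkpoint_restore(const checkpoint_t *cp, void *state, size_t len)
{
    if (!cp->valid || cp->len != len) return -1;
    memcpy(state, cp->data, len);
    return 0;
}

/* Assumed policy: choose the lowest-numbered healthy core other than the failed one.
 * Bit i of healthy_mask is 1 when core i is usable. Returns -1 if no core is left. */
int pick_migration_target(uint64_t healthy_mask, int failed_core)
{
    for (int core = 0; core < MAX_CORES; core++) {
        if (core != failed_core && ((healthy_mask >> core) & 1u)) return core;
    }
    return -1;
}

/* Toy task state: sums the integers 1..10. */
typedef struct { int next; long sum; } task_state_t;

/* Runs the toy task with one injected fault and returns the final sum. */
long run_demo(void)
{
    task_state_t st = { 1, 0 };
    checkpoint_t cp = { {0}, 0, 0 };
    int fault_injected = 0;
    int rc;

    while (st.next <= 10) {
        if (st.next == 5) {                 /* user-chosen check point location */
            rc = checkpoint_save(&cp, &st, sizeof st);
            assert(rc == 0);
        }
        if (st.next == 8 && !fault_injected) {
            fault_injected = 1;
            st.sum = -999;                  /* simulate corrupted task data */
            rc = checkpoint_restore(&cp, &st, sizeof st);
            assert(rc == 0);
            continue;                       /* resume from the recovery point */
        }
        st.sum += st.next;
        st.next++;
    }
    (void)rc;
    return st.sum;
}
```

    In this sketch `run_demo()` returns 55, the same sum as a fault-free run, because the work done between the check point and the fault is simply repeated after the rollback. Moving the check point closer to the fault would shorten the repeated work at the cost of more frequent copying, which is the kind of overhead trade-off the abstract says the thesis analyzes.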

    Abstract (Chinese)
    Abstract
    Acknowledgement
    Contents
    List of Figures
    List of Tables
    List of Listings
    Chapter 1 Introduction
        1.1 Introduction and Motivation
        1.2 Previous work
        1.3 Organization
    Chapter 2 Related works
        2.1 Fault Tolerance
        2.2 Self Testing Technique
    Chapter 3 Background Knowledge
        3.1 Real-Time Systems
            3.1.1 Types of Constraints in Real-Time System
            3.1.2 Real-Time Operating Systems
        3.2 Real-Time Kernel MicroC/OS-II
            3.2.1 MicroC/OS-II Features
            3.2.2 Multi-Task Management in MicroC/OS-II
            3.2.3 Task Control Blocks in MicroC/OS-II
            3.2.4 Statistics Task in MicroC/OS-II
            3.2.5 Ready List in MicroC/OS-II
        3.3 Task Migration
            3.3.1 Hardware/Software Architecture
            3.3.2 Task Migration in Multiprocessor SoC
        3.4 Processor Testing Techniques
            3.4.1 External-Testing and Self-testing
            3.4.2 Manufacturing Testing and On-Line Testing
            3.4.3 Hardware-based Self-Test and Software-based Self-Test
    Chapter 4 Implementation
        4.1 PAC Duo platform
        4.2 System Architecture Overview
        4.3 Fault Detection
            4.3.1 PAC DSP Software Self-testing Program
            4.3.2 Test Response Check Daemon on MPU
            4.3.3 Periodic Check with Watch dog timer
        4.4 Program Data Replication
            4.4.1 MicroC/OS-II Checkpoint API
            4.4.2 The Option of Backup Area
        4.5 Error Recovery
            4.5.1 Error Recovery Daemon on MPU
            4.5.2 MicroC/OS-II Recovery Point
            4.5.3 MicroC/OS-II Task Migration Process
        4.6 Communication Between Different SoCs
        4.7 Task Migration Policy
    Chapter 5 Experimental Results
        5.1 Maximum Fault Recovery Time
            5.1.1 Fault Detection Time
            5.1.2 Error Process Time
        5.2 Fault Recovery Testing
        5.3 System Overhead
            5.3.1 Code Size Increment
            5.3.2 Memory Occupation
            5.3.3 Time Measurement
            5.3.4 Performance Overhead
    Chapter 6 Conclusions and Future work
        6.1 Conclusion
        6.2 Future works
    References


    Full text available on campus: 2014-09-05
    Full text available off campus: 2014-09-05