| Graduate student: | 郭倉碩 Kuo, Tsang-Shuo | 
|---|---|
| Thesis title: | The Implementation of Fault Tolerance Mechanisms for PAC Duo Heterogeneous Multi-core SoC | 
| Advisor: | 楊中平 Young, Chung-Ping | 
| Degree: | Master | 
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering | 
| Year of publication: | 2011 | 
| Graduation academic year: | 99 (ROC calendar) | 
| Language: | English | 
| Number of pages: | 108 | 
| Keywords (Chinese): | 容錯 、多核心 、工作遷移 、MicroC/OS-II 、PAC | 
| Keywords (English): | Fault tolerance, Multi-core, Process migration, MicroC/OS-II, PAC | 
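As information systems advance, the demand for computer reliability keeps rising, and how to repair or handle errors after they occur has long been an important issue in computing. The main goal of this thesis is to implement fault tolerance mechanisms for a multi-core chip. The research topics include error handling, task recovery, and task migration, and the implementation is built on the real-time operating system kernel MicroC/OS-II, which has already been ported to the PAC Duo multi-core SoC. The system provides a data backup application programming interface (Check Point API), MicroC/OS-II task recovery points (Recovery Point), and a MicroC/OS-II task migration mechanism (Task Migration). The Check Point API lets users decide when data is backed up while they develop applications on the system. The recovery point lets the kernel restart a task from its last check point so that the work can continue. The task migration mechanism lets a task running on MicroC/OS-II move to another core and keep running when a hardware core suffers an unrecoverable error. We also analyze the overhead that fault tolerance introduces, so that users can configure the system to best suit their needs. Finally, the mechanism supports fault tolerance for up to 64 cores, in line with the future development of multi-core SoCs.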
High reliability and availability are becoming basic requirements of computer systems. In this thesis, fault tolerance mechanisms, including fault detection, error handling, and error recovery, are realized on the heterogeneous multi-core SoC platform PAC Duo. We developed features including a data replication API, a software-based self-test program for fault detection, and task resumption and task migration. When a component fails, task restarting and task migration ensure that the job continues to execute. Users set check point locations in their code for one-time data replication or periodic backup. The mechanism supports up to 64 cores, so it fits the future trend of multi-core SoCs. We also analyze the overhead of the fault tolerance mechanisms so that the system can be configured to best suit its requirements.
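The check point, rollback, and migration ideas in the abstract can be illustrated with a minimal C sketch. It is not the thesis's implementation and calls no MicroC/OS-II or PAC Duo API. Every name in it (`checkpoint_t`, `ckpt_save`, `ckpt_restore`, `pick_healthy_core`, `CKPT_CAPACITY`) is hypothetical, and the one-bit-per-core health mask is only an assumed way of tracking up to 64 cores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CORES     64   /* the abstract states support for up to 64 cores */
#define CKPT_CAPACITY 64   /* hypothetical checkpoint buffer size, in bytes */

/* Hypothetical task state that an application wants to protect. */
typedef struct {
    int counter;
} task_state_t;

/* Hypothetical checkpoint record: a byte copy of the task state. */
typedef struct {
    uint8_t data[CKPT_CAPACITY];
    size_t  len;
    int     valid;   /* nonzero once a snapshot has been stored */
} checkpoint_t;

/* Store a snapshot of `state`. Returns 0 on success, -1 if it does not fit. */
int ckpt_save(checkpoint_t *c, const void *state, size_t len)
{
    if (len > CKPT_CAPACITY) {
        return -1;
    }
    memcpy(c->data, state, len);
    c->len = len;
    c->valid = 1;
    return 0;
}

/* Roll `state` back to the last snapshot. Returns 0 on success, -1 if no
 * snapshot exists or the sizes do not match. */
int ckpt_restore(const checkpoint_t *c, void *state, size_t len)
{
    if (!c->valid || c->len != len) {
        return -1;
    }
    memcpy(state, c->data, len);
    return 0;
}

/* Choose a migration target. `healthy_mask` holds one bit per core (bit set
 * means healthy, an assumed encoding). The failed core is cleared first, then
 * the lowest-numbered healthy core is returned, or -1 if none remains. */
int pick_healthy_core(uint64_t healthy_mask, int failed_core)
{
    assert(failed_core >= 0 && failed_core < MAX_CORES);
    healthy_mask &= ~(UINT64_C(1) << failed_core);
    for (int core = 0; core < MAX_CORES; core++) {
        if (healthy_mask & (UINT64_C(1) << core)) {
            return core;
        }
    }
    return -1;
}
```

In this sketch an application would call `ckpt_save` at the points it chooses. After a fault it would call `pick_healthy_core` to find a surviving core and `ckpt_restore` to resume from the last snapshot, so any work done since that snapshot is repeated. More frequent snapshots shorten the repeated work but add copying cost, which is the kind of overhead trade-off the abstract says was analyzed.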