| Graduate student: | 吳尚倫 Wu, Shang-Lun |
|---|---|
| Thesis title: | 具錯誤後轉移能力之應用程式叢集服務 Application Cluster Service with Failover Capability |
| Advisors: | 鄭芳田 Cheng, Fan-Tien; 楊浩青 Yang, Haw-Ching |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Manufacturing Engineering |
| Year of publication: | 2004 |
| Academic year of graduation: | 92 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 168 |
| Chinese keywords: | 叢集服務, 高妥善率, 設計樣版, 狀態回復, 效能評估器, 錯誤後轉移 |
| English keywords: | Design Pattern, State Recovery, Failover, Performance Evaluator, High Availability, Cluster Service |
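The reliability goal for information application systems is to work without interruption, 24 hours a day, 7 days a week. An information system with poor reliability often stops working because of anomalies in its execution environment or mistakes in human operation, causing serious financial losses. In view of this, this study proposes a cluster service architecture with application failover capability, the Application Cluster Service (APCS). For application management, this cluster service provides two mechanisms: a “failover mechanism” and a “state recovery mechanism”. When a working application or computer fails and stops providing service, the failover mechanism automatically starts a backup application to replace the failed one. The state recovery mechanism targets applications that need state recovery: a design pattern is provided for applications to inherit, and an application only needs to inherit this design pattern and implement its contents in order to communicate with the state recovery mechanism and complete state backup and restoration. In addition, this study develops a “Performance Evaluator” (PEV) with mechanisms for detecting and evaluating computer performance. When the PEV is combined with the APCS, the PEV can detect degradation of computer performance in advance, so the applications can be notified and migrated before a node in the cluster crashes. This achieves migration before the crash and provides continuous, uninterrupted service.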
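The two mechanisms described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `RecoverableApplication`, `CounterApp`, `ClusterService`, `backup_state`, `restore_state`, `checkpoint`, and `monitor` are all hypothetical, and the `alive` flag is an assumed stand-in for whatever health check the real service performs. The sketch only shows the idea of an inheritable base class for state backup and restoration, plus a supervisor that activates a standby application when the active one fails.

```python
from abc import ABC, abstractmethod


class RecoverableApplication(ABC):
    """Hypothetical base class standing in for the inheritable design pattern."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True  # assumed health flag; a real service would probe the process

    @abstractmethod
    def backup_state(self) -> dict:
        """Return a snapshot of the state that must survive a failover."""

    @abstractmethod
    def restore_state(self, snapshot: dict) -> None:
        """Load a snapshot previously produced by backup_state()."""


class CounterApp(RecoverableApplication):
    """Toy application whose only state is a running request count."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.count = 0

    def handle_request(self) -> int:
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        self.count += 1
        return self.count

    def backup_state(self) -> dict:
        return {"count": self.count}

    def restore_state(self, snapshot: dict) -> None:
        self.count = snapshot["count"]


class ClusterService:
    """Hypothetical supervisor: checkpoints the active app and fails over to the standby."""

    def __init__(self, primary: RecoverableApplication, backup: RecoverableApplication) -> None:
        self.active = primary
        self.standby = backup
        self.last_snapshot = None

    def checkpoint(self) -> None:
        # State backup: keep the most recent snapshot of the active application.
        self.last_snapshot = self.active.backup_state()

    def monitor(self) -> bool:
        """Return True if a failover was performed, False if the active app is healthy."""
        if self.active.alive:
            return False
        if not self.standby.alive:
            raise RuntimeError("no healthy standby application available")
        if self.last_snapshot is not None:
            # State restoration: bring the standby up to the last checkpoint.
            self.standby.restore_state(self.last_snapshot)
        self.active, self.standby = self.standby, self.active
        return True


if __name__ == "__main__":
    primary, backup = CounterApp("node-a"), CounterApp("node-b")
    cluster = ClusterService(primary, backup)
    for _ in range(3):
        cluster.active.handle_request()
    cluster.checkpoint()
    primary.alive = False  # simulate a crash of the active application
    cluster.monitor()
    print(cluster.active.name, cluster.active.handle_request())  # node-b 4
```

In this sketch, any work done after the last checkpoint is lost on failover, so how often `checkpoint` is called is a design trade-off between overhead and the amount of state that can be recovered.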
The reliability target for applications in a distributed computer system is nonstop service, 24 hours a day, 7 days a week. However, abnormalities in the operating environment or operator errors may interrupt services and cause great losses. Hence, this work proposes an Application Cluster Service (APCS) with failover capability. The proposed cluster service offers two schemes for failure management: a “Failover Scheme” and a “State Recovery Scheme”. The Failover Scheme automatically activates a backup application to replace a failed application when it becomes sick or goes down. The State Recovery Scheme provides an inheritable design pattern for applications that require state recovery. An application only has to inherit this design pattern and implement its contents to accomplish state backup and recovery. Furthermore, a Performance Evaluator (PEV) that can detect performance degradation and predict time to failure is developed in this study. Using these detection and prediction capabilities, the APCS can perform the failover process before a node breaks down. Thus, by applying the APCS and the PEV together, a distributed computer system can provide near-zero-downtime services.
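The abstract does not state how the PEV predicts time to failure. As a rough illustration only, the sketch below assumes a simple approach: fit a least-squares line through recent samples of a declining resource metric (for example, free memory) and extrapolate to the level at which the node is considered failed. The function names `predict_time_to_failure` and `should_migrate`, the `failure_level` and `lead_time` parameters, and the sample values are all hypothetical.

```python
from typing import Optional, Sequence


def predict_time_to_failure(
    times: Sequence[float],
    values: Sequence[float],
    failure_level: float = 0.0,
) -> Optional[float]:
    """Extrapolate a least-squares line through (time, value) samples.

    Returns the estimated time remaining after the last sample until the
    metric reaches failure_level, or None if the metric is not declining.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("sample times must not all be equal")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / var_t
    if slope >= 0:
        return None  # metric is flat or improving, so no failure is predicted
    intercept = mean_v - slope * mean_t
    crossing_time = (failure_level - intercept) / slope
    return max(0.0, crossing_time - times[-1])


def should_migrate(
    times: Sequence[float],
    values: Sequence[float],
    lead_time: float,
    failure_level: float = 0.0,
) -> bool:
    """Decide whether to fail over now, given the lead time a migration needs."""
    remaining = predict_time_to_failure(times, values, failure_level)
    return remaining is not None and remaining <= lead_time


if __name__ == "__main__":
    sample_times = [0.0, 10.0, 20.0, 30.0]         # seconds (hypothetical samples)
    free_memory_mb = [400.0, 300.0, 200.0, 100.0]  # steadily shrinking free memory
    print(predict_time_to_failure(sample_times, free_memory_mb))        # 10.0
    print(should_migrate(sample_times, free_memory_mb, lead_time=15.0))  # True
```

A cluster service could call something like `should_migrate` periodically and trigger the failover process while the node is still responsive, which is the effect the abstract attributes to combining the APCS with the PEV.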