Student: Wu, Shang-Lun (吳尚倫)
Thesis title: Application Cluster Service with Failover Capability (具錯誤後轉移能力之應用程式叢集服務)
Advisors: Cheng, Fan-Tien (鄭芳田); Yang, Haw-Ching (楊浩青)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Manufacturing Engineering
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar, i.e., the 2003-2004 academic year)
Language: Chinese
Number of pages: 168
Keywords: Cluster Service, High Availability, Design Pattern, State Recovery, Performance Evaluator, Failover
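    The reliability goal for information application systems is uninterrupted operation 24 hours a day, 7 days a week. An unreliable information system often stops working because of abnormalities in its operating environment or human operating errors, causing serious financial losses. To address this, this study proposes a cluster service architecture with application failover capability, the Application Cluster Service (APCS). For managing applications, APCS provides two service mechanisms: the “Failover Scheme” and the “State Recovery Scheme”. The Failover Scheme automatically starts a backup application to replace the original one when a working application, or the computer running it, terminates abnormally. The State Recovery Scheme targets applications whose state must be restored: it defines a design pattern for applications to inherit, and an application that inherits this pattern and implements its contents can communicate with the State Recovery Scheme to back up and restore its state. This study further develops a “Performance Evaluator” (PEV) that detects and evaluates computer performance. Because PEV can detect performance degradation in advance, combining PEV with APCS allows the cluster to notify and migrate applications before a node crashes, achieving failover before failure and thus providing continuous, uninterrupted service.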
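The State Recovery Scheme's "inherit the pattern and implement its contents" contract can be read as a Template Method arrangement: the base class fixes when state is backed up and restored, and each application supplies only how its state is captured. The Python sketch below is illustrative, not the thesis's design: it assumes an in-memory store in place of the cluster's actual backup service and JSON as the state format, and `StateStore`, `RecoverableApplication`, `export_state`, `import_state`, and `LotCounter` are hypothetical names.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from typing import Optional


class StateStore:
    """Hypothetical backup store standing in for the recovery service side.

    Keeps only the latest checkpoint of each application, keyed by name.
    An in-memory dict replaces whatever remote storage a real cluster would use.
    """

    def __init__(self) -> None:
        self._checkpoints: dict[str, bytes] = {}

    def put(self, app_name: str, blob: bytes) -> None:
        self._checkpoints[app_name] = blob

    def get(self, app_name: str) -> Optional[bytes]:
        return self._checkpoints.get(app_name)


class RecoverableApplication(ABC):
    """Base class an application inherits to take part in state recovery.

    checkpoint() and recover() are fixed template methods; a subclass only
    implements the two hooks describing how its own state is captured.
    """

    def __init__(self, name: str, store: StateStore) -> None:
        self.name = name
        self._store = store

    @abstractmethod
    def export_state(self) -> dict:
        """Return the application's state as a JSON-serializable dict."""

    @abstractmethod
    def import_state(self, state: dict) -> None:
        """Rebuild the application's state from a dict made by export_state()."""

    def checkpoint(self) -> None:
        # Serialize through the subclass hook, then hand the bytes to the store.
        blob = json.dumps(self.export_state()).encode("utf-8")
        self._store.put(self.name, blob)

    def recover(self) -> bool:
        # Fetch the latest backup, if any, and let the subclass rebuild itself.
        blob = self._store.get(self.name)
        if blob is None:
            return False
        self.import_state(json.loads(blob.decode("utf-8")))
        return True


class LotCounter(RecoverableApplication):
    """Hypothetical stateful application: counts processed lots."""

    def __init__(self, name: str, store: StateStore) -> None:
        super().__init__(name, store)
        self.processed = 0

    def process_lot(self) -> None:
        self.processed += 1

    def export_state(self) -> dict:
        return {"processed": self.processed}

    def import_state(self, state: dict) -> None:
        self.processed = state["processed"]


if __name__ == "__main__":
    store = StateStore()
    primary = LotCounter("lot-counter", store)
    for _ in range(3):
        primary.process_lot()
    primary.checkpoint()

    # A backup instance launched after the primary fails resumes from the checkpoint.
    backup = LotCounter("lot-counter", store)
    backup.recover()
    print(backup.processed)  # prints 3
```

Only `export_state` and `import_state` differ between applications, which is what lets the recovery side stay generic; a backup instance simply calls `recover()` before resuming work.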

    The reliability required of applications in a distributed computer system is nonstop service, 24 hours a day, 7 days a week. However, abnormalities in the operating environment or manual errors may interrupt these services and cause great losses. Hence, this work proposes an Application Cluster Service (APCS) with failover capability. The proposed cluster service offers both a “Failover Scheme” and a “State Recovery Scheme” for failure management. The “Failover Scheme” automatically activates a backup application to replace an application that has become sick or gone down. The “State Recovery Scheme” provides an inheritable design pattern for applications with state recovery requirements: an application only has to inherit this design pattern and implement its contents to accomplish state backup and recovery. Furthermore, this study develops a Performance Evaluator (PEV) that can detect performance degradation and predict the time to failure. Using these detection and prediction capabilities, APCS can perform the failover process before a node breaks down. Thus, by applying APCS and PEV, a distributed computer system can provide near-zero-downtime services.
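The combination described above, reactive failover when a node stops responding plus pre-emptive failover when PEV predicts a breakdown, can be sketched as one periodic check. This Python sketch is illustrative only: the abstract does not describe PEV's actual detection and prediction model, so a least-squares linear trend of a free-resource reading (for example, free memory) extrapolated to a failure floor is swapped in. `FailoverManager`, `predict_time_to_failure`, `heartbeat_timeout`, `lead_time`, and `floor` are hypothetical names, and time is passed in explicitly so the example is deterministic.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Optional, Sequence


def predict_time_to_failure(samples: Sequence[tuple[float, float]], floor: float) -> Optional[float]:
    """Fit a least-squares line to (time, free_resource) samples and extrapolate to `floor`.

    Returns seconds from the newest sample until the fitted line reaches `floor`,
    0.0 if it is already there, or None if the trend is flat or rising.
    A simple stand-in for PEV's prediction, not its actual model.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope >= 0:
        return None  # the resource is not being depleted
    t_last = samples[-1][0]
    v_last = mean_v + slope * (t_last - mean_t)  # fitted value at the newest sample
    if v_last <= floor:
        return 0.0
    return (floor - v_last) / slope


@dataclass
class Node:
    name: str
    last_heartbeat: float
    # Sliding window of recent (time, free_resource) readings.
    samples: deque = field(default_factory=lambda: deque(maxlen=10))


class FailoverManager:
    """Moves applications off nodes that have failed or are predicted to fail soon."""

    def __init__(self, heartbeat_timeout: float, lead_time: float, floor: float) -> None:
        self.heartbeat_timeout = heartbeat_timeout  # silence longer than this means the node is down
        self.lead_time = lead_time                  # migrate if predicted failure is this close
        self.floor = floor                          # free-resource level treated as failure
        self.nodes: dict[str, Node] = {}
        self.placement: dict[str, str] = {}         # application name -> hosting node

    def add_node(self, name: str, now: float) -> None:
        self.nodes[name] = Node(name, now)

    def heartbeat(self, name: str, now: float, free_resource: float) -> None:
        node = self.nodes[name]
        node.last_heartbeat = now
        node.samples.append((now, free_resource))

    def status(self, name: str, now: float) -> str:
        node = self.nodes[name]
        if now - node.last_heartbeat > self.heartbeat_timeout:
            return "failed"   # triggers reactive failover
        ttf = predict_time_to_failure(node.samples, self.floor)
        if ttf is not None and ttf <= self.lead_time:
            return "at_risk"  # triggers pre-emptive failover
        return "healthy"

    def check(self, now: float) -> list[tuple[str, str, str, str]]:
        """Relocate applications; return (app, old_node, new_node, reason) per move."""
        status = {name: self.status(name, now) for name in self.nodes}
        healthy = [name for name, s in status.items() if s == "healthy"]
        moves = []
        for app, host in sorted(self.placement.items()):
            if status[host] == "healthy" or not healthy:
                continue
            # Pick the healthy node currently hosting the fewest applications.
            target = min(healthy, key=lambda n: (list(self.placement.values()).count(n), n))
            self.placement[app] = target
            moves.append((app, host, target, status[host]))
        return moves


if __name__ == "__main__":
    mgr = FailoverManager(heartbeat_timeout=5, lead_time=30, floor=100)
    for name in ("A", "B", "C"):
        mgr.add_node(name, now=0)
    mgr.placement = {"app1": "A", "app2": "B"}
    for t, free in [(1, 1000), (2, 900), (3, 800), (4, 700)]:
        mgr.heartbeat("A", t, free)  # A is draining resources
        mgr.heartbeat("C", t, 1000)  # C is stable; B never reports
    print(mgr.check(now=6))
    # [('app1', 'A', 'C', 'at_risk'), ('app2', 'B', 'C', 'failed')]
```

In the usage run, node A loses 100 units per second and is predicted to hit the floor of 100 six seconds after its last reading, so `app1` is moved while A is still alive; node B has been silent longer than `heartbeat_timeout`, so `app2` is moved after the fact. In a full system, the backup instance on the new node would then restore its state before resuming work.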

    Contents
    Chapter 1. Introduction
      1.1 Introduction
      1.2 Application Examples
      1.3 Thesis Organization
    Chapter 2. Theoretical Background
      2.1 CORBA (Common Object Request Broker Architecture)
      2.2 High-Availability PC Cluster (HA PC Cluster)
      2.3 Design Patterns
      2.4 Unified Modeling Language (UML)
      2.5 Object-Oriented Software Development Process
    Chapter 3. Application Cluster Service with Failover Capability (APCS)
      3.1 System Architecture and Module Design
      3.2 Failover Scheme
        3.2.1 Node Aggregation
        3.2.2 Launching and Monitoring Applications
        3.2.3 Application Failure Recovery
        3.2.4 Node Detection
        3.2.5 Node Replacement
        3.2.6 Architecture of the Failover Service Manager
      3.3 Application State Recovery Scheme
      3.4 Integrating a Computer Performance Detection Scheme
        3.4.1 Overview of the Performance Evaluator (PEV)
        3.4.2 Integrating APCS and PEV
      3.5 Application to and Improvement of Software and Hardware Environments
      3.6 System Performance and Reliability
        3.6.1 Impact of the State Recovery Scheme on System Performance
        3.6.2 Possibility of PEV Misjudgment
        3.6.3 System Reliability of Combining APCS and PEV
    Chapter 4. Object-Oriented Analysis
      4.1 Object-Oriented Analysis of the Failover Scheme
        4.1.1 Requirements Analysis
        4.1.2 Use Case Diagrams
        4.1.3 Analysis Sequence Diagrams
        4.1.4 Analysis Class Diagrams
      4.2 Object-Oriented Analysis of the State Recovery Scheme
        4.2.1 Requirements Analysis
        4.2.2 Use Case Diagrams
        4.2.3 Analysis Sequence Diagrams
        4.2.4 Analysis Class Diagrams
    Chapter 5. Object-Oriented Design
      5.1 Object-Oriented Design of the Failover Scheme
        5.1.1 Design Sequence Diagrams
        5.1.2 Design Class Diagrams
        5.1.3 Defining the IDL (Interface Definition Language)
      5.2 Object-Oriented Design of the State Recovery Scheme
        5.2.1 Design Sequence Diagrams
        5.2.2 Design Class Diagrams
    Chapter 6. Implementation of the Schemes
      6.1 Development Environment
      6.2 User Interface
      6.3 Case Implementation
    Chapter 7. Comparison of the Strengths and Weaknesses of Cluster Service Schemes
    Chapter 8. Conclusions
    References

    Full text available: on campus from 2006-08-30; off campus from 2007-08-30.