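| Graduate student: | 林益聖 Lin, Yi-Sheng |
|---|---|
| Thesis title: | 需求導向的透通格網資料協同配置機制之研究 A Transparent On-Demand Co-Allocation Scheme for Data Grids |
| Advisors: | 張志標 Chang, Jyh-Biau; 謝錫堃 Shieh, Ce-Kuen |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2008 |
| Academic year of graduation: | 96 (ROC calendar) |
| Language: | English |
| Number of pages: | 51 |
| Keywords (Chinese): | 資料格網 (data grid), 資料存取機制 (data access scheme), 需求導向 (on-demand), 透通 (transparency), 資料協同配置 (data co-allocation) |
| Keywords (English): | data Grid, co-allocation, data access scheme, on-demand, transparency |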
In conventional data grid systems, a data-intensive application usually accesses data through either the pre-staging scheme or the fragmented data sharing scheme. The pre-staging scheme downloads an entire shared file from multiple data sources simultaneously, even when only a tiny fragment of the file is required, and therefore wastes data transmission time and storage space. The fragmented data sharing scheme, by contrast, downloads only the necessary fragments, but from a single data source, so it does not fully exploit the available network bandwidth. This thesis therefore presents a data sharing system built on a hybrid scheme, the On-Demand Co-Allocation (ODCA) scheme. ODCA uses multiple data streams to fetch the necessary file fragments concurrently, thereby reducing data transmission time, wasted network bandwidth, and required storage space. In addition, ODCA is transparent: grid-unaware applications can use data on the Data Grid without modification or recompilation. Experimental results show that the ODCA scheme reduces the turnaround time of data-intensive applications on a data grid.
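The hybrid idea in the abstract (fetch only the requested fragment, but spread it across several sources at once) can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `split_range`, `fetch_on_demand`, and `make_replica` are hypothetical, the even split across replicas is an assumed policy, and the replicas are simulated in memory as callables instead of real grid data sources.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(offset, length, parts):
    """Split [offset, offset + length) into at most `parts` contiguous sub-ranges."""
    base, extra = divmod(length, parts)
    ranges = []
    start = offset
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer bytes than replicas: skip empty sub-ranges
        ranges.append((start, size))
        start += size
    return ranges


def fetch_on_demand(replicas, offset, length):
    """Fetch only the requested byte range, one sub-range per replica, concurrently.

    `replicas` is a list of callables read(start, size) -> bytes (an assumption
    made for this sketch; a real system would open one data stream per source).
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    ranges = split_range(offset, length, len(replicas))
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [
            pool.submit(read, start, size)
            for read, (start, size) in zip(replicas, ranges)
        ]
        # Join in submission order so the fragment is reassembled correctly.
        return b"".join(f.result() for f in futures)


def make_replica(blob):
    """Simulate a data source that holds a full copy of the file in memory."""
    return lambda start, size: blob[start:start + size]


if __name__ == "__main__":
    data = bytes(range(256)) * 4
    replicas = [make_replica(data) for _ in range(3)]
    fragment = fetch_on_demand(replicas, offset=100, length=50)
    assert fragment == data[100:150]
```

Only the 50 requested bytes are transferred in this example, and each simulated source serves a different part of that range. Splitting evenly is the simplest choice; a scheme that weights sub-ranges by measured bandwidth per source would be a natural refinement, but that is outside what the abstract states.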