
Graduate student: Cheng, Kai-Yuan (鄭凱元)
Thesis title: High Performance Transactions Processing in Apache HBase with Scale-Out Caches: Design, Implementation and Performance Benchmarking
Advisor: Hsiao, Hung-Chang (蕭宏章)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar, i.e., the 2013-2014 academic year)
Language: English
Number of pages: 32
Keywords (Chinese): distributed caching system, transaction
Keywords (English): HBase, Cache, transaction
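  • Although Apache HBase™ is already an excellent distributed big data store, it lacks support for multi-row transactions. This thesis therefore studies an HBase system that provides transactions and adds scale-out caches to speed up transaction execution, which in turn increases overall system performance. We found that adding caches cannot by itself maintain the transactional consistency of the whole system and violates the consistency property of the underlying database. We therefore use a single centralized server to ensure that data are consistent and up to date whether they are read from the caches or from the databases. In addition, we provide an API that lets users decide whether to access the cache. Finally, our TPC-C experiments verify that adding scale-out caches improves the overall system's transaction-processing performance.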
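The centralized consistency mechanism described above can be pictured with a minimal sketch. It assumes, hypothetically, that the central server tracks a version number per key, that each cache entry records the version it was filled with, and that a cached value is served only when its version matches the server's latest one. Keys are spread over several cache nodes by hashing, and a `use_cache` flag stands in for the API that lets callers opt out of caching. The names `VersionServer`, `CachedStore`, and `use_cache` are illustrative, not the thesis's API; plain dictionaries stand in for HBase and the cache nodes, and concurrency is ignored.

```python
import zlib
from typing import Dict, List, Optional, Tuple


class VersionServer:
    """Hypothetical central server: authority on each key's latest committed version."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def latest(self, key: str) -> int:
        return self._latest.get(key, 0)

    def bump(self, key: str) -> int:
        # Called on every write; returns the new version number.
        self._latest[key] = self.latest(key) + 1
        return self._latest[key]


class CachedStore:
    """Hypothetical client: dicts stand in for HBase and for each cache node."""

    def __init__(self, server: VersionServer, num_nodes: int = 4) -> None:
        self.server = server
        self.db: Dict[str, Tuple[int, str]] = {}
        self.caches: List[Dict[str, Tuple[int, str]]] = [{} for _ in range(num_nodes)]
        self.cache_hits = 0

    def _node(self, key: str) -> Dict[str, Tuple[int, str]]:
        # Scale-out: spread keys across cache nodes with a stable hash.
        return self.caches[zlib.crc32(key.encode()) % len(self.caches)]

    def write(self, key: str, value: str) -> None:
        version = self.server.bump(key)
        self.db[key] = (version, value)
        # Any cached copy is now stale; reads detect this by comparing versions.

    def read(self, key: str, use_cache: bool = True) -> Optional[str]:
        node = self._node(key)
        if use_cache:
            entry = node.get(key)
            # Serve from cache only if the central server confirms it is current.
            if entry is not None and entry[0] == self.server.latest(key):
                self.cache_hits += 1
                return entry[1]
        entry = self.db.get(key)
        if entry is None:
            return None
        if use_cache:
            node[key] = entry  # refill the cache with the up-to-date copy
        return entry[1]
```

In this sketch every cached read costs one round trip to the central server, which is the price of never returning a stale value; a real deployment could instead have the server push invalidations to cache nodes on commit.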

    Although Apache HBase™ is an emerging distributed key-value persistent data store, it lacks support for multi-row transactions. In this thesis we explore how HBase can be enabled to provide transaction processing. Specifically, we propose a scale-out caching mechanism to improve overall system throughput when performing transactions. With caches, however, we observe that application programmers may have difficulty dealing with consistency between the caches and the database servers. We therefore propose a centralized approach that guarantees that data items stored in caches are up to date and consistent with their persistent database store. In addition, we provide APIs that let programmers decide whether their application data are cached. Finally, our TPC-C experiments in a real environment show that scale-out caches are efficient and effective in accelerating data access to NoSQL databases.
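Providing transactions on HBase requires a concurrency-control scheme, and the thesis's background chapter lists snapshot isolation and Omid. The sketch below is a generic, single-process illustration of snapshot isolation with a centralized status oracle in the spirit of Omid: the oracle hands out start and commit timestamps and aborts a transaction if another transaction committed a write to one of its rows after its snapshot was taken (a write-write conflict). It is not the thesis's commit process. The class names are hypothetical, writes are buffered in the client until commit as a simplification, and a dict of version lists stands in for HBase's multi-version cells.

```python
import itertools
from typing import Dict, List, Optional, Set, Tuple


class StatusOracle:
    """Hypothetical centralized oracle: timestamps plus write-write conflict detection."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)
        self._last_commit: Dict[str, int] = {}  # row -> latest commit timestamp

    def begin(self) -> int:
        return next(self._clock)

    def commit(self, start_ts: int, write_set: Set[str]) -> Optional[int]:
        # Abort if any row we wrote was committed by someone else after our snapshot.
        for row in write_set:
            if self._last_commit.get(row, 0) > start_ts:
                return None
        commit_ts = next(self._clock)
        for row in write_set:
            self._last_commit[row] = commit_ts
        return commit_ts


class MVStore:
    """Multi-version store standing in for HBase: row -> [(commit_ts, value)]."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, str]]] = {}

    def read(self, row: str, snapshot_ts: int) -> Optional[str]:
        # Snapshot read: newest version committed at or before the snapshot.
        visible = [(ts, v) for ts, v in self.versions.get(row, []) if ts <= snapshot_ts]
        return max(visible)[1] if visible else None


class Transaction:
    def __init__(self, oracle: StatusOracle, store: MVStore) -> None:
        self.oracle, self.store = oracle, store
        self.start_ts = oracle.begin()
        self.writes: Dict[str, str] = {}

    def get(self, row: str) -> Optional[str]:
        if row in self.writes:  # read your own writes
            return self.writes[row]
        return self.store.read(row, self.start_ts)

    def put(self, row: str, value: str) -> None:
        self.writes[row] = value  # buffered until commit

    def commit(self) -> bool:
        commit_ts = self.oracle.commit(self.start_ts, set(self.writes))
        if commit_ts is None:
            return False
        for row, value in self.writes.items():
            self.store.versions.setdefault(row, []).append((commit_ts, value))
        return True
```

Because conflicts are checked only on written rows, read-only transactions never abort; an aborted writer would simply be retried by the client.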

    Abstract (Chinese)
    ABSTRACT
    ACKNOWLEDGEMENTS
    TABLE OF CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
    CHAPTER 2 BACKGROUND
      2.1 Apache HBase
      2.2 Memcached
      2.3 Transaction Concurrency Control
      2.4 Snapshot Isolation
      2.5 Omid
        2.5.1 Overview
        2.5.2 System model
    CHAPTER 3 RELATED WORK
    CHAPTER 4 OUR PROPOSED FRAMEWORK
      4.1 Cache design
      4.2 Commit process
      4.3 Accessing cache objects
      4.4 TPC-C based on our proposal
        4.4.1 TPC-C
        4.4.2 Our proposal with TPC-C
    CHAPTER 5 EVALUATION
      5.1 System Deploying
      5.2 Experiment
    CHAPTER 6 SUMMARY AND FUTURE WORK
    REFERENCES


    Full text: publicly available on campus from 2024-08-31 and off campus from 2024-08-31.