Graduate student: | 黃泰曄 Huang, Tai-Yeh |
---|---|
Thesis title: | 基於多叢集資料庫的交易系統:以HBase為例 Transactions Processing in Replicated NoSQL Databases: The Case of Apache HBase |
Advisor: | 蕭宏章 Hsiao, Hung-Chang |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2014 |
Academic year of graduation: | 102 |
Language: | English |
Number of pages: | 26 |
Keywords (Chinese): | HBase, Omid, multi-cluster, transaction |
Keywords (English): | HBase, Omid, multi-cluster, transaction |
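NoSQL systems are emerging key-value persistent data stores designed to handle rapidly growing volumes of data. Well-known NoSQL products include Google Bigtable/Spanner, Hadoop HBase, Cassandra, and MongoDB. These systems typically achieve efficient access through data partitioning, replication, and even multi-cluster deployment. Each has its own advantages and costs, but for the sake of scalability they often give up multi-row transactions. Although some projects have set out to provide transactions (such as Omid, developed by Yahoo!), Omid serves only a single cluster and cannot be used in the multi-cluster deployments of many current NoSQL products. Because combining the asynchronous replication provided by HBase with Omid's transaction implementation leaves data out of sync, this thesis focuses on providing transactions in a multi-cluster architecture. Our system solves this problem without using high-latency synchronization protocols and offers users several consistency levels: the eventual consistency that was already available, strong consistency, and a transaction-level relaxed snapshot isolation for users who can tolerate inconsistent data.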
NoSQL key-value persistent data stores are emerging systems designed to accommodate large volumes of data that grow rapidly and arrive from a variety of sources. Most NoSQL data stores achieve high data access throughput via partitioning and replication, each with its own advantages and shortcomings. For high scalability, NoSQL data stores in general do not provide transactional operations. Existing efforts such as Yahoo! Omid support transaction processing in NoSQL data stores, but they are developed mainly for a single cluster. This thesis presents the design and implementation of transaction processing in a multi-cluster environment whose clusters are distributed across geographically distinct locations. We propose several data consistency models to improve overall transaction-processing throughput. Our approach has been implemented in Apache HBase, and the performance of the proposed solutions is investigated in real environments.