
Graduate Student: 林霆鈞
Lin, Ting-Jyun
Thesis Title: 在NoSQL資料庫上實現分散式快取系統:設計考量及效能影響
Distributed In-Memory Caches for NoSQL Persistent Stores: Design Considerations and Performance Impacts
Advisor: 蕭宏章
Hsiao, Hung-Chang
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2013
Graduation Academic Year: 101
Language: English
Number of Pages: 44
Chinese Keywords: NoSQL, distributed in-memory cache system
Foreign Keywords: NoSQL, distributed in-memory cache
  • NoSQL key-value persistent data stores are an emerging class of storage designed to handle rapidly growing volumes of data. Notable NoSQL products include Google Bigtable/Spanner, Hadoop HBase, Cassandra, MongoDB, and Couchbase. However, typical key-value stores manipulate data objects on disk, which is inefficient. We therefore propose a caching model that distributes data across multiple cache servers and holds it temporarily in their memories. We identify several design issues for such a caching model, concerning consistency, scalability, and availability. In particular, our study shows that developing a cache cluster requires an in-depth understanding of the backend persistent key-value store. We present our caching model assuming Hadoop HBase as the backend NoSQL database. Moreover, most NoSQL databases offer range search as a primary operation; our caching model performs range search in cooperation with the HBase backend, and we use computer simulations to investigate its performance.

    NoSQL key-value persistent data stores are emerging; they are designed to accommodate potentially large volumes of data that grow rapidly from a variety of sources. Examples of such key-value stores include Google Bigtable/Spanner, Hadoop HBase, Cassandra, MongoDB, and Couchbase. Because typical key-value stores manipulate data objects stored on disk, their accesses are inefficient. In this paper we present a caching model that stores data objects temporarily in memories distributed across a number of cache servers. We reveal a number of design issues involved in developing such a caching model, relevant to consistency, scalability, and availability. Specifically, our study notes that developing a cache cluster requires an in-depth understanding of the backend persistent key-value store. We then present a caching model, assuming that Hadoop HBase serves as our backend. In addition, we illustrate how range search, one of the major operations offered by most NoSQL key-value stores, is performed in our caching model together with the HBase backend. Computer simulations demonstrate our performance results.
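The core ideas in the abstract — objects hashed across cache servers, cache-aside reads backed by a persistent store, and range search that must be reconciled with hash partitioning — can be sketched as follows. This is a minimal illustrative model, not the thesis's implementation: the `CacheNode`/`CacheCluster` names are invented for this sketch, and the backend is modeled as a plain Python dict standing in for an HBase table.

```python
import bisect
import hashlib

class CacheNode:
    """One in-memory cache server: a dict plus a sorted key list for range scans."""
    def __init__(self):
        self.store = {}
        self.sorted_keys = []

    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        if key not in self.store:
            bisect.insort(self.sorted_keys, key)
        self.store[key] = value

class CacheCluster:
    """Cache-aside layer in front of a backend key-value store.

    Hashing each key to a node mimics how data objects are spread over a
    number of cache servers; a real deployment would back this with HBase.
    """
    def __init__(self, backend, num_nodes=4):
        self.backend = backend          # stand-in for the persistent store
        self.nodes = [CacheNode() for _ in range(num_nodes)]

    def _node_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def get(self, key):
        node = self._node_for(key)
        value = node.get(key)
        if value is None:               # cache miss: fetch from the backend
            value = self.backend.get(key)
            if value is not None:
                node.put(key, value)    # populate the cache on the way out
        return value

    def put(self, key, value):
        # Write-through: update the backend first, then the cached copy,
        # so the cache never holds a value the backend lacks.
        self.backend[key] = value
        self._node_for(key).put(key, value)

    def range_search(self, lo, hi):
        """Hashing destroys key locality, so a range scan must consult every
        cache node and merge the results. (A full system would also fall back
        to the backend for sub-ranges not resident in any cache.)"""
        out = {}
        for node in self.nodes:
            i = bisect.bisect_left(node.sorted_keys, lo)
            j = bisect.bisect_right(node.sorted_keys, hi)
            for k in node.sorted_keys[i:j]:
                out[k] = node.store[k]
        return sorted(out.items())
```

The sketch makes one of the thesis's design tensions concrete: hash placement balances load across cache servers but scatters contiguous key ranges, which is why range search over the cache requires extra machinery (e.g., the partial result cache discussed in Chapter 4).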

    1 Introduction 5
    1.1 Goals and Challenges 7
    1.2 Our Contributions 9
    1.3 Roadmap 11
    2 Background 12
    2.1 Fundamental Operations in a Persistent Data Store 13
    2.2 Scalability and Self-Management 14
    2.3 Consistency 15
    2.4 Transactions 16
    2.5 Availability and Fault Tolerance 17
    3 Related Work 19
    3.1 Distributed Hash Tables 19
    3.2 Range Indexing and Search 20
    3.3 Consistency Model 21
    3.4 Hadoop HBase 24
    3.5 memcached 25
    4 Proposed Distributed In-Memory Caches 26
    4.1 Overview 26
    4.2 Data Object Placement and Replacement 26
    4.3 Range Indexing and Lookup 27
    4.3.1 Data Structures and Existing Algorithms 28
    4.3.2 Partial Result Cache 29
    4.3.3 Atomicity and FIFO/Total Ordering 30
    4.4 Read-One-Write-All and Two-Phase Commit Protocol 30
    4.5 Handling Failures 31
    5 Performance Analysis 33
    5.1 Parameters 33
    5.2 Performance Metrics 35
    5.3 Workloads 36
    5.4 Simulation Results 36
    6 Summary 40


    Full text not available for download.
    On campus: available from 2018-07-31.
    Off campus: not publicly available.
    The electronic thesis has not been authorized for public release; please consult the library catalog for the print copy.