| Graduate Student: | 林霆鈞 Lin, Ting-Jyun |
|---|---|
| Thesis Title: | 在NoSQL資料庫上實現分散式快取系統:設計考量及效能影響 Distributed In-Memory Caches for NoSQL Persistent Stores: Design Considerations and Performance Impacts |
| Advisor: | 蕭宏章 Hsiao, Hung-Chang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2013 |
| Academic Year of Graduation: | 101 |
| Language: | English |
| Number of Pages: | 44 |
| Chinese Keywords: | NoSQL、分散式快取系統 (distributed cache systems) |
| English Keywords: | NoSQL, distributed in-memory cache |
NoSQL key-value persistent data stores are emerging; they are designed to accommodate the potentially large volumes of data that grow rapidly from a variety of sources. Examples of such key-value stores include Google Bigtable/Spanner, Hadoop HBase, Cassandra, MongoDB, and Couchbase. Because typical key-value stores manipulate data objects stored on disk, their accesses are inefficient. We present in this paper a caching model that temporarily stores data objects in memory, distributed across a number of cache servers. We reveal a number of design issues involved in developing such a caching model; these issues are relevant to consistency, scalability, and availability. Specifically, our study notes that developing a cache cluster requires an in-depth understanding of the backend persistent key-value store. We then present our caching model, assuming that Hadoop HBase serves as the backend. In addition, we illustrate how range search, one of the major operations offered by most NoSQL key-value stores, is performed in our caching model together with the HBase backend. Computer simulations demonstrate our performance results.
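To make the design concrete, below is a minimal, self-contained sketch (in Java, the language of the HBase ecosystem) of the kind of cache tier the abstract describes. It is not the thesis's actual implementation: `PersistentStore` is a hypothetical stand-in for an HBase table whose `scan` mimics HBase's start/stop-row semantics, and the hash partitioning, write-through invalidation, and backend-served range scans are illustrative design choices; all class and method names are invented for this example.

```java
import java.util.*;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Minimal sketch of a distributed in-memory cache tier in front of a
 * persistent key-value store. All names are illustrative; PersistentStore
 * stands in for an HBase table, not the thesis's actual implementation.
 */
public class CacheClusterSketch {

    /** Hypothetical stand-in for the backend NoSQL store (e.g., an HBase table). */
    static class PersistentStore {
        private final ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<>();
        void put(String key, String value) { rows.put(key, value); }
        String get(String key) { return rows.get(key); }
        /** Range scan over [startKey, stopKey), mirroring HBase's start/stop-row semantics. */
        SortedMap<String, String> scan(String startKey, String stopKey) {
            return rows.subMap(startKey, stopKey);
        }
    }

    /** One cache server holding a hash partition of hot objects in memory. */
    static class CacheServer {
        private final Map<String, String> memory = new HashMap<>();
        String get(String key) { return memory.get(key); }
        void fill(String key, String value) { memory.put(key, value); }
        void invalidate(String key) { memory.remove(key); }
    }

    private final List<CacheServer> servers = new ArrayList<>();
    private final PersistentStore backend;

    CacheClusterSketch(int numServers, PersistentStore backend) {
        for (int i = 0; i < numServers; i++) servers.add(new CacheServer());
        this.backend = backend;
    }

    /** Hash-partition each key to exactly one owning cache server. */
    private CacheServer serverFor(String key) {
        return servers.get(Math.floorMod(key.hashCode(), servers.size()));
    }

    /** Cache-aside read: try the owning cache server first, then the backend. */
    String read(String key) {
        CacheServer s = serverFor(key);
        String v = s.get(key);
        if (v == null) {
            v = backend.get(key);            // cache miss: fetch from the persistent store
            if (v != null) s.fill(key, v);   // warm the cache for subsequent reads
        }
        return v;
    }

    /** Write-through with invalidation: persist first, then drop the stale cached copy. */
    void write(String key, String value) {
        backend.put(key, value);
        serverFor(key).invalidate(key);
    }

    /**
     * Range search. Hash partitioning scatters adjacent keys across servers,
     * so the scan is answered by the order-preserving backend; the returned
     * rows are then used to warm the caches of their owning servers.
     */
    SortedMap<String, String> rangeSearch(String startKey, String stopKey) {
        SortedMap<String, String> result = new TreeMap<>(backend.scan(startKey, stopKey));
        for (Map.Entry<String, String> e : result.entrySet())
            serverFor(e.getKey()).fill(e.getKey(), e.getValue());
        return result;
    }

    public static void main(String[] args) {
        PersistentStore hbaseLike = new PersistentStore();
        for (int i = 0; i < 10; i++) hbaseLike.put(String.format("row-%02d", i), "v" + i);

        CacheClusterSketch cluster = new CacheClusterSketch(4, hbaseLike);
        System.out.println(cluster.read("row-03"));                  // miss, served by backend, then cached
        System.out.println(cluster.rangeSearch("row-02", "row-05")); // {row-02=v2, row-03=v3, row-04=v4}
        cluster.write("row-03", "v3-updated");                       // invalidates the stale cached copy
        System.out.println(cluster.read("row-03"));                  // v3-updated
    }
}
```

One design tension the sketch makes visible: hash partitioning balances load across cache servers but scatters adjacent keys, so no single cache server can answer a range query and the scan falls back to the order-preserving backend; range-partitioning the cache instead would trade load balance for key locality.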
On-campus access: available from 2018-07-31.