| Graduate student: | 李仲恩 Lee, Chung-En |
|---|---|
| Thesis title: | 提升 Cassandra 讀取效能: 以半導體 EDA 測試使用情境為例 Improving Cassandra's Read Performance: A Case Study in Electronic Design Automation Testing Scenario |
| Advisor: | 蕭宏章 Hsiao, Hung-Chang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Graduate Program of Artificial Intelligence |
| Year of publication: | 2023 |
| Academic year of graduation: | 111 |
| Language: | Chinese |
| Number of pages: | 28 |
| Keywords (Chinese): | Cassandra, Cassandra-stress, distributed NoSQL database, Electronic Design Automation |
| Keywords (English): | Cassandra, Cassandra-stress, NoSQL, Electronic Design Automation |
Cassandra is a distributed NoSQL database system designed to store large volumes of data, and it is best suited to workloads in which writes outweigh reads. Some scenarios nevertheless need better read performance; training a model, for example, requires reading large amounts of data for labeling. For such read behavior, the official Cassandra documentation offers table design recommendations, such as making the columns used for exact-match lookups the Partition Key and the columns used for range queries the Clustering Key. These recommendations cannot always be applied directly to an individual user's scenario, however, and Cassandra is a large, complex system with a high barrier to entry for general users.
This thesis takes the semiconductor EDA testing scenario as a case study. It analyzes the usage scenario, studies Cassandra's principles, designs several candidate tables, and compares their advantages and disadvantages. It then uses Cassandra-stress, a Cassandra tool for measuring the performance of custom usage scenarios, to measure each table design under a given load, raising the load step by step until a bottleneck is reached. The measurements are used to validate the design reasoning behind each table and to give users a better understanding of how Cassandra works.
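The table design recommendation mentioned in the abstract (exact-match columns as the Partition Key, range-queried columns as the Clustering Key) can be illustrated with a small in-memory sketch. This is a toy model only, not Cassandra's storage engine or any driver API: the `ToyTable` class, the `chip_a`/`chip_b` partition values, and the integer timestamps are all hypothetical names chosen for illustration. It assumes that rows sharing a partition key are kept sorted by clustering key, so a range query touches one partition and one contiguous slice of rows.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of a table keyed by (partition key, clustering key).

    Hypothetical illustration only; it does not reproduce Cassandra internals.
    """

    def __init__(self):
        # partition key -> parallel lists: sorted clustering keys and their values
        self._keys = defaultdict(list)
        self._rows = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        keys = self._keys[partition_key]
        rows = self._rows[partition_key]
        # Find the slot that keeps the partition ordered by clustering key.
        pos = bisect_left(keys, clustering_key)
        if pos < len(keys) and keys[pos] == clustering_key:
            rows[pos] = value  # same primary key: overwrite the existing row
        else:
            keys.insert(pos, clustering_key)
            rows.insert(pos, value)

    def range_query(self, partition_key, low, high):
        """Return (clustering key, value) pairs with low <= clustering key <= high."""
        keys = self._keys.get(partition_key, [])
        rows = self._rows.get(partition_key, [])
        start = bisect_left(keys, low)
        end = bisect_right(keys, high)
        # Only one partition is read, and only a contiguous slice of it.
        return list(zip(keys[start:end], rows[start:end]))


# Hypothetical EDA test results: partition by chip, cluster by timestamp.
table = ToyTable()
table.insert("chip_a", 30, "fail")
table.insert("chip_a", 10, "pass")
table.insert("chip_a", 20, "pass")
table.insert("chip_b", 15, "pass")
print(table.range_query("chip_a", 10, 20))  # [(10, 'pass'), (20, 'pass')]
```

In this sketch the exact-match lookup selects a single partition and the binary search bounds the range scan, which is the intuition behind the recommendation; a query that filters on a column outside the key would have to scan every partition.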