
Graduate student: 李政憲 (Li, Zheng-Xian)
Thesis title: A Server Side Load Balancing Framework for Apache Kafka (Apache Kafka 伺服端負載平衡框架)
Advisor: 蕭宏章 (Hsiao, Hung-Chang)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: Chinese
Pages: 54
Keywords: Apache Kafka, Optimization, Load Balancing, Distributed System
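Apache Kafka is a message streaming system widely adopted in industry. Server-side load imbalance is a common issue in day-to-day Kafka operations, and it strongly affects the message latency and throughput that applications experience. The open-source Apache Kafka distribution can currently balance server-side load only through manually issued commands. In a large, complex production Kafka cluster, however, a rebalance can involve thousands of objects, and manual command-line operation becomes infeasible.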
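
The manual workflow is typically carried out with the `kafka-reassign-partitions.sh` tool that ships with Apache Kafka, which reads a JSON file listing the desired replica brokers of every partition to be moved. As a rough sketch, the Python below renders such a file from an in-memory placement; the `to_reassignment_json` helper, the topic names, and the broker IDs are hypothetical and only illustrate the file layout.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical target placement: (topic, partition) -> replica broker IDs.
# Kafka treats the first broker in each list as the preferred leader.
Assignment = Dict[Tuple[str, int], List[int]]


def to_reassignment_json(assignment: Assignment) -> str:
    """Render a placement in the JSON layout read by kafka-reassign-partitions.sh."""
    partitions = [
        {"topic": topic, "partition": partition, "replicas": replicas}
        for (topic, partition), replicas in sorted(assignment.items())
    ]
    return json.dumps({"version": 1, "partitions": partitions}, indent=2)


if __name__ == "__main__":
    plan = {("orders", 0): [1, 2], ("orders", 1): [2, 3], ("clicks", 0): [3, 1]}
    print(to_reassignment_json(plan))
```

The resulting file would then be applied with a command along the lines of `kafka-reassign-partitions.sh --bootstrap-server <host:port> --reassignment-json-file plan.json --execute`. Deciding the contents of this file by hand for thousands of partitions is the burden described above.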

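This thesis proposes an automated load balancing framework and mechanism. Users can balance system load with the load balancing algorithm we provide, and if the default method does not meet their needs, they can define custom load balancing goals on top of our framework.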
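
To make the idea of user-defined goals concrete, here is a minimal sketch of what a pluggable goal interface could look like. The `Goal` base class, the two example goals, summation as the way to combine them, and population standard deviation as the imbalance score are all assumptions for illustration, not the framework's actual API. For simplicity, the network goal also assumes that every replica receives the full write rate of its partition.

```python
from abc import ABC, abstractmethod
from statistics import pstdev
from typing import Dict, List, Tuple

Partition = Tuple[str, int]             # (topic, partition number)
Placement = Dict[Partition, List[int]]  # partition -> broker IDs hosting its replicas


class Goal(ABC):
    """Hypothetical extension point: a goal scores a placement, lower is better."""

    @abstractmethod
    def cost(self, placement: Placement, brokers: List[int]) -> float:
        ...


class NetworkIngressGoal(Goal):
    """Spread (population std-dev) of estimated ingress bytes/s per broker.

    Assumes every replica receives the full write rate of its partition.
    """

    def __init__(self, ingress: Dict[Partition, float]):
        self.ingress = ingress  # measured bytes/s written to each partition

    def cost(self, placement, brokers):
        per_broker = dict.fromkeys(brokers, 0.0)
        for partition, replicas in placement.items():
            for broker in replicas:
                per_broker[broker] += self.ingress[partition]
        return pstdev(list(per_broker.values()))


class ReplicaCountGoal(Goal):
    """An example user-written goal: spread of replica counts per broker."""

    def cost(self, placement, brokers):
        counts = dict.fromkeys(brokers, 0)
        for replicas in placement.values():
            for broker in replicas:
                counts[broker] += 1
        return pstdev(list(counts.values()))


def total_cost(goals: List[Goal], placement: Placement, brokers: List[int]) -> float:
    """Combine goals by plain summation (an assumption; weights could be added)."""
    return sum(goal.cost(placement, brokers) for goal in goals)


if __name__ == "__main__":
    brokers = [1, 2]
    goals = [NetworkIngressGoal({("a", 0): 10.0, ("a", 1): 10.0}), ReplicaCountGoal()]
    print(total_cost(goals, {("a", 0): [1], ("a", 1): [1]}, brokers))
    print(total_cost(goals, {("a", 0): [1], ("a", 1): [2]}, brokers))
```

In this design, adding a goal amounts to writing one more `cost` method. With two brokers and two partitions each receiving 10.0 bytes/s, placing both partitions on broker 1 scores 11.0 (10.0 from ingress spread plus 1.0 from replica-count spread), while splitting them scores 0.0.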

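Among existing tools, LinkedIn's Cruise Control can also balance Kafka server-side load. However, its load balancing framework is relatively complex: each optimization goal is a complete procedure for moving load, so adding a custom goal is harder than with the approach proposed in this thesis. The proposed architecture improves network load balance by more than 94.3% with a simple and flexible approach, a result comparable to Cruise Control's. In addition, Cruise Control requires users to specify an explicit priority order among optimization goals, and our experiments show that this ordering hinders progress in difficult optimization scenarios. Our method balances load while accommodating three different optimization goals at once: it improves network load balance by 76.8%, compared with 53.0% for Cruise Control.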
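
The priority-order problem can be illustrated with a deliberately simplified model. The sketch below contrasts a strict priority comparison, in which goal costs are compared lexicographically with the highest-priority goal first, against a single weighted sum over all goals. It is neither Cruise Control's actual acceptance logic nor necessarily how this thesis combines its goals. The numbers and weights are made up, and `relative_improvement` assumes that improvement is reported as the fractional drop in an imbalance score.

```python
from typing import Sequence


def priority_order_accepts(new: Sequence[float], old: Sequence[float]) -> bool:
    """Simplified priority model: goal costs are compared in priority order
    (index 0 = highest priority), so any loss on a higher goal vetoes the move."""
    return tuple(new) < tuple(old)


def weighted_sum_accepts(new: Sequence[float], old: Sequence[float],
                         weights: Sequence[float]) -> bool:
    """Combined model: accept a move when the weighted total cost goes down."""
    def total(costs: Sequence[float]) -> float:
        return sum(w * c for w, c in zip(weights, costs))
    return total(new) < total(old)


def relative_improvement(before: float, after: float) -> float:
    """Fractional reduction of an imbalance score, e.g. 0.5 means 50% better."""
    return (before - after) / before


if __name__ == "__main__":
    old = (10.0, 50.0)  # (network imbalance, replica-count imbalance), made-up values
    new = (10.1, 20.0)  # slightly worse network, much better replica spread
    print(priority_order_accepts(new, old))            # False
    print(weighted_sum_accepts(new, old, (1.0, 1.0)))  # True
    print(relative_improvement(50.0, 20.0))            # 0.6
```

Under the priority model, a move that worsens the top goal by 0.1 is rejected even though it cuts the second goal's cost from 50.0 to 20.0 (a 60% improvement by `relative_improvement`), whereas the weighted sum accepts it. Repeated across many candidate moves, this is one way low-priority goals can stall.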

This thesis addresses the server-side load balancing problem of Apache Kafka. It formalizes the problem as a typical optimization problem with a series of optimization goals, all of which are expressed as cost functions. The thesis proposes a general algorithm based on hill climbing that searches for a cluster load distribution scoring better under the user-specified cost function. The experiments in Chapter 4 show that our approach achieves optimization results similar to those of existing work, while our design is relatively simpler. We also identify a design issue in related work: in our experiments, its approach made it difficult to improve low-priority optimization goals, whereas our approach does not exhibit this issue.
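
The hill-climbing search described above can be sketched as a steepest-descent loop over single-replica moves. This is a textbook variant written under stated assumptions: a placement maps each partition to a list of broker IDs, a neighbour is produced by moving one replica to a broker that does not already host that partition, and the cost function is arbitrary (here a hypothetical replica-count spread). The thesis's own algorithm may generate, sample, and accept moves differently.

```python
from collections import Counter
from statistics import pstdev
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Partition = Tuple[str, int]             # (topic, partition number)
Placement = Dict[Partition, List[int]]  # partition -> broker IDs of its replicas
CostFn = Callable[[Placement], float]   # lower is better


def neighbours(placement: Placement, brokers: List[int]) -> Iterator[Placement]:
    """Yield every placement reachable by moving one replica to a broker lacking it."""
    for partition, replicas in placement.items():
        for i in range(len(replicas)):
            for dst in brokers:
                if dst in replicas:
                    continue  # a broker may hold at most one replica of a partition
                moved = list(replicas)
                moved[i] = dst
                candidate = dict(placement)  # shallow copy; only one entry changes
                candidate[partition] = moved
                yield candidate


def hill_climb(placement: Placement, brokers: List[int], cost: CostFn,
               max_steps: int = 1000) -> Placement:
    """Repeatedly take the best improving move; stop at a local optimum."""
    current, current_cost = placement, cost(placement)
    for _ in range(max_steps):
        best: Optional[Placement] = None
        best_cost = current_cost
        for candidate in neighbours(current, brokers):
            c = cost(candidate)
            if c < best_cost:
                best, best_cost = candidate, c
        if best is None:  # no neighbour is strictly better
            break
        current, current_cost = best, best_cost
    return current


def replica_spread(brokers: List[int]) -> CostFn:
    """Example cost: population std-dev of replica counts across brokers."""
    def cost(placement: Placement) -> float:
        counts = Counter(b for replicas in placement.values() for b in replicas)
        return pstdev([counts[b] for b in brokers])
    return cost


if __name__ == "__main__":
    brokers = [1, 2, 3]
    start = {("events", p): [1] for p in range(6)}  # every partition on broker 1
    result = hill_climb(start, brokers, replica_spread(brokers))
    print(Counter(b for r in result.values() for b in r))
```

Starting from six single-replica partitions all on broker 1, the loop converges to two replicas per broker. Because every step must strictly lower the cost, the search stops at the first local optimum it reaches. That simplicity is what makes swapping in a new cost function cheap, at the price of no guarantee of global optimality.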

Abstract (Chinese)
Extended Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1. Apache Kafka System Model
1.2. Apache Kafka Static Load Balancing and its Limitation
1.3. Moving Loading in Apache Kafka
1.4. Apache Kafka Dynamic Load Balancing Problem Definition
1.5. Contributions
1.6. Outline
Chapter 2. Related Works
2.1. Cruise Control
2.2. On the Fly Load Balancing to Address Hot Topics in Topic-Based Pub/Sub Systems
2.3. Apache Helix
2.4. Hill Climbing
2.5. Genetic Algorithm
2.6. Number Partitioning Problem
2.7. Satisfiability Modulo Theories Solver
2.8. ShardManager: A Generic Shard Management Framework for Geo-distributed Applications
Chapter 3. The Proposed Architecture
3.1. Cluster Metrics
3.2. Optimization
3.3. Optimization Goals
3.3.1. Network Ingress Goal
3.3.2. Network Egress Goal
3.3.3. Replica Number Goal
3.3.4. Comparing Solutions with Naive Method
3.4. Optimization Algorithm
3.5. Rebalance Plan Executor
Chapter 4. Performance Evaluation
4.1. Experiment Environment
4.2. Balancing an Overloaded Cluster
4.3. Balancing Network IO
4.4. Balancing Network IO and Replica Count
4.5. Balancing Network IO (Software Simulation)
Chapter 5. Conclusions
References
Appendix A. Experiment Hardware
Appendix B. Balancing an Overloaded Cluster
Appendix C. Balancing Network IO
Appendix D. Balancing Network IO and Replica Count

