| Graduate student: | 吳昱緯 Wu, Yu-Wei |
|---|---|
| Thesis title: | 以LogP效能模型特徵化Apache Kafka (Characterizing the Performance of Apache Kafka with LogP) |
| Advisor: | 蕭宏章 Hsiao, Hung-Chang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 (ROC calendar, i.e., 2020-2021) |
| Language: | Chinese |
| Pages: | 26 |
| Keywords: | Apache Kafka, Message Queue, Data Streaming, LogP |
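In today's era of big data, techniques for collecting, storing, and processing data have become increasingly mature, allowing many decisions to be driven by large volumes of data. Numerous successful applications have already demonstrated the benefits of big data. As technology advances, applications that process large amounts of data are becoming ever more widespread, and real-time data streaming is one example. A data stream delivers the latest data, which enables real-time monitoring and analysis so that appropriate action can be taken immediately to maximize operational benefit and reduce operational risk. Compared with traditional data processing, extracting the most value from continuous, fast-arriving data is therefore an important research problem.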
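This thesis proposes using the LogP abstract performance model to characterize a message queue (MQ) application. Before building an MQ application, the developer designs it with reference to the characteristics captured by LogP's four parameters. This makes it possible to predict the performance characteristics of the designed algorithm before implementation begins, inferring application-level performance from a microscopic perspective. After the MQ application is completed, the LogP model can still be used to expose performance bottlenecks, so that the poorly performing parts can be evaluated for possible improvements and then improved.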
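For reference, the standard LogP cost of point-to-point messaging shows how the parameters combine into a design-time estimate. Here L is the upper bound on network latency, o is the processor overhead of sending or receiving one message, g is the minimum gap between consecutive sends (or receives) at one processor, and P is the number of processors. Treating one MQ event as one LogP message, and assuming that k events are split evenly over P independent sender/receiver pairs, are illustrative assumptions rather than the thesis's own derivation.

$$
\begin{aligned}
T_{1} &= o + L + o = 2o + L && \text{(one message: send overhead, latency, receive overhead)}\\
T_{k} &= 2o + L + (k-1)\max(g, o) && \text{(}k\text{ messages pipelined from one sender)}\\
T_{k,P} &= 2o + L + \left(\left\lceil k/P \right\rceil - 1\right)\max(g, o) && \text{(}k\text{ messages spread over }P\text{ pairs)}
\end{aligned}
$$

The second line holds because a sender can inject a new message only once every max(g, o) time units. The last of the k messages therefore starts at (k-1)max(g, o) and still needs 2o + L to arrive and be received. The third line shows where P enters. For large k, the per-message term max(g, o), not L, dominates, so spreading events across more parallel streams shortens the critical path.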
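In this thesis, we demonstrate how to use LogP with Apache Kafka and implement a WordCount application. Using LogP's L (latency), o (overhead), and g (gap), we characterize the transmission of a single Kafka event and measure the performance of our WordCount implementation. Guided by the bottlenecks that these three parameters reveal, we then use the parameter P to improve the poorly performing parts. The results of two experiments validate, respectively, the two ideas of this study: inferring application-level performance from a microscopic perspective, and improving performance by addressing the bottlenecks exposed by the LogP model.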
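The following is a minimal, broker-free Python sketch that makes the WordCount pipeline and the role of P concrete. It does not use Kafka or the confluent-kafka client. An in-memory list of lists stands in for a topic with P partitions, and a CRC-32 hash of the word stands in for the client's key-based partitioner. The helper names `partition_for`, `produce`, `consume_partition`, and `word_count` are hypothetical and are not the thesis's code. Tokenization uses `str.split`, and counting uses `collections.Counter`.

```python
import zlib
from collections import Counter
from typing import List


def partition_for(word: str, num_partitions: int) -> int:
    """Map a word (used as the message key) to a partition index.

    Hypothetical stand-in for key-based partitioning: CRC-32 is used so the
    mapping is deterministic across runs (Python's built-in hash() is not).
    """
    return zlib.crc32(word.encode("utf-8")) % num_partitions


def produce(lines: List[str], num_partitions: int) -> List[List[str]]:
    """Split each line on whitespace and append every word to its partition."""
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for line in lines:
        for word in line.split():
            partitions[partition_for(word, num_partitions)].append(word)
    return partitions


def consume_partition(partition: List[str]) -> Counter:
    """One consumer counts the words found in its own partition."""
    return Counter(partition)


def word_count(lines: List[str], num_partitions: int = 1) -> Counter:
    """Run one (simulated) consumer per partition and merge the partial counts."""
    partitions = produce(lines, num_partitions)
    total: Counter = Counter()
    for part in partitions:  # sequential here; each could be a separate consumer process
        total.update(consume_partition(part))  # update() adds counts
    return total


if __name__ == "__main__":
    text = ["to be or not to be", "that is the question"]
    print(word_count(text, num_partitions=1).most_common(2))
    print(word_count(text, num_partitions=4) == word_count(text, num_partitions=1))
```

Because every message is keyed by its word, all occurrences of a given word land in the same partition. The per-partition counters therefore have disjoint keys, and merging them is a plain sum. In a real Kafka deployment, each partition would be read by a separate consumer within one consumer group, and that degree of parallelism is where P enters the LogP model.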
Message queues (MQ) are fundamental building blocks for real-time message streaming applications in domains such as IoT, AI, and big data. Developing and operating such applications is challenging because present state-of-the-art MQ substrates are complex in terms of functionality, operation, and maintenance. In this study, we propose relying on LogP, introduced by D. Culler et al. in 1993, to model the algorithmic performance of MQ-based applications. We demonstrate how LogP can be used to design an application and analyze its performance through a simple WordCount example. To validate the LogP-based design, the WordCount application is implemented over Apache Kafka, a widely used streaming substrate. LogP helps identify potential improvements to application algorithms. Specifically, LogP provides a microscopic perspective for modelling application performance, thus shortening the development lifecycle of a streaming application. We additionally demonstrate an enhanced WordCount application based on the improvements that LogP suggests, and we validate the enhancement through its implementation in Kafka.
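The enhancement step can be illustrated with a small estimator. It applies the standard LogP cost of k pipelined messages, 2o + L + (k-1)max(g, o). It also assumes, as our own simplification, that the k events are spread evenly across P partitions, each drained by its own consumer, so the busiest stream carries ceil(k/P) events. `LogPParams`, `predicted_time`, and `speedup` are hypothetical names. The parameter values in the usage block are placeholders, not measurements from the thesis.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogPParams:
    L: float  # upper bound on network latency per message (seconds)
    o: float  # CPU overhead to send or receive one message (seconds)
    g: float  # minimum gap between consecutive messages at one endpoint (seconds)
    P: int    # number of parallel sender/receiver pairs (assumed: one per partition)


def predicted_time(params: LogPParams, k: int) -> float:
    """Estimated time to deliver k messages split evenly over P streams."""
    if k <= 0:
        return 0.0
    per_stream = math.ceil(k / params.P)  # messages on the busiest stream
    step = max(params.g, params.o)        # interval between successive injections
    return 2 * params.o + params.L + (per_stream - 1) * step


def speedup(params: LogPParams, k: int, new_P: int) -> float:
    """Ratio of the predicted time at the current P to the predicted time at new_P."""
    enhanced = replace(params, P=new_P)
    return predicted_time(params, k) / predicted_time(enhanced, k)


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured Kafka numbers.
    base = LogPParams(L=0.002, o=0.0001, g=0.0005, P=1)
    for p in (1, 2, 4, 8):
        print(p, round(speedup(base, k=10_000, new_P=p), 2))
```

Under these assumptions, the estimate shows that once k is large, adding partitions divides the dominant (k-1)max(g, o) term roughly by P, while the fixed 2o + L term stays unchanged. Measured L, o, and g values would replace the placeholders before the estimator is used to choose P.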
[1] Apache Hadoop. [Online]. Available: https://hadoop.apache.org/
[2] Apache HBase. [Online]. Available: https://hbase.apache.org/
[3] Apache Spark. [Online]. Available: https://spark.apache.org/
[4] Apache Flink. [Online]. Available: https://flink.apache.org/
[5] Apache Kafka. [Online]. Available: https://kafka.apache.org/
[6] D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken, "LogP: Towards a realistic model of parallel computation," in Proc. 4th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '93), 1993.
[7] What is message queuing? [Online]. Available: https://www.cloudamqp.com/blog/what-is-message-queuing.html
[8] Streaming Data: How it Works, Benefits, and Use Cases. [Online]. Available: https://www.confluent.io/learn/data-streaming/#how-it-works
[9] MQTT. [Online]. Available: https://mqtt.org/
[10] RabbitMQ. [Online]. Available: https://www.rabbitmq.com/
[11] Apache ZooKeeper. [Online]. Available: https://zookeeper.apache.org/
[12] Apache. [Online]. Available: https://www.apache.org/
[13] Kafka Configs. [Online]. Available: https://kafka.apache.org/documentation/
[14] Zero-Copy. [Online]. Available: https://en.wikipedia.org/wiki/Zero-copy
[15] Python. [Online]. Available: https://www.python.org/
[16] S. Fortune and J. Wyllie, "Parallelism in random access machines," in Proc. 10th Annu. ACM Symp. Theory of Computing, May 1978, pp. 114-118.
[17] L. G. Valiant, "A bridging model for parallel computation," Commun. ACM, vol. 33, no. 8, pp. 103-111, 1990.
[18] A. Alexandrov, M. F. Ionescu, K. E. Schauser, and C. Scheiman, "LogGP: Incorporating long messages into the LogP model for parallel computation," J. Parallel Distrib. Comput., vol. 44, no. 1, pp. 71-79, 1997.
[19] Confluent's Python Client for Apache Kafka. [Online]. Available: https://github.com/confluentinc/confluent-kafka-python
[20] Python String Split Method. [Online]. Available: https://www.geeksforgeeks.org/python-string-split/
[21] confluent-kafka-python. [Online]. Available: https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#producer
[22] Python Collections Module Counter Objects. [Online]. Available: https://docs.python.org/zh-tw/3.8/library/collections.html#collections.Counter
On-campus access: publicly available from 2026-08-23