National Cheng Kung University Electronic Theses and Dissertations System


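Graduate student:	Huang, Yi-Wei (黃奕崴)
Thesis title:	Reduction Scheme for Sensor-Data Transmission on a Big Data Streaming Platform (大數據串流平台上降低感測資料傳輸的方法)
Advisor:	Cheng, Sheng-Tzong (鄭憲宗)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication:	2017
Academic year of graduation:	105 (ROC academic year)
Language:	English
Number of pages:	52
Keywords (translated from Chinese):	Big Data, Dynamic Transmission, Data Compression Technique, In-memory Computing, Spark Streaming
Keywords (English):	Big Data, In-memory Computing, Spark Streaming, Resilient Distributed Datasets, Data compression technique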

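With advances in sensing technology, sensors deployed in all kinds of environments generate massive volumes of data, and making good use of that data has become a new business model. The challenge is to process more data in less time, and even to support real-time analytics. The earlier distributed computing framework MapReduce no longer meets real-time requirements in some cases, such as machine learning or multi-stage iterative computation, mainly because it lacks one key element: efficient resource sharing. In-memory computing (IMC) was proposed to address this class of problems.
As the name suggests, IMC keeps intermediate results in memory rather than repeatedly accessing the disk, which removes the disk I/O bottleneck. A well-known recent example is Apache Spark, an open-source cluster computing framework that can run tens of times faster than MapReduce as the data volume grows. It still cannot remove one bottleneck, however: bandwidth. Sensor data arrives from many nodes, and sensors are constrained in resources such as memory, energy, and bandwidth. We observe that sensor data contains similar sequences because of spatial or temporal dependence. Data compression is therefore a promising solution, since a smaller amount of data can represent a larger amount. This eases the resource constraints on sensors and improves Spark's data utilization.
This study proposes a method for reducing sensor-data transmission to optimize Spark Streaming, a parallel IMC stream-processing platform. Preprocessing increases the similarity of the data so that the compression technique can replace longer patterns. Compression is also combined with dynamic transmission to achieve both timeliness and a high compression ratio. As a result, sensors consume less energy and last longer, and more data can be transmitted over the same bandwidth, which raises the processing capacity of the computing platform.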

Recent advances in sensor technology have led to the availability of a multitude of sensors, e.g. for sound, luminosity, and humidity. Exploiting and processing the huge volume of raw data they produce efficiently is a difficult problem. Hadoop MapReduce has been used to address this issue, but it handles operations that require iteration inefficiently. Hence, the concept of in-memory computing (IMC) was introduced to resolve the Hadoop I/O bottleneck.
In in-memory computing, data is processed in parallel in random access memory (RAM) instead of on slow disk drives. With IMC, we can train on patterns and analyze large data sets frequently. However, IMC platforms do not provide an effective transmission-reduction scheme for real-time systems, which may limit applications such as wireless sensor networks. Transmitting the entire data set from each sensor node may be impractical because of limited resources such as CPU, memory, and power. Compressing data before sending it is an effective way to make good use of a sensor node's limited power supply and to extend the sensor's lifetime. According to our observation, most sensor data shows similar patterns because of temporal and spatial dependence. We can therefore exploit these characteristics to improve compression efficiency.
This study presents an effective transmission-reduction scheme on Spark Streaming, a distributed real-time IMC platform that is used to collect data in real time. We describe the design and implementation of the whole system, which provides a high compression ratio on small batches of data from the source. The scheme is expected to reduce data transmission with only a small delay in a soft real-time system.
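The idea in the abstract, preprocessing sensor readings so that similar patterns repeat and then compressing them, can be illustrated with a minimal sketch. The table of contents lists the Lempel-Ziv-Welch algorithm as background, so the sketch uses a textbook LZW coder. Everything else is an assumption made for illustration only: the delta-quantization preprocessing, the `step` parameter, and all function names are hypothetical and are not taken from the thesis, whose actual preprocessing, mapper, and encoder are not described in this record.

```python
from typing import List


def to_delta_text(readings: List[float], step: float = 0.5) -> str:
    """Hypothetical preprocessing: quantize readings to multiples of `step`
    and write the successive differences as comma-separated text."""
    if not readings:
        return ""
    levels = [round(r / step) for r in readings]
    deltas = [levels[0]] + [b - a for a, b in zip(levels, levels[1:])]
    return ",".join(str(d) for d in deltas)


def from_delta_text(text: str, step: float = 0.5) -> List[float]:
    """Invert to_delta_text: accumulate the differences and rescale."""
    values: List[float] = []
    total = 0
    for token in text.split(",") if text else []:
        total += int(token)
        values.append(total * step)
    return values


def lzw_encode(text: str) -> List[int]:
    """Textbook LZW: emit one code per longest phrase already in the table."""
    table = {chr(i): i for i in range(256)}
    current = ""
    codes: List[int] = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new phrase
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: List[int]) -> str:
    """Rebuild the text while reconstructing the same table as the encoder."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Phrase that the encoder defined and used in the same step.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)


if __name__ == "__main__":
    # Synthetic, made-up readings with a repeating pattern.
    readings = [25.0, 25.0, 25.5, 25.5] * 16
    text = to_delta_text(readings)
    codes = lzw_encode(text)
    print(len(text), "characters ->", len(codes), "codes")
```

On a slowly varying signal the differences repeat far more often than the raw values, so the LZW table learns long phrases quickly. In this sketch the quantization step makes the scheme lossy for readings that are not multiples of `step`; whether the thesis accepts such loss is not stated in this record.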

Abstract (in Chinese)
Abstract
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
Chapter 1. Introduction and Motivation
1. Introduction
2. Motivation
3. Thesis Overview
Chapter 2. Backgrounds
1. Spark
1.1. Spark Core
1.2. Spark Streaming
2. Message Queue Telemetry Transport Protocol
3. Lempel-Ziv-Welch Algorithm
3.1. Encode
3.2. Decode
Chapter 3. System Design
1. Problem Description
2. System Design
2.1. System Architecture
2.2. Preprocess
2.3. Mapper
2.4. Encoder
2.5. Communication
2.6. Transformation
2.7. Decoder & Re-Constructor
Chapter 4. Implementation and Experiment
1. Experiment Environment and Settings
2. Implementation
3. Experiment Result
3.1. Scheme Performance
3.2. Dictionary Code Length
3.3. Dictionary Rebuild
3.4. Output Time Delay Formula
Chapter 5. Conclusion and Future Work
References


                                    


Publicly available from 2020-08-01
