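| Graduate student: | 林楙勛 Lin, Mao-Syun |
|---|---|
| Thesis title: | 應用SDN分散Hadoop Shuffle流量 Distributing Hadoop shuffle traffic among weighted multipath by SDN |
| Advisor: | 謝錫堃 Shieh, Ce-Kuen |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電腦與通信工程研究所 College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 |
| Language: | English |
| Number of pages: | 35 |
| Chinese keywords: | 軟體定義網路 (software-defined networking), 雲端運算系統 (cloud computing system), 洗牌階段 (shuffle stage) |
| English keywords: | Software Defined Network, Apache Hadoop, MapReduce, Shuffle |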
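In the computing architecture of the Hadoop cloud computing system, MapReduce is the main computation procedure. During Map and Reduce processing, however, nodes must exchange large amounts of data with one another through the shuffle stage, which leads to insufficient network bandwidth. We therefore propose a method for distributing Hadoop shuffle traffic that allocates the traffic proportionally across all available paths. Using the centrally controlled network architecture of software-defined networking (SDN), we collect parameters describing the network state and use them as the basis for allocating traffic.
In our method for distributing Hadoop shuffle traffic, we use the Mininet simulator to build the network topology and realize our experimental environment. For the SDN controller, we use the Ryu controller in our experimental environment because it is easy to program, which lets us design more quickly and makes debugging easier. With the switches and the controller working together, operational messages are displayed during the experiments, so we can easily find out which problems exist in the overall SDN network architecture or whether the code contains errors.
In the experimental results, our algorithm for distributing Hadoop traffic outperforms both the Spanning Tree Protocol and the bandwidth-aware algorithm in Hadoop completion time.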
In the Hadoop computing architecture, MapReduce is the main processing model. Between the Map and Reduce phases, servers must exchange large amounts of data with each other. This exchange happens in the shuffle stage and can exhaust the available network bandwidth. We propose an algorithm that distributes Hadoop shuffle traffic proportionally across all available paths. Using the centralized control architecture of a Software Defined Network (SDN), we collect network status parameters and use them as the basis for allocating traffic.
To evaluate our approach, we use the Mininet simulator to build the network topology of our experimental environment. For the SDN controller we choose Ryu, because it is easy to program, which speeds up our design work, and because it is convenient to debug. While the switches and the controller work together during an experiment, the controller displays operational messages, so we can readily tell whether there are problems in the overall SDN network or bugs in the code.
In the experimental results, our algorithm for distributing Hadoop traffic achieves shorter Hadoop completion times than both the Spanning Tree Protocol and a bandwidth-aware algorithm.
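The abstract does not give the allocation rule itself, so the following is only a minimal sketch of what a weighted split of shuffle flows across several paths could look like. It assumes that the collected network status parameter is a per-path capacity figure such as available bandwidth, and that traffic is split at flow granularity. The function name `assign_flows`, the path names, and the capacity values are all hypothetical and do not come from the thesis.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def assign_flows(flows: Iterable[Hashable],
                 path_capacity: Dict[str, float]) -> Dict[Hashable, str]:
    """Map each flow to a path so per-path flow counts track capacity shares.

    ASSUMPTION: path_capacity holds one non-negative number per path
    (for example available bandwidth in Mbit/s) that a controller has
    already measured. How the thesis measures or weights paths is not
    stated in the abstract.
    """
    if not path_capacity:
        raise ValueError("at least one path is required")

    capacity = dict(path_capacity)
    total = sum(capacity.values())
    if total <= 0:
        # No usable measurements: fall back to an even split.
        capacity = {path: 1.0 for path in capacity}
        total = float(len(capacity))

    assigned: Counter = Counter()
    placement: Dict[Hashable, str] = {}
    for k, flow in enumerate(flows, start=1):
        # After k flows a path "deserves" k * capacity / total of them.
        # Pick the path that is furthest below that target; both sides
        # are multiplied by total to avoid a division.
        best = max(capacity,
                   key=lambda p: capacity[p] * k - assigned[p] * total)
        assigned[best] += 1
        placement[flow] = best
    return placement


if __name__ == "__main__":
    # Hypothetical capacities for three paths between two racks.
    paths = {"path_a": 500.0, "path_b": 300.0, "path_c": 200.0}
    shuffle_flows = [f"shuffle-{i}" for i in range(10)]
    result = assign_flows(shuffle_flows, paths)
    print(Counter(result.values()))
```

With these example capacities, ten flows end up split 5, 3, and 2 across the three paths, matching the 50/30/20 capacity ratio. In a real SDN deployment the controller would also have to install matching forwarding rules on the switches, which this sketch leaves out.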
On campus: publicly available from 2019-01-01