
Student: Jiang, Jia-De (江家德)
Thesis Title: LS2IC: Large-scale Isomorphic Influence Communication for Heterogeneous Multi-agent SDN Routing
Advisor: Sue, Chuan-Ching (蘇銓清)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Graduation Academic Year: 113
Language: Chinese
Number of Pages: 63
Keywords: MARL, Communication, Large-Scale Scenarios, Heterogeneous Agent, SDN Routing
  • In multi-agent reinforcement learning, effective communication is essential for promoting cooperation and improving team performance. However, existing communication methods typically rely on pairwise interactions between agents, which incurs high computational complexity and scales poorly in large-scale heterogeneous systems. Moreover, the inherent heterogeneity among agents can cause messages to be misinterpreted and can harm team coordination.
    To overcome these challenges, we propose Large-Scale Isomorphic Influence Communication (LS2IC), a novel framework in which each agent learns to perceive the influence of its own actions on the environment and embeds that influence into its messages, converting them into a common format that all agents can interpret consistently. To further simplify communication and reduce computational overhead, we apply a mean-field method directly to these common-format messages, efficiently approximating traditional broadcast-based communication (see the illustrative sketch following this abstract). Finally, we introduce an additional reward mechanism that encourages agents to generate effective messages and accelerates training.
    Experimental results in a Mininet-based software-defined networking (SDN) routing environment show that LS2IC clearly outperforms the baseline methods, exhibiting superior adaptability and scalability under dynamic network conditions. We further verify the effectiveness of LS2IC through ablation experiments and visualization results.
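    The communication pipeline summarized above — encoding each heterogeneous agent's observation into a common-format message and replacing pairwise exchange with a single mean-field aggregate — can be illustrated with a minimal sketch. This is a hypothetical reconstruction for illustration only: the class and function names (CommonFormatEncoder, mean_field_message), layer sizes, and message dimension are assumptions and do not reflect the thesis's actual model design.

```python
import torch
import torch.nn as nn


class CommonFormatEncoder(nn.Module):
    """Map an agent's (possibly heterogeneous) observation to a fixed-size,
    common-format message so that every agent interprets messages the same way.
    Purely illustrative; sizes are assumed."""
    def __init__(self, obs_dim: int, msg_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 32), nn.ReLU(),
            nn.Linear(32, msg_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def mean_field_message(messages: torch.Tensor) -> torch.Tensor:
    """Average all agents' common-format messages into one mean-field message,
    approximating broadcast communication with O(N) instead of O(N^2) exchanges."""
    return messages.mean(dim=0)


if __name__ == "__main__":
    # Heterogeneous agents: different observation sizes, one encoder each.
    obs_dims = [8, 8, 12]
    encoders = [CommonFormatEncoder(d) for d in obs_dims]
    observations = [torch.randn(d) for d in obs_dims]

    # Each agent encodes its local observation into the shared message format.
    messages = torch.stack([enc(o) for enc, o in zip(encoders, observations)])

    # A single aggregate replaces per-pair message passing.
    m_bar = mean_field_message(messages)
    print(m_bar.shape)  # torch.Size([16])
```

    In the full framework, each agent's policy would additionally condition on this aggregate, and an influence-based reward would encourage messages that measurably affect other agents' behavior; those components are omitted here for brevity.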

    Effective communication is essential for promoting cooperation and improving team performance in multi-agent reinforcement learning (MARL). However, existing communication methods typically rely on pairwise interactions between agents, leading to high computational complexity and poor scalability in large-scale heterogeneous systems. Moreover, the inherent heterogeneity among agents can cause communication misunderstandings and negatively impact team coordination.
    To overcome these challenges, we propose Large-Scale Isomorphic Influence Communication (LS2IC), a novel framework that transforms messages from heterogeneous agents into a common format, thereby enabling all agents to interpret messages consistently. To further simplify communication and reduce computational overhead, we apply a mean-field approach directly to these unified messages, efficiently approximating traditional broadcast-based communication. Experimental results conducted in a Mininet-based Software-Defined Networking (SDN) routing environment demonstrate that LS2IC significantly outperforms the baseline methods, showing superior adaptability and scalability in dynamic network conditions.
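    As a rough illustration of what a Mininet-based SDN routing testbed involves, the snippet below emulates a tiny three-switch topology with bandwidth- and delay-limited links attached to an external (remote) SDN controller. This is a generic sketch only: the topology, link parameters, and controller address are assumptions and are unrelated to the GÉANT and 32-node topologies evaluated in the thesis; running it requires root privileges and a reachable OpenFlow controller.

```python
#!/usr/bin/env python3
"""Minimal Mininet sketch of an SDN routing testbed (illustrative only)."""
from mininet.net import Mininet
from mininet.node import RemoteController
from mininet.link import TCLink
from mininet.topo import Topo


class TinyTopo(Topo):
    """Three switches in a triangle, one host per switch (assumed example)."""
    def build(self):
        switches = [self.addSwitch(f"s{i}") for i in range(1, 4)]
        for i, sw in enumerate(switches, start=1):
            host = self.addHost(f"h{i}")
            self.addLink(host, sw, bw=10, delay="2ms")
        # Triangle of inter-switch links with heterogeneous capacities/delays.
        self.addLink(switches[0], switches[1], bw=10, delay="5ms")
        self.addLink(switches[1], switches[2], bw=5, delay="10ms")
        self.addLink(switches[0], switches[2], bw=10, delay="5ms")


if __name__ == "__main__":
    net = Mininet(topo=TinyTopo(), link=TCLink, controller=None)
    # The routing logic itself lives in the external SDN controller.
    net.addController("c0", controller=RemoteController,
                      ip="127.0.0.1", port=6653)
    net.start()
    net.pingAll()  # quick connectivity check across the emulated network
    net.stop()
```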

    Abstract I
    Summary III
    Acknowledgements VIII
    List of Tables XI
    List of Figures XII
    1 Introduction 1
    2 Background and Related Work 3
      2.1 Software-Defined Networking 3
      2.2 Multi-agent Reinforcement Learning with Communication 3
      2.3 Heterogeneous Multi-agent Reinforcement Learning 4
    3 System Architecture 6
      3.1 Overview 6
      3.2 Components 8
        3.2.1 Data Plane 8
        3.2.2 Control Plane 8
        3.2.3 Knowledge Plane 10
      3.3 Control Delay 10
    4 Our Approach 12
      4.1 Encoder 13
      4.2 Influence Model 13
      4.3 Mean-field Message 14
      4.4 Influence Reward 15
      4.5 Model Design 16
      4.6 Training 17
      4.7 Algorithm 19
    5 Experiment 21
      5.1 Simulation Environment 21
      5.2 Network Topologies 21
      5.3 Traffic Matrix Dataset 23
      5.4 Environment Setup 27
      5.5 Parameter Setting 28
      5.6 Baseline 30
      5.7 Result 32
        5.7.1 Result for GÉANT Topology 32
        5.7.2 Result for 32-node Topology 34
        5.7.3 Visual Analysis of Routing Strategies 39
        5.7.4 Ablation Study 42
    6 Conclusion 44
    7 References 46

    Full-text access: on campus from 2030-07-04; off campus from 2030-07-04.
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.