
Student: Jiang, Jia-De (江家德)
Thesis Title: LS2IC: Large-scale Isomorphic Influence Communication for Heterogeneous Multi-agent SDN Routing
Advisor: Sue, Chuan-Ching (蘇銓清)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Graduation Academic Year: 113
Language: Chinese
Number of Pages: 63
Keywords: MARL, Communication, Large-Scale Scenarios, Heterogeneous Agent, SDN Routing
  • In multi-agent reinforcement learning, effective communication is essential for promoting cooperation and improving team performance. However, existing communication methods typically rely on pairwise interactions between agents, which incurs high computational complexity and scales poorly in large-scale heterogeneous systems. Moreover, the inherent heterogeneity among agents can cause messages to be misinterpreted and can harm team coordination.
    To overcome these challenges, we propose Large-Scale Isomorphic Influence Communication (LS2IC), a novel framework in which each agent learns to perceive the influence of its own actions on the environment and embeds that influence into its messages, converting them into a common format that all agents can interpret consistently. To further simplify communication and reduce computational overhead, we apply a mean-field method directly to these common-format messages, efficiently approximating traditional broadcast-based communication (see the illustrative sketch following this abstract). Finally, we introduce an additional reward mechanism that encourages agents to generate effective messages and accelerates training.
    Experimental results in a Mininet-based software-defined networking (SDN) routing environment show that LS2IC clearly outperforms the baseline methods, exhibiting superior adaptability and scalability under dynamic network conditions. We further verify the effectiveness of LS2IC through ablation experiments and visualization results.
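    The communication pipeline summarized above — encoding each heterogeneous agent's observation into a common-format message and replacing pairwise exchange with a single mean-field aggregate — can be illustrated with a minimal sketch. This is a hypothetical reconstruction for illustration only: the class and function names (CommonFormatEncoder, mean_field_message), layer sizes, and message dimension are assumptions and do not reflect the thesis's actual model design.

```python
import torch
import torch.nn as nn


class CommonFormatEncoder(nn.Module):
    """Map an agent's (possibly heterogeneous) observation to a fixed-size,
    common-format message so that every agent interprets messages the same way.
    Purely illustrative; sizes are assumed."""
    def __init__(self, obs_dim: int, msg_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 32), nn.ReLU(),
            nn.Linear(32, msg_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def mean_field_message(messages: torch.Tensor) -> torch.Tensor:
    """Average all agents' common-format messages into one mean-field message,
    approximating broadcast communication with O(N) instead of O(N^2) exchanges."""
    return messages.mean(dim=0)


if __name__ == "__main__":
    # Heterogeneous agents: different observation sizes, one encoder each.
    obs_dims = [8, 8, 12]
    encoders = [CommonFormatEncoder(d) for d in obs_dims]
    observations = [torch.randn(d) for d in obs_dims]

    # Each agent encodes its local observation into the shared message format.
    messages = torch.stack([enc(o) for enc, o in zip(encoders, observations)])

    # A single aggregate replaces per-pair message passing.
    m_bar = mean_field_message(messages)
    print(m_bar.shape)  # torch.Size([16])
```

    In the full framework, each agent's policy would additionally condition on this aggregate, and an influence-based reward would encourage messages that measurably affect other agents' behavior; those components are omitted here for brevity.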

    Effective communication is essential for promoting cooperation and improving team performance in multi-agent reinforcement learning (MARL). However, existing communication methods typically rely on pairwise interactions between agents, leading to high computational complexity and poor scalability in large-scale heterogeneous systems. Moreover, the inherent heterogeneity among agents can cause communication misunderstandings and negatively impact team coordination.
    To overcome these challenges, we propose Large-Scale Isomorphic Influence Communication (LS2IC), a novel framework that transforms messages from heterogeneous agents into a common format, thereby enabling all agents to interpret messages consistently. To further simplify communication and reduce computational overhead, we apply a mean-field approach directly to these unified messages, efficiently approximating traditional broadcast-based communication. Experimental results conducted in a Mininet-based Software-Defined Networking (SDN) routing environment demonstrate that LS2IC significantly outperforms the baseline methods, showing superior adaptability and scalability in dynamic network conditions.
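    As a rough illustration of what a Mininet-based SDN routing testbed involves, the snippet below emulates a tiny three-switch topology with bandwidth- and delay-limited links attached to an external (remote) SDN controller. This is a generic sketch only: the topology, link parameters, and controller address are assumptions and are unrelated to the GÉANT and 32-node topologies evaluated in the thesis; running it requires root privileges and a reachable OpenFlow controller.

```python
#!/usr/bin/env python3
"""Minimal Mininet sketch of an SDN routing testbed (illustrative only)."""
from mininet.net import Mininet
from mininet.node import RemoteController
from mininet.link import TCLink
from mininet.topo import Topo


class TinyTopo(Topo):
    """Three switches in a triangle, one host per switch (assumed example)."""
    def build(self):
        switches = [self.addSwitch(f"s{i}") for i in range(1, 4)]
        for i, sw in enumerate(switches, start=1):
            host = self.addHost(f"h{i}")
            self.addLink(host, sw, bw=10, delay="2ms")
        # Triangle of inter-switch links with heterogeneous capacities/delays.
        self.addLink(switches[0], switches[1], bw=10, delay="5ms")
        self.addLink(switches[1], switches[2], bw=5, delay="10ms")
        self.addLink(switches[0], switches[2], bw=10, delay="5ms")


if __name__ == "__main__":
    net = Mininet(topo=TinyTopo(), link=TCLink, controller=None)
    # The routing logic itself lives in the external SDN controller.
    net.addController("c0", controller=RemoteController,
                      ip="127.0.0.1", port=6653)
    net.start()
    net.pingAll()  # quick connectivity check across the emulated network
    net.stop()
```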

    Abstract I
    Summary III
    Acknowledgements VIII
    List of Tables XI
    List of Figures XII
    1 Introduction 1
    2 Background and Related Work 3
      2.1 Software-Defined Networking 3
      2.2 Multi-agent Reinforcement Learning with Communication 3
      2.3 Heterogeneous Multi-agent Reinforcement Learning 4
    3 System Architecture 6
      3.1 Overview 6
      3.2 Components 8
        3.2.1 Data Plane 8
        3.2.2 Control Plane 8
        3.2.3 Knowledge Plane 10
      3.3 Control Delay 10
    4 Our Approach 12
      4.1 Encoder 13
      4.2 Influence Model 13
      4.3 Mean-field Message 14
      4.4 Influence Reward 15
      4.5 Model Design 16
      4.6 Training 17
      4.7 Algorithm 19
    5 Experiment 21
      5.1 Simulation Environment 21
      5.2 Network Topologies 21
      5.3 Traffic Matrix Dataset 23
      5.4 Environment Setup 27
      5.5 Parameter Setting 28
      5.6 Baseline 30
      5.7 Result 32
        5.7.1 Result for GÉANT Topology 32
        5.7.2 Result for 32-node Topology 34
        5.7.3 Visual Analysis of Routing Strategies 39
        5.7.4 Ablation Study 42
    6 Conclusion 44
    7 References 46

    Full-text access: on campus from 2030-07-04; off campus from 2030-07-04.
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.