
Author: 黃心渝 (Huang, Sin-Yu)
Thesis title: Average AoI Minimization in Energy Harvesting Relay Networks using Deep Reinforcement Learning-Based Relay Selection Algorithm
Advisor: 張志文 (Chang, Chih-Wen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2022
Graduation academic year: 110 (ROC calendar, i.e. 2021-2022)
Language: English
Pages: 42
Keywords: Age of information, buffer-aided relaying, cooperative communication, relay selection, wireless energy harvesting
  • This thesis studies a two-hop cooperative status update system with multiple relays, each equipped with two finite buffers: a data buffer and an energy buffer. Whenever a selected relay forwards a data packet from the source toward the destination, it consumes one unit of energy from its energy buffer; when the channel condition is poor, it temporarily stores the packet in its data buffer. Because the timeliness of information is critical in status update systems, we study an optimal relay selection scheme that minimizes the average age of information (AoI) of the system under the constraints of the two finite buffers. The problem is formulated as a multi-state Markov decision process (MDP), but because it involves many variables, solving it exactly is highly complex and possibly infeasible. We therefore propose a deep Q-network (DQN)-based relay selection scheme that uses deep reinforcement learning (DRL) to find a near-optimal solution.

    To further stabilize performance, we adopt a double deep Q-network with prioritized experience replay (DDQN-PER). Simulation results show that the DDQN-PER scheme outperforms the other competing schemes when the distance between the source and the relay is relatively long, especially at low signal-to-noise ratio (SNR). The effects of the number of relays, the packet arrival rate, and the sizes of both buffers on the average AoI are also investigated, providing practical insights for the design of status update systems.
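    The two techniques named in the abstract, double DQN and prioritized experience replay, can be illustrated with a minimal sketch. It follows the standard published formulations, not the thesis's own implementation, which this record does not show. The function names, the example Q-values, the default values of `alpha`, `beta`, and `eps`, and the idea of using the negative AoI as the reward are all assumptions made for illustration.

```python
from typing import List, Sequence


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      done: bool = False) -> float:
    """Double DQN target: the online network picks the next action,
    the target network supplies the value of that action."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)),
                      key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def per_probabilities(td_errors: Sequence[float], alpha: float = 0.6,
                      eps: float = 1e-6) -> List[float]:
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha,
    with priority p_i = |TD error| + eps (eps keeps every p_i positive)."""
    scaled = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs: Sequence[float], beta: float = 0.4) -> List[float]:
    """Importance-sampling weights (N * P(i))^(-beta), divided by the
    largest weight so that all weights lie in (0, 1]."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    top = max(raw)
    return [w / top for w in raw]


if __name__ == "__main__":
    # Hypothetical numbers: three candidate relays (actions), and a reward
    # assumed to be the negative AoI observed after the chosen action.
    y = double_dqn_target(reward=-3.0, gamma=0.9,
                          q_online_next=[1.0, 2.0, 0.5],
                          q_target_next=[0.8, 1.5, 3.0])
    print("target:", y)

    probs = per_probabilities([0.1, 1.0, 2.0])
    print("sampling probabilities:", probs)
    print("importance weights:", importance_weights(probs))
```

    In this sketch the online network prefers action 1, so the target uses the target network's value for action 1 (1.5) rather than its own maximum (3.0); this decoupling is what reduces the overestimation of plain DQN. Transitions with larger TD errors are sampled more often, and the importance weights scale their updates down to compensate.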

    Chinese Abstract
    Abstract
    Acknowledgement
    List of Figures
    List of Tables
    List of Algorithms
    List of Symbols
    List of Acronyms
    1 Introduction
    2 Related Work
      2.1 Age of Information
      2.2 Energy Harvesting
      2.3 Cooperative Communication
      2.4 Deep Reinforcement Learning
    3 System Model
      3.1 Channel and Energy Models
      3.2 Relay Selection Protocol
      3.3 AoI model
      3.4 Problem Formulation
      3.5 Heuristic Lower bound of the problem
    4 DDQN-PER Solution
      4.1 DQN-Based Relay Selection Algorithm
        4.1.1 State Space
        4.1.2 Action Space
        4.1.3 Reward Function
        4.1.4 Framework of the Proposed DQN
      4.2 Double DQN with Prioritized Experience Replay
        4.2.1 Double Deep Q Network
        4.2.2 Prioritized Experience Replay
    5 Results and Simulation
      5.1 Benchmark
      5.2 Simulation Setting
      5.3 Performance comparison
      5.4 Impact of Arrival Rate
      5.5 Impact of Relay Number
      5.6 Impact of Data Size
      5.7 Impact of Energy Size
    6 Conclusions
      6.1 Summary of Thesis
    References


    Full text on campus: public from 2024-09-01
    Full text off campus: public from 2024-09-01