
Author: Chang, Chen-Yi (張宸翊)
Thesis Title: A Deep Reinforcement Learning Approach for Cache Replacement in OM2M-based Heterogeneous Access Controller Management System (在基於OM2M的異質門禁系統中應用深度強化學習的快取替換方法)
Advisor: Sue, Chuan-Ching (蘇銓清)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 43
Keywords: Deep Reinforcement Learning, Cache Replacement Policy, Content Caching, OM2M, Door Access Control System
Access Count: Views: 138, Downloads: 0

Abstract:
    With the rapid development of the Internet of Things, ever more data flows between devices and servers, and the load on servers and the network grows accordingly. Fog computing emerged to reduce cloud server load and relieve network bandwidth bottlenecks. The hierarchical structure of fog computing has given rise to research on content caching, including DNS caching, Web search caching, and more. However, existing studies on content caching have not incorporated Belady's algorithm (the theoretically optimal content caching algorithm) while also considering variable content sizes. This thesis studies cache replacement in an OM2M-based heterogeneous access control management system with variable content sizes. We propose a deep reinforcement learning (DRL) model for cache replacement that reduces the cache miss rate. Our DRL model overcomes the difficulty of "looking into the future" by integrating online and offline learning and by using two replay memory buffers to realize a delay-reward mechanism. The evaluation results show that, in terms of raw hit rate, our method outperforms several traditional heuristic replacement policies by 5.7% to 23.9% in an environment with uniform content sizes and by 9.4% to 22.3% with variable content sizes. Furthermore, we evaluate reward functions from other DRL-based caching studies to justify the advantage of our proposed reward function. Finally, we examine the influence of various features on the DQN model through an ablation experiment.
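    For reference, a minimal Python sketch of Belady's algorithm [9], the offline-optimal baseline the abstract refers to: it evicts the cached item whose next request lies farthest in the future, so it requires the full future request sequence and serves only as an upper bound. The function name and list-based lookup below are illustrative, not the thesis's implementation, and this classic form ignores content size, which is part of the gap the thesis addresses.

```python
def belady_evict(cache, future_requests):
    """Offline-optimal eviction: pick the cached item reused farthest ahead."""
    def next_use(item):
        try:
            return future_requests.index(item)   # steps until next request
        except ValueError:
            return float("inf")                  # never requested again
    return max(cache, key=next_use)

# With future requests [B, A, C], D is never reused, so D is the victim.
print(belady_evict(["A", "B", "D"], ["B", "A", "C"]))   # prints 'D'
```

    The delay-reward mechanism with two replay memories can be pictured as follows. This is a hedged sketch of the idea as stated in the abstract, not the thesis's code: a pending buffer parks (state, action) pairs whose reward cannot be computed at decision time, and each transition moves to the ordinary replay buffer, from which the DQN trains offline, once the outcome of the eviction is observed. All names (ReplayBuffer, PendingBuffer, defer, resolve) and the reward value are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Ordinary experience-replay buffer; the DQN trains on sampled batches."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

class PendingBuffer:
    """Parks (state, action) pairs whose reward is not yet observable.

    A replacement decision can only be judged later, e.g. when the evicted
    content is (or is not) requested again, so the transition waits here
    and is trained on only after its reward arrives.
    """
    def __init__(self):
        self.pending = {}   # content id -> (state, action)

    def defer(self, content_id, state, action):
        self.pending[content_id] = (state, action)

    def resolve(self, content_id, reward, next_state, replay):
        """Complete the transition once the outcome is known, then hand it
        to the replay buffer for offline DQN updates."""
        if content_id in self.pending:
            state, action = self.pending.pop(content_id)
            replay.push((state, action, reward, next_state))

if __name__ == "__main__":
    replay, pending = ReplayBuffer(), PendingBuffer()
    # Online phase: on a cache miss the agent evicts content "card_42";
    # the reward for that choice is unknown, so the transition is deferred.
    pending.defer("card_42", state=[0.1, 0.5], action=3)
    # Later, "card_42" is requested again (a miss caused by that eviction),
    # so the earlier decision is penalized and becomes trainable data.
    pending.resolve("card_42", reward=-1.0, next_state=[0.2, 0.5], replay=replay)
    print(replay.sample(1))
```

    In this reading, acting on the live request stream while deferring rewards is the online part, and minibatch updates from the replay buffer are the offline part; the thesis's exact state features, reward values, and buffer coupling may differ.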

    Contents
        Chinese Abstract (中文摘要)
        Abstract
        Acknowledgements
        Contents
        List of Tables
        List of Figures
        I. Introduction
        II. Related Work
            A. IoT Middleware
            B. OM2M Platform
            C. Content Caching
            D. Cache Replacement
            E. Reinforcement Learning
            F. Motivation
        III. System Model
            A. DRL Preliminary
            B. Algorithm
        IV. Evaluation
            A. Environment Setting
            B. DQN Parameters
            C. Experimental Setup
            D. Results
            E. Ablations
        V. Conclusion and Future Work
        VI. References

    List of Tables
        Table 1. Products on the market
        Table 2. Summary of Literature Study
        Table 3. Notations
        Table 4. User Profile 1
        Table 5. User Profile 2
        Table 6. Hyperparameters
        Table 7. Raw hit rate and raw hit rate improvement (same size)
        Table 8. Raw hit rate and raw hit rate improvement (different sizes)
        Table 9. Raw hit rate and raw hit rate improvement ([8]'s dataset)
        Table 10. Ablation settings for 10 feature combinations

    List of Figures
        Fig. 1. The referenced OM2M-based architecture
        Fig. 2. Belady's algorithm vs. method-1 algorithm
        Fig. 3. The relationship among user ID, time zone group, and time zone
        Fig. 4. Cache and cache content
        Fig. 5. Cache operation flow chart
        Fig. 6. Time epoch and time step
        Fig. 7. Delay-reward mechanism
        Fig. 8. DRL flowchart
        Fig. 9. Normalized hit rate for the same content size (average)
        Fig. 10. Normalized hit rate for different content sizes (average)
        Fig. 11. Improvement rate for the same content size (average)
        Fig. 12. Improvement rate for different content sizes (average)
        Fig. 13. Normalized hit rate for [8]'s dataset
        Fig. 14. Improvement rate for [8]'s dataset
        Fig. 15. The hit rate of different reward functions on [8]'s dataset
        Fig. 16. The hit rate and reward of different reward functions on [8]'s dataset
        Fig. 17. Ablation study for the same content size
        Fig. 18. Ablation study for different content sizes

    [1] S. A. Weis, “RFID (Radio Frequency Identification): Principles and Applications.” MIT CSAIL, 2007. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.5224&rep=rep1&type=pdf
    [2] SOYAL, “Communication Protocol AR-727HV3”, December 2018. https://www.soyal.com/download/protocol/721E Protocol_EN.pdf
    [3] MaCaPS, “MaCaPS Smart Access Event Server Specification”, November 2004. https://manualzz.com/doc/o/90109/smart-access-control-system-software-user-manual-administering-time-zone
    [4] ZKTeco, “User Manual ZKAccess3.5 Security System”, Jan. 2018. https://idency.com/wp-content/uploads/2018/01/ZKAccess3.5-Security-System-user-manual-V3.1.1.pdf
    [5] SOCA, “SF-1000 Network Fingerprint Proximity Access Control System”, Aug. 2016. http://www.socatech.com/en/product-543237/Fingerprint-Proximity-Access-Control-System-SF-1000.html
    [6] Continental Access, “CardAccess®3000 Architectural & Engineering Specification Version 2.7”, 2010. https://www.napcosecurity.com/media/pdfs/AE-ContinentalAccessV2.7-111309.pdf
    [7] SIELOX, “Access Control Software version 5.0”, November 2006. https://www.bassunited.com/downloads/manuals/access-control/SIELOX-PINNACLE-5.0-Manual.pdf
    [8] Chuan-Ching Sue and Hong-Wei Liu, “Cache-enabled Access Control System Based on OM2M Framework”, 2019 IEEE TRON Symposium (TRONSHOW), pp. 1-7, 2019.
    [9] L. A. Belady, “A study of replacement algorithms for a virtual-storage computer”, IBM Systems Journal, pp. 78-101, 1966.
    [10] M. B. Alaya, Y. Banouar, T. Monteil, C. Chassot, and K. Drira, “OM2M: Extensible ETSI-compliant M2M Service Platform with Self-configuration Capability”, Procedia Computer Science, vol. 32, pp. 1079-1086, 2014.
    [11] oneM2M, TS-0010-V2.4.1, “MQTT protocol Binding”, August 2016. https://www.etsi.org/deliver/etsi_ts/118100_118199/118110/02.04.01_60/ts_118110v020401p.pdf
    [12] Xiaofei Wang, Min Chen, Tarik Taleb, Adlen Ksentini, Victor C.M. Leung, “Cache in the air: exploiting content caching and delivery techniques for 5G systems”, IEEE Communications Magazine, vol. 52, no. 2, pp. 131-139, Feb. 2014.
    [13] Jeongho Kwak, Yeongjin Kim, Long Bao Le, Song Chong, “Hybrid Content Caching in 5G Wireless Networks: Cloud Versus Edge Caching”, IEEE Transactions on Wireless Communications, vol. 17, no. 5, pp. 3030-3045, Feb. 2018.
    [14] Serdar Vural, Pirabakaran Navaratnam, Ning Wang, Chonggang Wang, Lijun Dong, and Rahim Tafazolli, “In-network Caching of Internet-of-Things Data”, 2014 IEEE International Conference on Communications (ICC), pp. 3185-3190, June 2014.
    [15] Hao Zhu, Yang Cao, Xiao Wei, Wei Wang, Tao Jiang and Shi Jin, “Caching Transient Data for Internet of Things: A Deep Reinforcement Learning Approach”, IEEE Internet of Things Journal, vol. 6, no. 2, pp. 2074-2083, Nov. 2018.
    [16] Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh, and Sachin Katti, “Cliffhanger: Scaling Performance Cliffs in Web Memory Caches”, 13th USENIX Symposium on Networked Systems Design and Implementation, pp. 379-392, March 2016.
    [17] Stephen Williams, Marc Abrams, Ed Fox, Ghaleb Abdulla, and Charles R. Standridge, “Removal policies in network caches for World-Wide Web documents”, ACM SIGCOMM Computer Communication Review, pp. 293-305, August 1996.
    [18] Duane Wessels, “Intelligent caching for World-Wide Web objects”, M.S. thesis, University of Colorado at Boulder, 1995.
    [19] H. ElAarag and S. Romano, “Comparison of function based web proxy cache replacement strategies”, 2009 International Symposium on Performance Evaluation of Computer & Telecommunication Systems, pp. 252-259, 2009.
    [20] Shuai Hao and Haining Wang, “Exploring Domain Name Based Features on the Effectiveness of DNS Caching”, ACM SIGCOMM Computer Communication Review, vol. 47, pp. 36-42, 2017.
    [21] Chen Zhong, M Cenk Gursoy, and Senem Velipasalar, “A Deep Reinforcement Learning-Based Framework for Content Caching”, 52nd Annual Conference on Information Sciences and Systems (CISS), pp. 1-6, 2018.
    [22] Peihao Wang, Yuehao Wang, and Rui Wang, “Deep Reinforcement Learning-Based Cache Replacement Policy”, https://github.com/peihaowang/DRLCache, pp. 1-12, 2020.
    [23] Akanksha Jain and Calvin Lin, “Back to the Future: Leveraging Belady's Algorithm for Improved Cache Replacement”, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 78-89, 2016.
    [24] Zhan Shi, Xiangru Huang, Akanksha Jain, and Calvin Lin, “Applying Deep Learning to the Cache Replacement Problem”, Proceedings of The 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 413-425, 2019.
    [25] Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, and Junwhan Ahn, “An Imitation Learning Approach for Cache Replacement”, International Conference on Machine Learning (ICML), pp. 6237-6247, July 2020.
    [26] Christopher J.C.H. Watkins and Peter Dayan, “Q-learning”, Machine learning, vol. 8, pp. 279-292, May 1992.
    [27] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller, “Playing Atari with deep reinforcement learning”, NIPS Deep Learning Workshop, pp. 1-9, Dec. 2013.
    [28] V. Mnih et al., “Human-level control through deep reinforcement learning”, Nature, vol. 518, pp. 529-533, 2015.
    [29] Chuan-Ching Sue and Chen-Yi Chang, “Using Q-learning to Enhance Cache-enabled OM2M-based Access Control System”, TANET, pp. 305-309, Oct. 2020.
    [30] Arryon D. Tijsma, Madalina M. Drugan, Marco A. Wiering, “Comparing exploration strategies for Q-learning in random stochastic mazes”, IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-8, Dec. 2016.
    [31] George Kingsley Zipf, “Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology”, Addison-Wesley Press, 1949.
    [32] X. Xiong, K. Zheng, L. Lei and L. Hou, “Resource Allocation Based on Deep Reinforcement Learning in IoT Edge Computing”, IEEE Journal on Selected Areas in Communications, vol. 38, no. 6, pp. 1133-1146, June 2020.

    Full-text availability: On campus: public from 2026-08-17. Off campus: public from 2026-08-17.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.