
Author: Chang, Chen-Yi (張宸翊)
Thesis Title: A Deep Reinforcement Learning Approach for Cache Replacement in OM2M-based Heterogeneous Access Controller Management System (在基於OM2M的異質門禁系統中應用深度強化學習的快取替換方法)
Advisor: Sue, Chuan-Ching (蘇銓清)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 43
Keywords: Deep Reinforcement Learning, Cache Replacement Policy, Content Caching, OM2M, Door Access Control System
Access Count: Views: 138, Downloads: 0

Abstract:
    With the rapid development of the Internet of Things, ever more data flows between devices and servers, and the load on servers and the network grows accordingly. Fog computing emerged to reduce cloud server load and relieve network bandwidth bottlenecks. The hierarchical structure of fog computing has given rise to research on content caching, including DNS caching, Web search caching, and more. However, existing studies on content caching have not incorporated Belady's algorithm (the theoretically optimal content caching algorithm) while also considering variable content sizes. This thesis studies cache replacement in an OM2M-based heterogeneous access control management system with variable content sizes. We propose a deep reinforcement learning (DRL) model for cache replacement that reduces the cache miss rate. Our DRL model overcomes the difficulty of "looking into the future" by integrating online and offline learning and by using two replay memory buffers to realize a delay-reward mechanism. The evaluation results show that, in terms of raw hit rate, our method outperforms several traditional heuristic replacement policies by 5.7% to 23.9% in an environment with uniform content sizes and by 9.4% to 22.3% with variable content sizes. Furthermore, we evaluate reward functions from other DRL-based caching studies to justify the advantage of our proposed reward function. Finally, we examine the influence of various features on the DQN model through an ablation experiment.
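    For reference, a minimal Python sketch of Belady's algorithm [9], the offline-optimal baseline the abstract refers to: it evicts the cached item whose next request lies farthest in the future, so it requires the full future request sequence and serves only as an upper bound. The function name and list-based lookup below are illustrative, not the thesis's implementation, and this classic form ignores content size, which is part of the gap the thesis addresses.

```python
def belady_evict(cache, future_requests):
    """Offline-optimal eviction: pick the cached item reused farthest ahead."""
    def next_use(item):
        try:
            return future_requests.index(item)   # steps until next request
        except ValueError:
            return float("inf")                  # never requested again
    return max(cache, key=next_use)

# With future requests [B, A, C], D is never reused, so D is the victim.
print(belady_evict(["A", "B", "D"], ["B", "A", "C"]))   # prints 'D'
```

    The delay-reward mechanism with two replay memories can be pictured as follows. This is a hedged sketch of the idea as stated in the abstract, not the thesis's code: a pending buffer parks (state, action) pairs whose reward cannot be computed at decision time, and each transition moves to the ordinary replay buffer, from which the DQN trains offline, once the outcome of the eviction is observed. All names (ReplayBuffer, PendingBuffer, defer, resolve) and the reward value are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Ordinary experience-replay buffer; the DQN trains on sampled batches."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

class PendingBuffer:
    """Parks (state, action) pairs whose reward is not yet observable.

    A replacement decision can only be judged later, e.g. when the evicted
    content is (or is not) requested again, so the transition waits here
    and is trained on only after its reward arrives.
    """
    def __init__(self):
        self.pending = {}   # content id -> (state, action)

    def defer(self, content_id, state, action):
        self.pending[content_id] = (state, action)

    def resolve(self, content_id, reward, next_state, replay):
        """Complete the transition once the outcome is known, then hand it
        to the replay buffer for offline DQN updates."""
        if content_id in self.pending:
            state, action = self.pending.pop(content_id)
            replay.push((state, action, reward, next_state))

if __name__ == "__main__":
    replay, pending = ReplayBuffer(), PendingBuffer()
    # Online phase: on a cache miss the agent evicts content "card_42";
    # the reward for that choice is unknown, so the transition is deferred.
    pending.defer("card_42", state=[0.1, 0.5], action=3)
    # Later, "card_42" is requested again (a miss caused by that eviction),
    # so the earlier decision is penalized and becomes trainable data.
    pending.resolve("card_42", reward=-1.0, next_state=[0.2, 0.5], replay=replay)
    print(replay.sample(1))
```

    In this reading, acting on the live request stream while deferring rewards is the online part, and minibatch updates from the replay buffer are the offline part; the thesis's exact state features, reward values, and buffer coupling may differ.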

    Contents
        Chinese Abstract (中文摘要)
        Abstract
        Acknowledgements
        Contents
        List of Tables
        List of Figures
        I. Introduction
        II. Related Work
            A. IoT Middleware
            B. OM2M Platform
            C. Content Caching
            D. Cache Replacement
            E. Reinforcement Learning
            F. Motivation
        III. System Model
            A. DRL Preliminary
            B. Algorithm
        IV. Evaluation
            A. Environment Setting
            B. DQN Parameters
            C. Experimental Setup
            D. Results
            E. Ablations
        V. Conclusion and Future Work
        VI. References

    List of Tables
        Table 1. Products on the market
        Table 2. Summary of Literature Study
        Table 3. Notations
        Table 4. User Profile 1
        Table 5. User Profile 2
        Table 6. Hyperparameters
        Table 7. Raw hit rate and raw hit rate improvement (same size)
        Table 8. Raw hit rate and raw hit rate improvement (different sizes)
        Table 9. Raw hit rate and raw hit rate improvement ([8]'s dataset)
        Table 10. Ablation settings for 10 feature combinations

    List of Figures
        Fig. 1. The referenced OM2M-based architecture
        Fig. 2. Belady's algorithm vs. method-1 algorithm
        Fig. 3. The relationship among user ID, time zone group, and time zone
        Fig. 4. Cache and cache content
        Fig. 5. Cache operation flow chart
        Fig. 6. Time epoch and time step
        Fig. 7. Delay-reward mechanism
        Fig. 8. DRL flowchart
        Fig. 9. Normalized hit rate for the same content size (average)
        Fig. 10. Normalized hit rate for different content sizes (average)
        Fig. 11. Improvement rate for the same content size (average)
        Fig. 12. Improvement rate for different content sizes (average)
        Fig. 13. Normalized hit rate for [8]'s dataset
        Fig. 14. Improvement rate for [8]'s dataset
        Fig. 15. The hit rate of different reward functions on [8]'s dataset
        Fig. 16. The hit rate and reward of different reward functions on [8]'s dataset
        Fig. 17. Ablation study for the same content size
        Fig. 18. Ablation study for different content sizes

    [1] S. A. Weis, “RFID (Radio Frequency Identification): Principles and Applications.” MIT CSAIL, 2007. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.5224&rep=rep1&type=pdf
    [2] SOYAL, “Communication Protocol AR-727HV3”, December 2018. https://www.soyal.com/download/protocol/721E Protocol_EN.pdf
    [3] MaCaPS, “MaCaPS Smart Access Event Server Specification”, November 2004. https://manualzz.com/doc/o/90109/smart-access-control-system-software-user-manual-administering-time-zone
    [4] ZKTeco, “User Manual ZKAccess3.5 Security System”, Jan. 2018. https://idency.com/wp-content/uploads/2018/01/ZKAccess3.5-Security-System-user-manual-V3.1.1.pdf
    [5] SOCA, “SF-1000 Network Fingerprint Proximity Access Control System”, Aug. 2016. http://www.socatech.com/en/product-543237/Fingerprint-Proximity-Access-Control-System-SF-1000.html
    [6] Continental Access, “CardAccess®3000 Architectural & Engineering Specification Version 2.7”, 2010. https://www.napcosecurity.com/media/pdfs/AE-ContinentalAccessV2.7-111309.pdf
    [7] SIELOX, “Access Control Software version 5.0”, November 2006. https://www.bassunited.com/downloads/manuals/access-control/SIELOX-PINNACLE-5.0-Manual.pdf
    [8] Chuan-Ching Sue and Hong-Wei Liu, “Cache-enabled Access Control System Based on OM2M Framework”, 2019 IEEE TRON Symposium (TRONSHOW), pp. 1-7, 2019.
    [9] L. A. Belady, “A study of replacement algorithms for a virtual-storage computer”, IBM Systems Journal, pp. 78-101, 1966.
    [10] M. B. Alaya, Y. Banouar, T. Monteil, C. Chassot, and K. Drira, “OM2M: Extensible ETSI-compliant M2M Service Platform with Self-configuration Capability”, Procedia Computer Science, vol. 32, pp. 1079-1086, 2014.
    [11] oneM2M, TS-0010-V2.4.1, “MQTT protocol Binding”, August 2016. https://www.etsi.org/deliver/etsi_ts/118100_118199/118110/02.04.01_60/ts_118110v020401p.pdf
    [12] Xiaofei Wang, Min Chen, Tarik Taleb, Adlen Ksentini, Victor C.M. Leung, “Cache in the air: exploiting content caching and delivery techniques for 5G systems”, IEEE Communications Magazine, vol. 52, no. 2, pp. 131-139, Feb. 2014.
    [13] Jeongho Kwak, Yeongjin Kim, Long Bao Le, Song Chong, “Hybrid Content Caching in 5G Wireless Networks: Cloud Versus Edge Caching”, IEEE Transactions on Wireless Communications, vol. 17, no. 5, pp. 3030-3045, Feb. 2018.
    [14] Serdar Vural, Pirabakaran Navaratnam, Ning Wang, Chonggang Wang, Lijun Dong, and Rahim Tafazolli, “In-network Caching of Internet-of-Things Data”, 2014 IEEE International Conference on Communications (ICC), pp. 3185-3190, June 2014.
    [15] Hao Zhu, Yang Cao, Xiao Wei, Wei Wang, Tao Jiang and Shi Jin, “Caching Transient Data for Internet of Things: A Deep Reinforcement Learning Approach”, IEEE Internet of Things Journal, vol. 6, no. 2, pp. 2074-2083, Nov. 2018.
    [16] Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh, and Sachin Katti, “Cliffhanger: Scaling Performance Cliffs in Web Memory Caches”, 13th USENIX Symposium on Networked Systems Design and Implementation, pp. 379-392, March 2016.
    [17] Stephen Williams, Marc Abrams, Ed Fox, Ghaleb Abdulla, and Charles R. Standridge, “Removal policies in network caches for World-Wide Web documents”, ACM SIGCOMM Computer Communication Review, pp. 293-305, August 1996.
    [18] Duane Wessels, “Intelligent caching for World-Wide Web objects”, M.S. thesis, University of Colorado at Boulder, 1995.
    [19] H. ElAarag and S. Romano, “Comparison of function based web proxy cache replacement strategies”, 2009 International Symposium on Performance Evaluation of Computer & Telecommunication Systems, pp. 252-259, 2009.
    [20] Shuai Hao and Haining Wang, “Exploring Domain Name Based Features on the Effectiveness of DNS Caching”, ACM SIGCOMM Computer Communication Review, vol. 47, pp. 36-42, 2017.
    [21] Chen Zhong, M Cenk Gursoy, and Senem Velipasalar, “A Deep Reinforcement Learning-Based Framework for Content Caching”, 52nd Annual Conference on Information Sciences and Systems (CISS), pp. 1-6, 2018.
    [22] Peihao Wang, Yuehao Wang, and Rui Wang, “Deep Reinforcement Learning-Based Cache Replacement Policy”, https://github.com/peihaowang/DRLCache, pp. 1-12, 2020.
    [23] Akanksha Jain and Calvin Lin, “Back to the Future: Leveraging Belady's Algorithm for Improved Cache Replacement”, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 78-89, 2016.
    [24] Zhan Shi, Xiangru Huang, Akanksha Jain, and Calvin Lin, “Applying Deep Learning to the Cache Replacement Problem”, Proceedings of The 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 413-425, 2019.
    [25] Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, and Junwhan Ahn, “An Imitation Learning Approach for Cache Replacement”, International Conference on Machine Learning (ICML), pp. 6237-6247, July 2020.
    [26] Christopher J.C.H. Watkins and Peter Dayan, “Q-learning”, Machine learning, vol. 8, pp. 279-292, May 1992.
    [27] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller, “Playing Atari with deep reinforcement learning”, NIPS Deep Learning Workshop, pp. 1-9, Dec. 2013.
    [28] V. Mnih et al., “Human-level control through deep reinforcement learning”, Nature, vol. 518, pp. 529-533, 2015.
    [29] Chuan-Ching Sue and Chen-Yi Chang, “Using Q-learning to Enhance Cache-enabled OM2M-based Access Control System”, TANET, pp. 305-309, Oct. 2020.
    [30] Arryon D. Tijsma, Madalina M. Drugan, Marco A. Wiering, “Comparing exploration strategies for Q-learning in random stochastic mazes”, IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-8, Dec. 2016.
    [31] George Kingsley Zipf, “Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology”, Addison-Wesley Press, 1949.
    [32] X. Xiong, K. Zheng, L. Lei and L. Hou, “Resource Allocation Based on Deep Reinforcement Learning in IoT Edge Computing”, IEEE Journal on Selected Areas in Communications, vol. 38, no. 6, pp. 1133-1146, June 2020.

    Full-text availability: On campus: public from 2026-08-17. Off campus: public from 2026-08-17.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.