| Graduate Student: | 莊杰潾 (Chuang, Chieh-Lin) |
|---|---|
| Thesis Title: | 基於邊緣計算之強化學習與監督式學習協同訓練系統 (Collaborative Training System of Reinforcement Learning and Supervised Learning Based on Edge Computing) |
| Advisor: | 賴槿峰 (Lai, Chin-Feng) |
| Degree: | Master |
| Department: | College of Engineering, Department of Engineering Science |
| Year of Publication: | 2020 |
| Academic Year: | 108 |
| Language: | Chinese |
| Pages: | 51 |
| Keywords (Chinese): | 邊緣計算, 強化學習, 協同訓練 |
| Keywords (English): | Edge computing, Reinforcement learning, Collaborative training |
In typical supervised learning, most variables are held fixed during training so that the model can more quickly find patterns in the dataset and converge. If the environment changes at deployment time, however, the model may fail. This research therefore applies a reinforcement learning model at environment nodes close to the user, exploiting its dynamically updated parameters to avoid suspending the application while retraining.

Reinforcement learning models nevertheless require feedback to update their parameters. This research uses collaborative training, in which a supervised learning model provides the feedback that reinforcement learning needs: the two models exchange data over the network, so that when the reinforcement learning model faces a task with no feedback signal, the supervised learning model supplies the reward computation used to update its parameters. In real deployments, edge nodes usually lack strong computing power. The proposed method lets the reinforcement learning model gradually converge toward the parameters of the supervised learning model given a sufficiently long running time, addressing the limited performance of terminal devices in edge computing.

In the experiments, the models were run on the MNIST and Mall Dataset datasets. This research analyzes the operating differences across hardware of varying performance, as well as the influence of the network environment and the neural network architecture on the results, showing that the collaborative training architecture can meet users' real-time requirements and revealing the latency pressure that applications of different complexity place on the model.
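The core mechanism described above — a supervised model standing in as the reward source for a reinforcement learning agent that lacks environment feedback — can be illustrated with a minimal sketch. This is not the thesis's implementation; the teacher function, agent class, and reward rule below are illustrative assumptions (here the reward is simple agreement with the supervised model's label, driving the agent's policy toward the teacher's decisions):

```python
import random

def teacher_predict(x):
    # Stand-in for the supervised "teacher" model: labels input by sign.
    return 1 if x >= 0 else 0

class EdgeAgent:
    """Minimal RL student: epsilon-greedy action selection over a Q-table."""
    def __init__(self, lr=0.5, eps=0.1):
        self.q = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # state -> action values
        self.lr, self.eps = lr, eps

    def act(self, state):
        if random.random() < self.eps:
            return random.randrange(2)  # explore
        return max((0, 1), key=lambda a: self.q[state][a])  # exploit

    def update(self, state, action, reward):
        # Incremental value update toward the observed reward.
        self.q[state][action] += self.lr * (reward - self.q[state][action])

random.seed(0)
agent = EdgeAgent()
for _ in range(500):
    x = random.uniform(-1.0, 1.0)
    state = 1 if x >= 0 else 0
    action = agent.act(state)
    # The reward is agreement with the supervised model's label,
    # replacing the environment feedback the edge node lacks.
    reward = 1.0 if action == teacher_predict(x) else 0.0
    agent.update(state, action, reward)
```

After enough iterations the agent's greedy policy reproduces the teacher's labels, which mirrors the thesis's claim that the reinforcement learning model converges toward the supervised model's behavior given sufficient running time.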
On-campus access: available from 2025-08-04.