| Field | Value |
|---|---|
| Graduate Student | 張淙垣 Chang, Tsung-Yuan |
| Thesis Title | 基於標籤基因組和深度強化學習在大量離散動作之動態電影推薦演算法 (Dynamic Movie Recommendation Algorithm Based on Tag Genome and Deep Reinforcement Learning in Large Discrete Action Spaces) |
| Advisor | 賴槿峰 Lai, Chin-Feng |
| Degree | 碩士 Master |
| Department | 工學院 - 工程科學系 Department of Engineering Science |
| Year of Publication | 2020 |
| Graduation Academic Year | 108 |
| Language | Chinese |
| Number of Pages | 54 |
| Chinese Keywords | 推薦演算法、強化學習、大量離散動作 |
| English Keywords | Recommendation algorithm, Reinforcement learning, Large number of discrete actions |
As the volume of data generated by users worldwide keeps growing, extracting useful features from massive datasets and turning them into value has become increasingly important. Recommendation algorithms have therefore flourished, with new techniques delivering better performance and better recommendation results.
In this thesis, we propose a new type of recommendation model based on deep reinforcement learning and define its architecture and related parameters. Personalized recommendations are made for each user according to the movies that user clicks on and the ratings they give, and the model is pre-trained with an online simulation environment. The method is further improved for the problems that arise in systems with a large number of movies, with the goal of raising performance and making good recommendations.
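To make the setup concrete, the following is a minimal sketch, under assumed state and action dimensions, of the kind of value-based agent described here: the user's recent clicks and ratings are summarized into a state vector, every movie is one discrete action, and an epsilon-greedy policy picks the next recommendation from the Q-network's outputs. This is an illustration in PyTorch, not the thesis's actual implementation.

```python
# Minimal sketch of a DQN-style movie recommendation agent (illustrative only).
# STATE_DIM and NUM_MOVIES are assumed values, not taken from the thesis.
import random
import torch
import torch.nn as nn

STATE_DIM = 128      # assumed size of the user-history feature vector
NUM_MOVIES = 10_000  # assumed size of the discrete action space (one action per movie)

class QNetwork(nn.Module):
    """Maps a user-state vector to one Q-value per candidate movie."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_movie(q_net: QNetwork, state: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy action selection over the movie catalogue."""
    if random.random() < epsilon:
        return random.randrange(NUM_MOVIES)       # explore: random movie
    with torch.no_grad():
        return int(q_net(state).argmax().item())  # exploit: highest Q-value

q_net = QNetwork(STATE_DIM, NUM_MOVIES)
state = torch.randn(STATE_DIM)  # stand-in for a real user-history encoding
print("recommended movie index:", select_movie(q_net, state, epsilon=0.1))
```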
In the experiments, this study uses the open-source MovieLens dataset for movie recommendation and evaluates the model with five metrics: reward score, click-through rate, normalized discounted cumulative gain (NDCG), Top-k accuracy, and average steps per episode. The experimental design covers evaluating the Deep Q Network and Dueling Deep Q Network architectures, adjusting different experimental parameters and action-selection methods, running experiments under a large discrete action space and under the cold-start problem, comparing the results with various collaborative filtering methods, and finally testing recommendations for users outside the training set.
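For reference, two of the five metrics listed above can be computed as in the following sketch, which uses the standard definitions of NDCG@k and Top-k hit accuracy over binary relevance labels; the thesis may use graded relevance derived from ratings or a slightly different variant.

```python
# Standard-definition sketch of NDCG@k and Top-k accuracy over a ranked list;
# relevance[i] = 1 if the i-th recommended movie was actually clicked/liked.
import math

def ndcg_at_k(relevance: list[int], k: int) -> float:
    """Normalized discounted cumulative gain for the top-k ranked items."""
    rel = relevance[:k]
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rel))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def top_k_accuracy(relevance: list[int], k: int) -> float:
    """1.0 if at least one relevant item appears in the top k, else 0.0
    (in practice this is averaged over all evaluated users)."""
    return 1.0 if any(relevance[:k]) else 0.0

ranked_relevance = [0, 1, 0, 1, 0]
print(ndcg_at_k(ranked_relevance, k=5))       # ~0.65
print(top_k_accuracy(ranked_relevance, k=3))  # 1.0
```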
The experimental results show that, under a large number of actions, our method converges earlier than the conventional DQN algorithm and is more than 20 times faster. Compared with user-based and item-based KNN as well as Bayesian Personalized Ranking, it performs better on multiple metrics and can update the model dynamically. The method gives good recommendations even for users the system has never seen, and shows no overfitting on either the validation or the test set, demonstrating the generality of the model.
In this paper, we propose a recommendation algorithm based on deep reinforcement learning and define its architecture and related parameters. Personalized recommendations are made for each user according to the movies they click on and the ratings they give. The model combines the characteristics of different recommendation approaches, such as the advantages of collaborative filtering and content-based recommendation, and is improved for the problems of large discrete action spaces in reinforcement learning, with the aim of improving system performance and making good recommendations.
The experimental results show that, compared with the DQN algorithm under a large number of actions, our method converges at an early epoch and is more than 20 times faster. Comparison with other commonly used recommendation algorithms shows that it performs better on multiple indicators and can dynamically update the model. The method gives good recommendations even for users the system has never seen at all, and there is no overfitting on either the validation or the test dataset, demonstrating the generality of this model.
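The improvement for large discrete action spaces is not spelled out in the abstract; one common way to keep value-based methods tractable over thousands of movies, and a natural fit for tag-genome item features, is to score only a small candidate set retrieved by nearest-neighbour search over item vectors instead of the whole catalogue. The sketch below illustrates that general idea under assumed shapes and a toy scoring function; it is not claimed to be the exact mechanism used in this work.

```python
# Hedged sketch: restrict action selection to the k nearest movies in a
# tag-feature space, then score only those candidates. Shapes, the random
# feature matrix, and the scoring function are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
NUM_MOVIES, TAG_DIM = 10_000, 64
movie_tags = rng.random((NUM_MOVIES, TAG_DIM))  # stand-in for tag-genome vectors

def candidate_movies(query: np.ndarray, k: int = 50) -> np.ndarray:
    """Indices of the k movies whose tag vectors are closest to `query`."""
    dists = np.linalg.norm(movie_tags - query, axis=1)
    return np.argsort(dists)[:k]

def recommend(user_profile: np.ndarray, score_fn, k: int = 50) -> int:
    """Score only the k nearest candidates instead of the full catalogue."""
    candidates = candidate_movies(user_profile, k)
    scores = [score_fn(movie_tags[m]) for m in candidates]
    return int(candidates[int(np.argmax(scores))])

# Toy scorer standing in for a learned Q-value; a real agent would use its network.
user_profile = movie_tags[:20].mean(axis=0)  # e.g. mean tags of recently liked movies
best = recommend(user_profile, score_fn=lambda tags: float(tags @ user_profile))
print("recommended movie index:", best)
```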