| 研究生: | 方綵暄 Fang, Tsai-Hsuan | 
|---|---|
| 論文名稱: | 基於 P2P 網路與矩陣分解法構建去中心化之推薦系統 Building a Decentralized Recommender System Based on P2P Network and Matrix Factorization | 
| 指導教授: | 劉任修 Liu, Ren-Shiou | 
| 學位類別: | 碩士 Master | 
| 系所名稱: | 管理學院 - 資訊管理研究所 Institute of Information Management | 
| 論文出版年: | 2023 | 
| 畢業學年度: | 111 | 
| 語文別: | 中文 | 
| 論文頁數: | 72 | 
| 中文關鍵詞: | 推薦系統 、去中心化矩陣分解法 、Federated Learning 、Gossip Learning 、P2P 網路 | 
| 外文關鍵詞: | Recommender System, Decentralized Matrix Factorization, Federated Learning, Gossip Learning, P2P Network | 
| 相關次數: | 點閱:155 下載:16 | 
| 分享至: | 
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 | 
現今的資訊科技發展快速,所以人們更依賴網路搜尋而來的資訊,為了有效率地讓使用者獲得所需的資訊,推薦系統 (Recommender System) 擔任了一個重要的角色。常使用的方法之一是協同過濾 (Collaborative Filtering),以使用者的興趣為基礎,計算使用者之間的相似程度或商品的相似程度來進行推薦,但當資料量過多的時候,可能造成資料稀疏的問題。資料的規模愈大,稀疏的問題就愈嚴重,而矩陣分解法 (Matrix Factorization) 能減輕這個問題。
因為傳統的矩陣分解法是集中式的架構,所以存在以下幾個問題:(1) 計算和儲存成本高 (2) 隱私問題 (3) 模型訓練效率低 (4) 彈性弱。為了緩解這些問題,有學者提出了去中心化的矩陣分解法。因此,本研究的目的為透過去中心化的矩陣分解法來進行推薦,結合 Federated Learning 和 Gossip Learning 兩種方法,提出了一個名為 FedGossip Learning 的模型,一方面提升收斂速度,另一方面保護使用者隱私。此外,為了降低訊息傳輸量,選擇基於對等式 (peer-to-peer, P2P) 網路,將使用者喜好的資料分群,在集群內以 Federated Learning 的方式來進行,而集群間以集群中心為代表,透過 Gossip Learning 的方法來執行。
實驗結果顯示,FedGossip Learning 模型與 Gossip Learning 模型採用一樣參數組合的情況下,FedGossip Learning 模型其均方根誤差 (Root Mean Square Error, RMSE) 較小於 Gossip Learning 模型。就收斂速度與訊息傳輸量而言,FedGossip Learning 模型均有顯著的改善。由此可見,本研究在不影響推薦品質的情況下,能同時提升收斂速度、保護使用者隱私與降低訊息傳輸量。
The development of information technology is rapidly advancing nowadays, and this leads people to rely more on information obtained through internet searches. To efficiently provide users with the information they need, recommender systems play a crucial role. One widely used method is collaborative filtering, which recommends items based on the similarity of users' interests or item attributes. However, dealing with large datasets can lead to the problem of data sparsity. As the data scale increases, this sparsity issue becomes more obvious. Matrix factorization offers a solution to alleviate this problem.
Traditional centralized matrix factorization approaches have several limitations, including high computation and storage costs, privacy concerns, low training efficiency, and inflexibility. To address these challenges, researchers have proposed decentralized matrix factorization methods. Hence, the objective of this research is to employ decentralized matrix factorization techniques for enhancing recommender systems. This is achieved by synergistically merging two methodologies: Federated Learning and Gossip Learning. Introducing a novel model named FedGossip Learning, our aim is to expedite convergence speed while ensuring user privacy protection. Furthermore, in order to minimize message transmission quantity, we adopt a peer-to-peer (P2P) network and organize user preference data into clusters. Within each cluster, Federated Learning is employed, while communication between clusters is facilitated through cluster centers, which serve as representatives and implement the Gossip Learning approach.
The experimental results demonstrate that the FedGossip Learning model outperforms the Gossip Learning model in terms of convergence speed and message transmission quantity while using the same set of parameters, and the Root Mean Square Error (RMSE) of the FedGossip Learning model is slightly lower than that of the Gossip Learning model. These results indicate that in this study, without compromising the quality of recommendations, it is possible to simultaneously enhance convergence speed, protect user privacy, and reduce message transmission quantity.
Canny, J. (2002). Collaborative filtering with privacy via factor analysis. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 238–245.
Chen, C., Liu, Z., Zhao, P., Zhou, J., and Li, X. (2018). Privacy preserving point-of-interest recommendation using decentralized matrix factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.
Chu, C.-T., Kim, S., Lin, Y.-A., Yu, Y., Bradski, G., Olukotun, K., and Ng, A. (2006). Map-reduce for machine learning on multicore. Advances in Neural Information Processing Systems, 19.
Datta, S., Giannella, C., and Kargupta, H. (2008). Approximate distributed k-means clustering over a peer-to-peer network. IEEE Transactions on Knowledge and Data Engineering, 21(10):1372–1388.
De Carvalho, F. d. A. and Lechevallier, Y. (2009). Partitional clustering algorithms for symbolic interval data based on single adaptive distances. Pattern Recognition, 42(7):1223–1236.
Duriakova, E., Huáng, W., Tragos, E., Lawlor, A., Smyth, B., Geraci, J., and Hurley, N. (2020). An algorithmic framework for decentralised matrix factorisation. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 307–323.
Duriakova, E., Tragos, E. Z., Smyth, B., Hurley, N., Peña, F. J., Symeonidis, P., Geraci, J., and Lawlor, A. (2019). PDMFRec: A decentralised matrix factorisation with tunable user-centric privacy. In Proceedings of the 13th ACM Conference on Recommender Systems, pages 457–461.
Funk, S. (2006). Netflix Update: Try This at Home. https://sifter.org/~simon/journal/20061211.
He, X., Liao, L., Zhang, H., Nie, L., Hu, X., and Chua, T.-S. (2017). Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, pages 173–182.
Hegedűs, I., Danner, G., and Jelasity, M. (2019). Decentralized recommendation based on matrix factorization: A comparison of gossip and federated learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 317–332. Springer.
Jin, R., Goswami, A., and Agrawal, G. (2006). Fast and exact out-of-core and distributed k-means clustering. Knowledge and Information Systems, 10(1):17–40.
Kim, D. and Yum, B.-J. (2005). Collaborative filtering based on iterative principal component analysis. Expert Systems with Applications, 28(4):823–830.
Konečnỳ, J., McMahan, H. B., Yu, F. X., Richtárik, P., Suresh, A. T., and Bacon, D. (2016). Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492.
Koren, Y. (2008). Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426–434.
Koren, Y. (2010). Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Transactions on Knowledge Discovery from Data (TKDD), 4(1):1–24.
Lai, C.-H. and Tseng, K.-C. (2022). Applying deep learning models to analyze users’ aspects, sentiment, and semantic features for product recommendation. Applied Sciences, 12(4):2118.
Ling, Q., Xu, Y., Yin, W., and Wen, Z. (2012). Decentralized low-rank matrix completion. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2925–2928. IEEE.
Long, J., Chen, T., Nguyen, Q. V. H., and Yin, H. (2023). Decentralized collaborative learning framework for next POI recommendation. ACM Transactions on Information Systems, 41(3):1–25.
Ma, H., Yang, H., Lyu, M. R., and King, I. (2008). SoRec: Social recommendation using probabilistic matrix factorization. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, pages 931–940.
McAuley, J. and Leskovec, J. (2013). Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 165–172.
McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR.
McSherry, F. and Mironov, I. (2009). Differentially private recommender systems: Building privacy into the netflix prize contenders. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 627–636.
Mnih, A. and Salakhutdinov, R. R. (2007). Probabilistic matrix factorization. Advances in Neural Information Processing Systems, 20.
Nanda, S. J. and Panda, G. (2014). A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm and Evolutionary Computation, 16:1–18.
Narayanan, A. and Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (sp 2008), pages 111–125. IEEE.
Nikolaenko, V., Ioannidis, S., Weinsberg, U., Joye, M., Taft, N., and Boneh, D. (2013). Privacy-preserving matrix factorization. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pages 801–812.
Ormándi, R., Hegedűs, I., and Jelasity, M. (2011). Efficient P2P ensemble learning with linear models on fully distributed data. CoRR abs/1109.1396.
Ormándi, R., Hegedűs, I., and Jelasity, M. (2013). Gossip learning with linear models on fully distributed data. Concurrency and Computation: Practice and Experience, 25(4):556–571.
Paterek, A. (2007). Improving regularized singular value decomposition for collaborative filtering. In Proceedings of KDD Cup and Workshop, volume 2007, pages 5–8.
Pena, J. M., Lozano, J. A., and Larranaga, P. (1999). An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters, 20(10):1027–1040.
Piatetsky, G. (2007). Interview with Simon Funk. ACM SIGKDD Explorations Newsletter, 9(1):38–40.
Qi, X. (2017). Intro Distributed Deep Learning. https://xiandong79.github.io/Intro-Distributed-Deep-Learning.
Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2009). BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 452–461.
Salakhutdinov, R., Mnih, A., and Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning, pages 791–798.
Takács, G., Pilászy, I., Németh, B., and Tikk, D. (2007). Major components of the gravity recommendation system. ACM SIGKDD Explorations Newsletter, 9(2):80–83.
Takács, G., Pilászy, I., Németh, B., and Tikk, D. (2008). Matrix factorization and neighbor based algorithms for the netflix prize problem. In Proceedings of the 2008 ACM Conference on Recommender Systems, pages 267–274.
Tang, Z., Shi, S., Chu, X., Wang, W., and Li, B. (2020). Communication-efficient distributed deep learning: A comprehensive survey. arXiv preprint arXiv:2003.06307.
Tsapanos, N., Tefas, A., Nikolaidis, N., and Pitas, I. (2015). A distributed framework for trimmed kernel k-means clustering. Pattern Recognition, 48(8):2685–2698.
Tsitsiklis, J., Bertsekas, D., and Athans, M. (1986). Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Transactions on Automatic Control, 31(9):803–812.
Wang, F. and Li, P. (2010). Efficient nonnegative matrix factorization with random projections. In Proceedings of the 2010 SIAM International Conference on Data Mining, pages 281–292. SIAM.
Wang, Z., Liu, X., Chang, S., Zhou, J., Qi, G.-J., and Huang, T. S. (2015). Decentralized recommender systems. arXiv preprint arXiv:1503.01647.
Weinsberg, U., Bhagat, S., Ioannidis, S., and Taft, N. (2012). BlurMe: Inferring and obfuscating user gender based on ratings. In Proceedings of the Sixth ACM Conference on Recommender Systems, pages 195–202.
Wu, H., Zhang, Z., Yue, K., Zhang, B., He, J., and Sun, L. (2018). Dual-regularized matrix factorization with deep neural networks for recommender systems. Knowledge-Based Systems, 145:46–58.
Yang, X., Steck, H., and Liu, Y. (2012). Circle-based recommendation in online social networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1267–1275.
Ye, Z., Cao, H., Zhang, Y., and Jia, L. (2016). Outlier factor based partitional clustering analysis with constraints discovery and representative objects generation. Neurocomputing, 173:1538–1553.
Zhao, W., Ma, H., and He, Q. (2009). Parallel k-means clustering based on mapreduce. In IEEE International Conference on Cloud Computing, pages 674–679. Springer.
Zheng, W. (2016). Matrix factorization method for decentralized recommender systems. arXiv preprint arXiv:1604.08420.