| 研究生: |
張孟淮 Chang, Meng-Huai |
|---|---|
| 論文名稱: |
利用P2P網路實現分散式矩陣分解法-以Chord為例 Utilizing P2P network to implement distributed matrix factorization — Using Chord as an Example |
| 指導教授: |
劉任修
Liu, Ren-Shiou |
| 學位類別: |
碩士 Master |
| 系所名稱: |
管理學院 - 資訊管理研究所 Institute of Information Management |
| 論文出版年: | 2023 |
| 畢業學年度: | 111 |
| 語文別: | 中文 |
| 論文頁數: | 45 |
| 中文關鍵詞: | 矩陣分解法 、P2P網路 、分散式矩陣分解法 、分散式儲存 |
| 外文關鍵詞: | Distributed matrix factorization, Distributed storage |
| 相關次數: | 點閱:64 下載:10 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
矩陣分解法正是構築推薦系統的主要方法之一。然而,推薦系統雖然能夠輔助決策,但其計算的過程包含了大量用戶的敏感資訊,因此如何保護用戶隱私成為一個重要挑戰。另一方面,在不斷增加的用戶和商品項目下,傳統的矩陣分解法所需要的儲存成本也愈來愈龐大,終端設備上有限的儲存容量無法負擔如此大量且重複的資料,如何分擔資料儲存也成為了重大課題之一。
通過網際網路,以P2P網路模式的資料分享機制得以讓大量的資料(用戶潛在特徵向量、項目潛在特徵向量)不見得要儲存在同一個設備中,在需要使用時透過網路傳輸即可獲得必需的資料。另外,與集中式儲存資料的方式不同,分散式儲存的架構讓特定人士難以竄改項目的潛在特徵,藉此保有資料完整性。本研究透過P2P網路與結合應用於推薦系統中的矩陣分解法,希望在保有相似精準度的情況下,保護用戶隱私的同時得以分散重複資料的儲存,減少終端設備記憶體的負擔,且藉此維護資料完整性。
最後本研究的實驗得知,在將矩陣分解法之資料分散儲存,其預測能力與其他未分散儲存之模型差異不大,約介於-4.7%至+3.9%之間。另外,在建立模型的過程中,所需的網路傳輸成本會遠小於其它分散式矩陣分解法至少四倍,因此此方法也不失為一種保護隱私與資料完整性的一種選擇。
Matrix factorization's computational process involved contains a large amount of sensitive user information, making the protection of user privacy a significant challenge. With the increasing number of users and items, the storage cost required by traditional matrix factorization methods has become increasingly massive.
This study combines P2P networking with matrix factorization methods applied in recommendation systems, aiming to protect user privacy while reducing the storage burden of repetitive data and maintaining data integrity. The experiments conducted in this research revealed that when storing data using a distributed approach for matrix factorization, The predictive ability is not much different from other models without decentralized storage, ranging from -4.7% to +3.9%. The required network transmission cost during model building is significantly reduced by at least four times compared to other distributed matrix factorization methods. This approach can be considered as a choice for protecting privacy and ensuring data integrity.
Ammad-Ud-Din, M., Ivannikova, E., Khan, S. A., Oyomno, W., Fu, Q., Tan, K. E., & Flanagan, A. (2019). Federated Collaborative Filtering for Privacy-preserving Personalized Recommendation System. ArXiv, abs/1901.09888.
Chai, D., Wang, L., Chen, K., & Yang, Q. (2021). Secure Federated Matrix Factorization. IEEE Intelligent Systems, 36(5), 11–20.
Daswani, N., Garcia-Molina, H., & Yang, B. (2003). Open Problems in Data-sharing Peer-to-peer Systems. In International Conference on Database Theory (pp. 1–15).
Duriakova, E., Tragos, E. Z., Smyth, B., Hurley, N., Pe ̃na, F. J., Symeonidis, P., . . . Lawlor, A. (2019). PDMFRec: A Decentralised Matrix Factorisation with Tunable User-centric Privacy. In Proceedings of the 13th ACM Conference on Recommender Systems (pp. 457–461)
Garces-Erice, L., Biersack, E. W., Ross, K. W., Felber, P. A., & Urvoy-Keller, G. (2003). Hierarchical Peer-to-peer Systems. Parallel Processing Letters, 13(04), 643–657.
Gemulla, R., Nijkamp, E., Haas, P. J., & Sismanis, Y. (2011). Large-scale Matrix Factorization with Distributed Stochastic Gradient Descent. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 69–77).
Koren, Y. (2008). Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 426–434).
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. In Computer (p. 30-37).
Li, F., Wu, B., Xu, L., Shi, C., & Shi, J. (2014). A Fast Distributed Stochastic Gradient Descent Algorithm for Matrix Factorization. In Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 77–87).
Liu, C., Yang, H.-c., Fan, J., He, L.-W., & Wang, Y.-M. (2010). Distributed Non-negative Matrix Factorization for Web-Scale Dyadic Data Analysis on Mapreduce. In Proceedings of the 19th International Conference on World Wide Web (pp. 681– 690).
Liu, P., Fan, Y., Huang, K., & Huang, G. (2020). Super Peer-based P2P VoD Architecture for Supporting Multiple Terminals. In International Conference of Pioneering Computer Scientists, Engineers and Educators (pp. 389–404).
McMahan, H. B., Moore, E., Ramage, D., & y Arcas, B. A. (2016). Federated Learning of Deep Networks Using Model Averaging. CoRR, abs/1602.05629.
Ratnasamy, S., Francis, P., Handley, M., Karp, R., & Shenker, S. (2001). A Scalable Content-addressable Network. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (pp.161–172).
Regulation, G. D. P. (2018). General Data Protection Regulation (GDPR). Intersoft Consulting, Accessed in October, 24(1).
Rowstron, A., & Druschel, P. (2001a). Pastry: Scalable, Decentralized Object Location, and Routing for Large-scale Peer-to-peer Systems. In IFIP/ACM International Conference on Distributed Systems Platforms and Open Distributed Processing (pp. 329–350).
Rowstron, A., & Druschel, P. (2001b). Storage Management and Caching in PAST, a Large-scale, Persistent Peer-to-peer Storage Utility. In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles (pp. 188–201).
Stoica, I., Morris, R., Liben-Nowell, D., Karger, D. R., Kaashoek, M. F., Dabek, F., & Balakrishnan, H. (2003). Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications. IEEE/ACM Transactions on Networking, 11(1), 17–32.
Vallet, D., Friedman, A., & Berkovsky, S. (2014). Matrix Factorization without User Data Retention. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 569–580).
Zhao, B. Y., Huang, L., Stribling, J., Rhea, S. C., Joseph, A. D., & Kubiatowicz, J. D. (2004). Tapestry: A Resilient Global-scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications, 22(1), 41–53.