| Graduate student: | 洪文麟 Hung, Wen-Lin |
|---|---|
| Thesis title: | 深度學習應用於以影像辨識為基礎的個人化推薦系統-以服飾樣式為例 Individual Recommender System based on Image Recognition using Deep Learning- A case study on Clothing Style |
| Advisor: | 黃悅民 Huang, Yueh-Min |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2016 |
| Academic year of graduation: | 104 |
| Language: | Chinese |
| Number of pages: | 71 |
| Keywords: | Deep learning, Recommender System, Content-based Recommender System, Image Recognition, Clothing Classification |
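With the rapid development of Internet technology in recent years and the spread of mobile networks and smartphones, the amount of information people receive in daily life keeps growing. As information becomes more varied and more voluminous, recommender systems that help users filter for useful information and make decisions are receiving increasing attention. A content-based recommender system identifies the items a user has frequently browsed or previously purchased and recommends similar items in a personalized way for the user's reference.
On the item-content side, current content-based recommender systems mostly recommend text-based information such as news and books; for products whose content consists mainly of images, item attributes cannot be automatically extracted or learned through text mining or natural language processing. On the user-preference side, the system must overcome the problem of automatically acquiring records of user preferences, and it needs a suitable way to determine the relationship between user preferences and products in order to confirm whether recommended items match the user's interests.
Accordingly, this study takes clothing recommendation, where content consists mainly of images, as its example and uses deep learning techniques and frameworks to implement a personalized recommender system based on image recognition. The system trains a deep learning network on a public dataset of 80,000 clothing images with 15 clothing style categories, uses the trained model to implement the content analyzer and profile learner of a content-based recommender system, and uses cosine similarity to implement the filtering component. The implemented style classifier reaches an accuracy of 54.6%, compared with 41.36% for Transfer Forest, an earlier method evaluated on the same training data for the same 15-category classification, an improvement of about 13 percentage points. For the recommender implementation, the product databases of the clothing brands Lativ and Uniqlo were used; photos of the clothes worn by the test subjects, 26 photos in total, served as input, and the five products in the database most similar to the clothing each subject was wearing were recommended. When subjects were asked to review the top five results, most found a satisfactory garment among the top three recommendations. This suggests that the system's recommendations can serve as a reference for most consumers when shopping for clothing, and it is hoped that the system can help increase sales at physical stores in the future.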
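The filtering step described above, ranking catalog items by cosine similarity to the photographed garment and keeping the top five, might look roughly like the following sketch. The vector contents, the item names, and the function names `cosine_similarity` and `top_k_similar` are illustrative assumptions; the abstract does not say how the feature vectors are represented beyond their coming from the trained network.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k catalog items most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional style vectors; a real system would use
    # the outputs of the trained network for each catalog image.
    catalog = {
        "item_a": [0.9, 0.1, 0.0],
        "item_b": [0.1, 0.8, 0.1],
        "item_c": [0.6, 0.3, 0.1],
    }
    query = [0.8, 0.2, 0.0]  # vector for the photographed garment (assumed)
    for name, score in top_k_similar(query, catalog, k=5):
        print(f"{name}: {score:.3f}")
```

Cosine similarity compares only the direction of two vectors, so items with the same mix of style scores rank equally regardless of vector magnitude.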
In recent years, making decisions has become increasingly difficult because of the large amount of information we receive through the Internet every day. Recommender systems have therefore become increasingly popular: they help users make decisions effectively by offering suitable suggestions derived from the users' browsing and transaction histories. However, most content-based recommender systems are designed for text-based content such as news and books, because the attributes of image-based content cannot be automatically extracted or learned by text mining or natural language processing.
In this research, we propose a personalized recommender system based on image recognition using deep learning. To obtain a suitable model and parameters, the system trains a deep learning network on a public clothing dataset of over 80,000 images in 15 categories. Using the trained model and a similarity algorithm, we build a standard content-based recommender system consisting of a content analyzer, a profile learner, and a filtering component. The classifier achieves 54.6% accuracy, about 13 percentage points higher than the Transfer Forest result on the same data. In the experiment, we use the product databases of Uniqlo and Lativ and take 26 pictures of 5 testers to generate purchase recommendations for them. Most testers reported that an item matching their needs appeared among the top three recommendations, which suggests that the system's recommendations are effective as a reference for shoppers.
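As a rough illustration of how a content analyzer and a profile learner could fit together, the sketch below turns raw class scores into a style probability vector with a softmax and then averages several such vectors into one profile. Both choices, and the names `softmax` and `build_profile`, are assumptions made for illustration only; the abstract does not state how the thesis represents item content or user profiles.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    if not scores:
        raise ValueError("scores must not be empty")
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_profile(style_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-photo style vectors into a single profile vector (assumed design)."""
    if not style_vectors:
        raise ValueError("at least one style vector is required")
    length = len(style_vectors[0])
    if any(len(v) != length for v in style_vectors):
        raise ValueError("all style vectors must have the same length")
    count = len(style_vectors)
    return [sum(v[i] for v in style_vectors) / count for i in range(length)]


if __name__ == "__main__":
    # Hypothetical raw scores for three style categories from two photos;
    # the system in the abstract uses 15 categories.
    photo_scores = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2]]
    vectors = [softmax(s) for s in photo_scores]
    print(build_profile(vectors))
```

A profile built this way has the same length as each item vector, so it can be compared against catalog items with any vector similarity measure.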