| Author: | 張芷菱 Chang, Chih-Lin |
|---|---|
| Thesis title: | Ensemble2Net: Learning from Ensemble Teacher Networks via Knowledge Transfer (藉由集成教師網路之知識轉移學習策略) |
| Advisor: | 黃仁暐 Huang, Jen-Wei |
| Degree: | Master |
| Department: | Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science |
| Year of publication: | 2018 |
| Academic year of graduation: | 107 |
| Language: | English |
| Number of pages: | 45 |
| Chinese keywords: | 轉移學習、教師-學生網路、深度學習、卷積神經網路 |
| Foreign-language keywords: | Transfer Learning, Teacher-Student Network, Deep Learning, Convolutional Neural Network |
Abstract (Chinese, translated): In machine learning, deep neural networks (DNNs) have gradually become the research mainstream, because deep learning can learn higher-level features and thereby perform deep feature learning. However, DNNs require large amounts of memory and training time. In recent years, improving the efficiency and effectiveness of DNN training has become an increasingly important research topic. In this thesis, we propose a training method, Ensemble2Net, to accelerate the training of deep convolutional neural networks (DCNNs) and to help student networks learn knowledge from DCNN-based teacher networks. We use the novel Ensemble2Net algorithm to accelerate learning transfer for VGGnet (13/16/19) and ResNet. In our experiments, we show that the Ensemble2Net technique helps VGGnet and ResNet reach comparable accuracy at lower cost; in particular, a ResNet transfer-trained with Ensemble2Net for 20 epochs achieves better accuracy than the original ResNet trained for more than 170 epochs, a 1.503x speedup.
Abstract (English): In machine learning, deep neural networks (DNNs) are becoming mainstream because they can learn higher-level features and thus form deep representations. However, DNNs require large amounts of memory and training time, and improving the efficiency and effectiveness of DNN training has become an increasingly important focus of research in recent years. In this thesis, we propose a training method, Ensemble2Net, that accelerates the training of deep convolutional neural networks (DCNNs) and helps student networks learn knowledge from DCNN-based teacher networks. We use this novel algorithm to accelerate the transfer of learning in VGGnet (13/16/19) and ResNet. The results show that the Ensemble2Net technique can help VGGnet and ResNet achieve the best accuracy at a lower cost than current approaches. In particular, ResNet using Ensemble2Net with 20 epochs achieves better accuracy than the original ResNet trained for more than 170 epochs, a 1.503x speedup.
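The abstract does not spell out the Ensemble2Net procedure itself, so the sketch below illustrates only the generic idea it builds on: a student network trained against the averaged, softened outputs of an ensemble of teacher networks (Hinton-style soft-target distillation). The function name, the temperature and alpha values, and the PyTorch formulation are illustrative assumptions, not the thesis's actual algorithm.

```python
# Minimal sketch of learning from an ensemble of teacher networks via
# soft-target knowledge transfer. This is a generic illustration, NOT the
# Ensemble2Net algorithm, whose details are not given in the abstract.
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list, labels,
                               temperature=4.0, alpha=0.7):
    """Combine a hard-label loss with a soft-target loss averaged over
    an ensemble of teacher networks (values of T and alpha are assumed)."""
    # Average the teachers' softened class distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL divergence between the student and the ensemble-averaged teachers.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        teacher_probs,
        reduction="batchmean",
    ) * (temperature ** 2)
    # Standard cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

if __name__ == "__main__":
    # Random stand-ins for student/teacher outputs (e.g., VGGnet/ResNet logits).
    batch, num_classes = 8, 10
    student_logits = torch.randn(batch, num_classes, requires_grad=True)
    teacher_logits_list = [torch.randn(batch, num_classes) for _ in range(3)]
    labels = torch.randint(0, num_classes, (batch,))
    loss = ensemble_distillation_loss(student_logits, teacher_logits_list, labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

In an actual training loop, the teacher logits would come from pre-trained DCNN teachers run in evaluation mode, and only the student's parameters would be updated with this combined loss.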
Full text available on campus from 2024-01-07.