
Author: Chang, Chih-Lin (張芷菱)
Title: Ensemble2Net: Learning from Ensemble Teacher Networks via Knowledge Transfer
Advisor: Huang, Jen-Wei (黃仁暐)
Degree: Master
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering & Computer Science
Publication Year: 2018
Graduation Academic Year: 107
Language: English
Pages: 45
Keywords: Transfer Learning, Teacher-Student Network, Deep Learning, Convolutional Neural Network

    In machine learning, deep neural networks (DNNs) have become mainstream because they can learn higher-level features and thus form deep representations. However, DNNs require large amounts of memory and training time, so improving the efficiency and effectiveness of DNN training has become an increasingly important research topic in recent years. In this thesis, we propose a training method, Ensemble2Net, that accelerates the training of deep convolutional neural networks (DCNNs) and helps a student network learn knowledge from DCNN-based teacher networks. We apply Ensemble2Net to accelerate knowledge transfer in VGGnet (13/16/19) and ResNet. The experiments show that Ensemble2Net helps VGGnet and ResNet reach comparable accuracy at lower cost: in particular, a ResNet transfer-trained with Ensemble2Net for 20 epochs achieves better accuracy than the original ResNet trained for more than 170 epochs, a 1.503x speedup.
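The abstract describes transferring knowledge from several teacher networks into one student network. The thesis's exact algorithm is not reproduced on this page, so the following is only a minimal sketch of the generic ensemble teacher-student idea (averaged softened teacher outputs used as the student's training target); the function names, the temperature value, and the use of plain lists instead of tensors are all illustrative assumptions:

```python
import math

def softmax(logits, temperature=1.0):
    # Softened class probabilities; a higher temperature flattens the
    # distribution and exposes the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(teacher_logits, temperature=4.0):
    # Average the softened outputs of several teacher networks into a
    # single soft target distribution for the student.
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]

def distillation_loss(student_logits, soft_targets, temperature=4.0):
    # Cross-entropy between the ensemble's soft targets and the student's
    # softened prediction: the quantity minimized during knowledge transfer.
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(soft_targets, student_probs))
```

In practice this loss is usually combined with the ordinary cross-entropy on the true labels, and the student is then fine-tuned on the target dataset, as in the fine-tuning step listed in the table of contents.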

    Table of Contents

    Chinese Abstract
    Abstract
    Acknowledgment
    Table of Contents
    List of Tables
    List of Figures
    1 Introduction
      1.1 Background
      1.2 Motivation and Goal
      1.3 Our Approach
      1.4 Framework
    2 Preliminaries
      2.1 Transfer Learning
        2.1.1 Instance-based Transfer Learning
        2.1.2 Feature-based Transfer Learning
        2.1.3 Parameter/Model-based Transfer Learning
        2.1.4 Relation-based Transfer Learning
      2.2 Teacher-Student Network
      2.3 Net2Net
      2.4 Convolutional Neural Network (CNN)
        2.4.1 Convolutional Neural Network
        2.4.2 VGGnet
        2.4.3 ResNet
        2.4.4 DenseNet
      2.5 Summary
    3 Methodology
      3.1 Overview
      3.2 General CNN Learning Procedure
        3.2.1 Training Teacher Networks
        3.2.2 Transfer of Knowledge
        3.2.3 Fine-Tuning the Student Network
    4 Experiments
      4.1 Environment and Datasets
        4.1.1 CIFAR-10 & CIFAR-100
        4.1.2 Tiny ImageNet
      4.2 Training
        4.2.1 Standard CNN
        4.2.2 VGGnet
        4.2.3 ResNet
        4.2.4 DenseNet
      4.3 Pilot Study
        4.3.1 Using an uncategorized sub-dataset to train teacher networks
        4.3.2 Using similar classes to train teacher networks
        4.3.3 Decreasing teacher network training epochs
        4.3.4 Deciding the number of teacher networks
      4.4 Performance Results
        4.4.1 Training Teacher Networks
        4.4.2 Transferring Knowledge
    5 Conclusions
    References
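Among the preliminaries, Section 2.3 covers Net2Net, which transfers knowledge by growing a trained network while preserving the function it computes. Since the thesis text itself is not on this page, here is only a small illustrative sketch of the Net2WiderNet operation on a single fully connected hidden layer; the layer shapes, weight values, and helper names are hypothetical:

```python
import random

def forward(w_in, w_out, x):
    # Tiny two-layer network: a ReLU hidden layer followed by a linear output.
    # w_in: one row of input weights per hidden unit; w_out: one row per output.
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_out]

def net2wider(w_in, w_out, new_width):
    # Net2WiderNet-style widening: each new hidden unit copies a randomly
    # chosen existing unit, and every outgoing weight is divided by the
    # number of copies of its source unit, so the widened network computes
    # exactly the same function as the original.
    old_width = len(w_in)
    mapping = list(range(old_width)) + [
        random.randrange(old_width) for _ in range(new_width - old_width)
    ]
    counts = [mapping.count(u) for u in range(old_width)]
    new_w_in = [list(w_in[mapping[j]]) for j in range(new_width)]
    new_w_out = [
        [w_out[o][mapping[j]] / counts[mapping[j]] for j in range(new_width)]
        for o in range(len(w_out))
    ]
    return new_w_in, new_w_out
```

Because the transformation is function-preserving, the wider student starts from the teacher's accuracy and only needs further training to exploit its extra capacity, which is the source of the training speedup.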


    Full-text availability: on campus, available from 2024-01-07; off campus, not available.
    The electronic thesis has not been authorized for public release; for the print copy, consult the library catalog.