
Author: 蔡帛融 (Tsai, Po-Jung)
Title: 應用生成對抗學習以增進卷積神經網路之注意力遷移 (Using generative adversarial learning to enhance the attention transfer in convolutional neural network)
Advisor: 吳馬丁 (Torbjörn E. M. Nordling)
Degree: Master
Department: Department of Mechanical Engineering, College of Engineering
Year of publication: 2019
Academic year of graduation: 107 (2018-2019)
Language: English
Number of pages: 48
Chinese keywords: 模型壓縮、師生策略、知識蒸餾、注意力遷移、對抗生成網路
English keywords: Model compression, teacher-student strategy, knowledge distillation, attention transfer, generative adversarial network
    Chinese abstract:
    Background: Owing to the development of deep convolutional neural networks, deep learning has been used to solve many computer vision problems, such as object recognition. The high computational cost of these deep neural networks has created a demand for model compression, that is, for small networks that run on mobile and edge devices while approaching the original performance. The teacher-student strategy, in which a deep network (teacher) is used to train a small network (student), has shown promising results. To improve the performance of the student network, many studies have investigated methods for transferring knowledge from the output and hidden layers, such as knowledge distillation and attention transfer. Current methods for attention transfer rely on pixel-wise comparison of the teacher's and student's attention maps, which gives every pixel the same weight and ignores spatial differences.
    Aim: We aim to improve the student network's performance by optimizing the comparison of attention maps through adversarial learning.
    Method: Generative adversarial networks (GANs) have been used successfully in applications such as image style transfer. We use adversarial learning to obtain a discriminator network that evaluates the quality of attention maps. At the same time, the student network is trained to generate attention maps good enough to fool the discriminator, thereby mimicking the teacher network's attention and achieving attention transfer.
    Results: We evaluate our method on the CIFAR10 dataset by comparing it with previously proposed methods for attention transfer and knowledge distillation. We find that MiGAN-AT outperforms the other methods when applied to weak student networks and performs similarly on stronger student networks.
    Conclusions: Our results show that, by using a discriminator network to compare the student's and teacher's attention maps, good performance can be maintained when compressing models with the teacher-student strategy.

    Background: Deep learning has enabled superhuman solutions to many computer vision problems, such as object recognition, thanks to the use of deep convolutional neural networks. The computational cost of these deep neural networks has spurred an interest in model compression to find small and good approximations that work on mobile and edge devices. In particular, the teacher-student strategy, where a deep network (teacher) is used to train a small network (student), has shown promise. To improve the performance of the student network, methods for transferring information from the output and hidden layers have been investigated, such as knowledge distillation and attention transfer. Existing methods for attention transfer rely on pixel-wise comparison of the attention maps of the teacher and student, which puts the same weight on each pixel and neglects spatial differences.
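The pixel-wise comparison described above can be made concrete. The sketch below computes an activation-based attention map (sum of squared activations over channels, L2-normalized) and the plain pixel-wise squared distance between the student and teacher maps; the tensor shapes, the random activations, and the small normalization constant are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def attention_map(activations):
    # Collapse a C x H x W activation tensor into one attention map by
    # summing squared activations over channels, then L2-normalize the
    # flattened map (a small epsilon guards against division by zero).
    amap = (activations ** 2).sum(axis=0)        # H x W
    flat = amap.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-8)

def pixelwise_at_loss(student_acts, teacher_acts):
    # Pixel-wise attention-transfer loss: squared L2 distance between the
    # normalized maps. Every pixel carries the same weight, which is the
    # limitation the thesis targets.
    qs = attention_map(student_acts)
    qt = attention_map(teacher_acts)
    return float(((qs - qt) ** 2).sum())

rng = np.random.default_rng(0)
teacher_acts = rng.standard_normal((64, 8, 8))   # hypothetical teacher layer
student_acts = rng.standard_normal((16, 8, 8))   # hypothetical student layer
loss = pixelwise_at_loss(student_acts, teacher_acts)
```

Note that the channel dimensions of teacher and student may differ, since both maps collapse to the same H x W spatial size before comparison.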
    Aim: We aim to improve the student network’s performance by optimizing the comparison of the attention maps through adversarial learning.
    Method: Generative adversarial networks (GANs) have been used successfully, e.g., for transfer of image style. We use adversarial learning to obtain a discriminator network that evaluates the quality of attention maps. Simultaneously, the student network is trained to generate attention maps that are good enough to fool the discriminator, and hence achieves attention transfer by mimicking the attention of the teacher. We call this method mimicking generative adversarial network for attention transfer (MiGAN-AT).
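As an illustration only (the actual MiGAN-AT architecture and losses are not specified in this abstract), the adversarial setup can be sketched with a toy logistic discriminator over flattened attention maps: the discriminator is trained to label teacher maps as real and student maps as fake, while the student minimizes a non-saturating loss that rewards fooling the discriminator. All names, dimensions, and the random stand-in maps below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w = rng.standard_normal(64) * 0.01   # toy discriminator weights
b = 0.0

def d_score(amap):
    # Probability the discriminator assigns to "this map came from the teacher".
    return sigmoid(w @ amap + b)

def discriminator_loss(teacher_map, student_map):
    # Binary cross-entropy: teacher maps are labeled real (1),
    # student maps fake (0); a small epsilon keeps the logs finite.
    return float(-(np.log(d_score(teacher_map) + 1e-8)
                   + np.log(1.0 - d_score(student_map) + 1e-8)))

def student_adversarial_loss(student_map):
    # Non-saturating generator loss: the student is rewarded when the
    # discriminator mistakes its attention map for a teacher map.
    return float(-np.log(d_score(student_map) + 1e-8))

teacher_map = np.abs(rng.standard_normal(64))   # stand-in attention maps
student_map = np.abs(rng.standard_normal(64))
d_loss = discriminator_loss(teacher_map, student_map)
g_loss = student_adversarial_loss(student_map)
```

In training, the two losses would be minimized alternately, the discriminator sharpening its notion of a "teacher-like" map while the student learns to produce such maps.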
    Results: We evaluate our method on the CIFAR10 dataset by comparing it to previously published methods for attention transfer and knowledge distillation. We find that MiGAN-AT outperforms the other methods when applied to weak student networks and performs similarly on strong student networks.
    Conclusions: Our results show that, by using a discriminator network to compare the attention maps of the student and teacher, good performance can be maintained when compressing models with the teacher-student strategy.
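For contrast, the knowledge-distillation baseline mentioned above matches temperature-softened output distributions rather than attention maps. A minimal sketch of that soft-target loss, with an illustrative temperature T and toy logits (both assumptions, not values from the thesis):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Soft-target term of knowledge distillation: KL divergence between
    # temperature-softened teacher and student distributions, scaled by
    # T^2 so gradient magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float((p * (np.log(p) - np.log(q))).sum() * T * T)

teacher_logits = np.array([5.0, 1.0, -2.0])   # toy logits
student_logits = np.array([4.0, 2.0, -1.0])
loss = distillation_loss(student_logits, teacher_logits)
```

The loss is zero when the student's softened distribution matches the teacher's exactly, and positive otherwise.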

    Chinese abstract i
    Abstract ii
    Acknowledgment iv
    Table of Contents v
    List of Tables vi
    List of Figures vii
    1 Introduction 1
      1.1 Motivation and objective 1
      1.2 Delimitation 2
      1.3 Introduction to convolutional neural networks 2
      1.4 Introduction to model compression 4
      1.5 Introduction to generative adversarial networks 6
      1.6 Organization of this thesis 8
    2 Theory and Methods 9
      2.1 Introduction 9
      2.2 Generative Adversarial Network 9
      2.3 Teacher-student strategy 16
    3 Attention transfer with adversarial learning 26
      3.1 Introduction 26
      3.2 Related works 29
      3.3 Attention transfer with adversarial learning 31
      3.4 Experiments 34
      3.5 Conclusion 39
    4 Conclusions and future work 41
      4.1 Conclusions 41
      4.2 Future works 42
    References 43


    Full text available on campus from 2024-08-20; off campus from 2024-08-20.