| Author | 蔡帛融 Tsai, Po-Jung |
|---|---|
| Thesis title | 應用生成對抗學習以增進卷積神經網路之注意力遷移 (Using generative adversarial learning to enhance the attention transfer in convolutional neural network) |
| Advisor | 吳馬丁 Torbjörn E. M. Nordling |
| Degree | Master |
| Department | Department of Mechanical Engineering, College of Engineering |
| Publication year | 2019 |
| Academic year | 107 |
| Language | English |
| Pages | 48 |
| Keywords (Chinese) | 模型壓縮、師生策略、知識蒸餾、注意力遷移、對抗生成網路 |
| Keywords (English) | Model compression, teacher-student strategy, knowledge distillation, attention transfer, generative adversarial network |
| Access counts | Views: 85, Downloads: 1 |
Background: Owing to the development of deep convolutional neural networks, deep learning has been used to solve many computer vision problems, such as object recognition. However, the high computational cost of these deep neural networks has created a demand for model compression, that is, for small networks that approach the original performance and are suitable for mobile and edge devices. The teacher-student strategy, in which a deep network (the teacher) is used to train a small network (the student), has shown promising results. To improve the student network's performance, many studies have investigated methods for transferring knowledge from the output and hidden layers, such as knowledge distillation and attention transfer. Existing attention-transfer methods rely on pixel-wise comparison of the teacher's and student's attention maps, which assigns the same weight to every pixel and ignores spatial differences.
Aim: We aim to improve the student network's performance by optimizing the comparison of attention maps through adversarial learning.
Method: Generative adversarial networks (GANs) have been used successfully in applications such as image style transfer. We instead use adversarial learning to obtain a discriminator network that evaluates the quality of attention maps. At the same time, the student network is trained to generate attention maps good enough to fool the discriminator, thereby mimicking the teacher network's attention and achieving attention transfer.
Results: We evaluate our method on the CIFAR10 dataset by comparing it with previously proposed attention-transfer and knowledge-distillation methods. We find that MiGAN-AT outperforms the other methods when applied to weak student networks and performs similarly on stronger student networks.
Conclusions: Our results show that, by using a discriminator network to compare the attention maps of the student and teacher networks, good performance can be maintained when compressing models with the teacher-student strategy.
Background: Deep learning has enabled superhuman solutions to many computer vision problems, such as object recognition, thanks to the use of deep convolutional neural networks. The computational cost of these deep neural networks has spurred an interest in model compression to find small and good approximations that work on mobile and edge devices. In particular, the teacher-student strategy, where a deep network (teacher) is used to train a small network (student), has shown promise. To improve the performance of the student network, methods for transferring information from the output and hidden layers have been investigated, such as knowledge distillation and attention transfer. Existing methods for attention transfer rely on pixel-wise comparison of the attention maps of the teacher and student, which puts the same weight on each pixel and neglects spatial differences.
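In the activation-based formulation common in the attention-transfer literature, the attention map is obtained by summing squared activations over the channel dimension, and the pixel-wise loss criticized above is an L2 distance between the normalized, flattened maps. The sketch below illustrates this baseline; the function names `attention_map` and `at_loss` are ours, and the squaring exponent and normalization follow one common variant, not necessarily the exact one used in the thesis.

```python
import numpy as np

def attention_map(activations):
    """Activation-based spatial attention: sum of squared values
    across the channel axis, (C, H, W) -> (H, W)."""
    return np.sum(activations ** 2, axis=0)

def at_loss(student_act, teacher_act, eps=1e-8):
    """Pixel-wise attention-transfer loss: squared L2 distance between
    the l2-normalized, flattened attention maps. Every pixel receives
    the same weight, which is the limitation addressed by this thesis."""
    qs = attention_map(student_act).ravel()
    qt = attention_map(teacher_act).ravel()
    qs = qs / (np.linalg.norm(qs) + eps)
    qt = qt / (np.linalg.norm(qt) + eps)
    return np.sum((qs - qt) ** 2)

# Toy example: student and teacher feature maps may differ in channel
# count, but their attention maps share the same spatial size.
rng = np.random.default_rng(0)
s = rng.standard_normal((16, 8, 8))   # student features: 16 channels
t = rng.standard_normal((64, 8, 8))   # teacher features: 64 channels
print(at_loss(s, t))   # positive for differing maps
print(at_loss(t, t))   # 0.0 for identical maps
```

Note that because the maps are compared after flattening, a large error concentrated on a salient object region counts no more than the same error spread over the background.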
Aim: We aim to improve the student network’s performance by optimizing the comparison of the attention maps through adversarial learning.
Method: Generative adversarial networks (GANs) have been used with success, e.g., for transfer of image style. We use adversarial learning to obtain a discriminator network to evaluate the quality of attention maps. Simultaneously, the student network is trained to generate attention maps that are good enough to fool the discriminator, and hence achieve attention transfer by mimicking the attention of the teacher. We call this method mimicking generative adversarial network for attention transfer (MiGAN-AT).
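The abstract gives no implementation details, so the following is only a toy NumPy sketch of the adversarial objective: a logistic-regression discriminator stands in for MiGAN-AT's discriminator network, and, instead of backpropagating through a student CNN, the student's flattened attention map is updated directly to fool it. The function name, learning rates, and map size are all illustrative assumptions; the actual method trains convolutional networks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_attention_transfer(teacher_map, student_map, steps=200,
                                   lr_d=0.05, lr_s=0.05):
    """Toy alternating optimization. A discriminator D(a) = sigmoid(w @ a + b)
    learns to label the teacher's attention map as real and the student's as
    fake; in between, the student map is updated to fool D (a stand-in for
    backpropagating the adversarial loss through the student network)."""
    dim = teacher_map.size
    w, b = np.zeros(dim), 0.0
    s = student_map.copy()
    for _ in range(steps):
        # Discriminator step on the loss -log D(a_t) - log(1 - D(a_s)).
        d_real = sigmoid(w @ teacher_map + b)
        d_fake = sigmoid(w @ s + b)
        w -= lr_d * ((d_real - 1.0) * teacher_map + d_fake * s)
        b -= lr_d * ((d_real - 1.0) + d_fake)
        # Student step on the adversarial loss -log D(a_s).
        d_fake = sigmoid(w @ s + b)
        s -= lr_s * (d_fake - 1.0) * w
    return s, w, b

rng = np.random.default_rng(1)
teacher = np.abs(rng.standard_normal(64))    # fixed "real" attention map
student0 = np.abs(rng.standard_normal(64))   # initial "fake" attention map
student, w, b = adversarial_attention_transfer(teacher, student0)
print(np.linalg.norm(student0 - teacher), np.linalg.norm(student - teacher))
```

The point of the sketch is the training signal, not convergence guarantees: unlike the pixel-wise loss, the discriminator is free to weight whichever regions of the map best separate teacher from student.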
Results: We evaluate our method on the CIFAR10 dataset by comparing it to previously published methods for attention transfer and knowledge distillation. We found that MiGAN-AT outperforms the other methods when applied to weak student networks and performs similarly on strong student networks.
Conclusions: Our results show that good performance can be maintained while compressing models with the teacher-student strategy by using a discriminator network to compare the attention maps of the student and teacher.