Graduate student: | 劉邦彥 Liu, Bang-Yan |
---|---|
Thesis title: | 類自主機器學習與捲積加速器之研究與合成 Research of Autonomous-like Machine Learning and Bayesian Convolution Accelerator Synthesis |
Advisor: | 周哲民 Jou, Jer-Min |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2021 |
Graduation academic year: | 109 |
Language: | Chinese |
Number of pages: | 64 |
Keywords (Chinese): | 對偶學習、生成對抗網路、類自主機器學習、神經網路加速器、捲積神經網路、高階合成、貝葉斯優化 |
Keywords (English): | Dual Learning, Generative Adversarial Network, Autonomous-like Machine Learning, Neural Network Accelerator, Convolutional Neural Network, High-Level Synthesis, Bayesian Optimization |
With the rise of deep neural networks, artificial intelligence is now widely applied in daily life. This thesis develops an autonomous-like machine learning method that combines dual learning with generative adversarial networks (GANs), so that training a neural network no longer requires labeled data; instead, the network learns autonomously from unlabeled data in an unsupervised manner.
For the design of neural network accelerators, we examine the different ways in which a convolutional neural network (CNN) can be parallelized, and we use the high-level synthesis tool Vivado HLS to implement a highly parallelizable CNN hardware architecture in a high-level language. Bayesian optimization is then used to trade off hardware execution time against the FPGA resources consumed, yielding a range of alternative accelerator designs. The experimental results show that Bayesian optimization effectively finds multiple designs with different resource trade-offs, including one that achieves the same hardware execution speed as our manual design while using only 30% of the DSP resources compared with that design.
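To make the idea of a parallelizable CNN written in a high-level language more concrete, the following is a minimal sketch of a convolution loop nest in Vivado HLS style C++. It is not the architecture from the thesis: the layer dimensions, the loop order, the function name `conv_layer`, and the unroll and partition factor of 2 are all assumptions chosen for illustration. The `UNROLL` and `ARRAY_PARTITION` pragmas are standard Vivado HLS directives; an ordinary C++ compiler ignores them, so the function can also be checked in software.

```cpp
// Illustrative layer dimensions (assumptions, not taken from the thesis).
constexpr int IN_CH = 2;             // input feature maps
constexpr int OUT_CH = 4;            // output feature maps
constexpr int IN_H = 6;              // input height
constexpr int IN_W = 6;              // input width
constexpr int K = 3;                 // kernel size
constexpr int OUT_H = IN_H - K + 1;  // stride 1, no padding
constexpr int OUT_W = IN_W - K + 1;

// Direct convolution: out[oc][r][c] = sum over ic, kr, kc of
// in[ic][r + kr][c + kc] * w[oc][ic][kr][kc].
void conv_layer(const int in[IN_CH][IN_H][IN_W],
                const int w[OUT_CH][IN_CH][K][K],
                int out[OUT_CH][OUT_H][OUT_W]) {
// Split the output-channel dimension over separate memories so the
// unrolled copies below can read weights and write outputs concurrently.
#pragma HLS ARRAY_PARTITION variable=w cyclic factor=2 dim=1
#pragma HLS ARRAY_PARTITION variable=out cyclic factor=2 dim=1
    for (int r = 0; r < OUT_H; ++r) {
        for (int c = 0; c < OUT_W; ++c) {
            for (int oc = 0; oc < OUT_CH; ++oc) {
// Hypothetical tuning knob: compute 2 output channels in parallel.
#pragma HLS UNROLL factor=2
                int acc = 0;
                for (int ic = 0; ic < IN_CH; ++ic) {
                    for (int kr = 0; kr < K; ++kr) {
                        for (int kc = 0; kc < K; ++kc) {
                            acc += in[ic][r + kr][c + kc] * w[oc][ic][kr][kc];
                        }
                    }
                }
                out[oc][r][c] = acc;
            }
        }
    }
}
```

In a sketch like this, the unroll and partition factors are the kind of parameters a design-space search could vary: larger factors generally shorten execution time but consume more DSP and memory resources, which is the trade-off the abstract describes.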