
Graduate Student: Liu, Bang-Yan (劉邦彥)
Thesis Title: Research of Autonomous-like Machine Learning and Bayesian Convolution Accelerator Synthesis (類自主機器學習與捲積加速器之研究與合成)
Advisor: Jou, Jer-Min (周哲民)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Graduation Academic Year: 109 (ROC calendar)
Language: Chinese
Pages: 64
Keywords: Dual Learning, Generative Adversarial Network, Autonomous-like Machine Learning, Neural Network Accelerator, Convolutional Neural Network, High-Level Synthesis, Bayesian Optimization
With the rise of deep networks, artificial intelligence has been widely applied in our daily lives. This thesis develops an autonomous-like machine learning method that combines dual learning with the training scheme of generative adversarial networks (GANs), so that neural network training no longer requires labeled data; instead, the networks learn autonomously from unlabeled data in an unsupervised manner.
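As a rough illustration of this closed-loop idea (the record itself contains no code), the PyTorch sketch below pairs a primal network with a dual network and trains them on unpaired, unlabeled batches, using only a discriminator's adversarial signal plus a cycle-reconstruction loss as supervision. Every network shape, name, and hyperparameter here is an illustrative assumption, not the thesis's actual design.

```python
# Minimal sketch of autonomous-like learning: two generators form a
# dual-learning closed loop, and a GAN discriminator replaces labels.
# All shapes and hyperparameters are toy assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 16  # toy feature dimension (assumption)

def mlp(out_dim=DIM):
    return nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, out_dim))

G = mlp()       # primal model: domain X -> domain Y
Fnet = mlp()    # dual model:   domain Y -> domain X
D = mlp(1)      # discriminator judging whether a Y-domain sample looks real

opt_g = torch.optim.Adam(list(G.parameters()) + list(Fnet.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    x = torch.randn(32, DIM)   # unlabeled batch from domain X
    y = torch.randn(32, DIM)   # unlabeled batch from domain Y (unpaired with x)

    # Discriminator step: separate real Y samples from generated ones.
    fake_y = G(x).detach()
    d_loss = bce(D(y), torch.ones(32, 1)) + bce(D(fake_y), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: adversarial feedback stands in for labels, and the
    # dual pass Fnet(G(x)) ~= x closes the loop without any supervision.
    fake_y = G(x)
    adv = bce(D(fake_y), torch.ones(32, 1))
    cycle = F.mse_loss(Fnet(fake_y), x)
    g_loss = adv + 10.0 * cycle
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The cycle term supplies dual learning's closed-loop feedback, while the discriminator stands in for labels as the quality signal.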
On the accelerator side, we examine the parallelization opportunities in convolutional neural networks (CNNs) and use the high-level synthesis tool Vivado HLS to implement a highly parallelizable CNN hardware architecture in a high-level language. Bayesian optimization then trades off hardware execution time against the FPGA resources consumed, exploring a range of accelerator design alternatives. Experimental results show that Bayesian optimization effectively discovers multiple designs with different resource trade-offs, including one that matches the execution speed of our hand-tuned design while using only 30% of its DSPs.
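The directive-exploration loop can likewise be sketched in miniature. The self-contained example below runs Bayesian optimization, a Gaussian-process surrogate with an expected-improvement acquisition function (matching Sections 4.1.1 and 4.1.2 of the table of contents), over a toy space of HLS directive parameters. Note that mock_synthesis() is a stand-in cost model invented for this sketch; the real flow would invoke Vivado HLS and read latency and DSP counts from its reports.

```python
# Toy sketch of Bayesian directive exploration: GP surrogate + expected
# improvement over (unroll, partition, pipeline) settings, minimizing a
# scalarized latency/DSP cost. mock_synthesis() is an assumed stand-in
# for an actual Vivado HLS run.
import itertools
import numpy as np
from scipy.stats import norm

# Design space: (loop unroll factor, array partition factor, pipeline on/off)
SPACE = np.array(list(itertools.product([1, 2, 4, 8, 16],
                                        [1, 2, 4, 8],
                                        [0, 1])), dtype=float)

def mock_synthesis(p):
    """Pretend HLS run: returns a weighted latency/DSP trade-off cost."""
    unroll, part, pipe = p
    latency = 1e4 / (unroll * (1 + pipe) * np.sqrt(part))
    dsps = 8 * unroll * part * (1 + pipe)
    return latency + 2.0 * dsps          # weighted-sum objective (assumed)

def gp_posterior(X, y, Xs, length=4.0, noise=1e-6):
    """GP regression with an RBF kernel; returns posterior mean and std."""
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * length ** 2))
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.einsum('ij,ji->i', Ks.T @ Kinv, Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

rng = np.random.default_rng(0)
idx = list(rng.choice(len(SPACE), 3, replace=False))   # random warm-up runs
ys = [mock_synthesis(SPACE[i]) for i in idx]

for _ in range(12):                                     # BO iterations
    X, y = SPACE[idx], np.array(ys)
    ymin, ystd = y.min(), y.std() + 1e-9
    mu, sd = gp_posterior(X, (y - y.mean()) / ystd, SPACE)
    best = (ymin - y.mean()) / ystd
    z = (best - mu) / sd
    ei = sd * (z * norm.cdf(z) + norm.pdf(z))           # expected improvement
    ei[idx] = -np.inf                                   # skip evaluated points
    nxt = int(np.argmax(ei))
    idx.append(nxt); ys.append(mock_synthesis(SPACE[nxt]))

winner = idx[int(np.argmin(ys))]
print('best directives (unroll, partition, pipeline):', SPACE[winner])
```

In the thesis's actual flow, a translator rewrites the HLS source with the sampled directive values before each synthesis run; the weighted-sum objective above is only one possible way to scalarize the latency/DSP trade-off.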


Table of Contents:
Abstract
Research of Autonomous-like Machine Learning and Bayesian Convolution Accelerator Synthesis (Extended Summary)
  Summary
  Our Proposed Design
  Experiments
  Conclusion
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
  1.1 Research Background
  1.2 Research Motivation and Objectives
  1.3 Thesis Organization
Chapter 2: Background and Algorithm Design of Autonomous-like Machine Learning
  2.1 Machine Learning
    2.1.1 Supervised Learning
    2.1.2 Unsupervised Learning
    2.1.3 Reinforcement Learning (RL)
    2.1.4 Neural Networks (NN)
  2.2 Deep Learning
    2.2.1 Convolutional Neural Networks (CNN)
    2.2.2 Recurrent Neural Networks (RNN)
  2.3 Dual Learning (DL)
  2.4 Generative Adversarial Networks (GAN)
  2.5 Algorithm Design for Autonomous-like Machine Learning
  2.6 Implementation Challenges of Autonomous-like Machine Learning
Chapter 3: Literature Review of Convolutional Neural Network Accelerators
  3.1 CNN Accelerator Platforms
    3.1.1 Central Processing Unit (CPU)
    3.1.2 Graphics Processing Unit (GPU)
    3.1.3 Field Programmable Gate Array (FPGA)
  3.2 Literature Review of FPGA-based CNN Accelerators
Chapter 4: Background and Literature Review of High-Level Synthesis and Bayesian Optimization
  4.1 Bayesian Optimization
    4.1.1 Gaussian Processes (GP)
    4.1.2 Acquisition Functions
  4.2 High-Level Synthesis (HLS) Tools
    4.2.1 HLS Operation Flow and Synthesis Behavior
    4.2.2 HLS Design Flow
    4.2.3 Common HLS Optimization Directives
    4.2.4 The HLS Design Space Exploration Problem
  4.3 Literature Review of HLS Design Methods
Chapter 5: CNN High-Level Synthesis Accelerator Design
  5.1 Theoretical Analysis of CNNs
    5.1.1 Intra-kernel Parallelism
    5.1.2 Input and Output Channel Parallelism
    5.1.3 Inter-layer Computation Parallelism
    5.1.4 Input Data Reuse Analysis
  5.2 HLS Hardware Architecture Design
    5.2.1 Inter-layer Parallel Computation Design (Dataflow)
    5.2.2 Line Buffer Design
    5.2.3 Convolutional Layer Loop Design
    5.2.4 Max Pooling Layer Loop Design
    5.2.5 Fully Connected Layer Loop Design
    5.2.6 Fixed-point Arithmetic
  5.3 Bayesian Exploration of HLS Optimization Directives
    5.3.1 Bayesian Optimization Exploration Architecture
    5.3.2 Bayesian Optimization Objectives
    5.3.3 HLS Directive Optimization Locations
    5.3.4 Directive Parameterization
    5.3.5 Translator
    5.3.6 Bayesian-Optimized High-Level Synthesis
Chapter 6: Experimental Environment and Data Analysis
  6.1 Experimental Environment and Methods
  6.2 Experimental Results
Chapter 7: Conclusion and Future Work
References


Full-text Availability: On campus: immediately available / Off campus: immediately available