| Graduate student: | 王鈺堡 Wang, Yu-Bao |
|---|---|
| Thesis title: | 矩陣相乘硬體架構設計空間探索 (Matrix Multiplication Hardware Architecture Design Space Exploration) |
| Advisor: | 周哲民 Jou, Jer-Min |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | Chinese |
| Number of pages: | 55 |
| Keywords (Chinese): | matrix multiplication algorithms, high-level synthesis, hardware design, data-flow scheduling |
| Keywords (English): | Matrix Multiplication Algorithms, High-Level Synthesis, Hardware Design, Data Flow Scheduling |
| Hits / downloads: | 86 / 14 |
Neural networks in machine learning have been widely applied across many fields. Supervised learning requires the training data to be labeled in advance; unsupervised learning, by contrast, does not rely on labeled training data and instead discovers the latent patterns in the training set. Either way, training a supervised or unsupervised network consists of an enormous number of arithmetic operations. A common neural-network operation such as convolution can be recast as ordinary matrix multiplication: by arranging the input data in the proper order before it enters the network, the convolution of a convolutional neural network becomes a plain matrix product. This transformation alone, however, does not solve the time cost incurred by the massive amount of computation. Building on a hardware design for matrix multiplication algorithms, this thesis therefore proposes a data-flow optimization design method that effectively maximizes hardware utilization over time, ultimately yielding an optimized overall data-flow schedule. The design is synthesized to RTL with high-level synthesis software (Vivado High-Level Synthesis, Vivado HLS) and the resulting hardware is analyzed. Unlike traditional hardware design, in which the RTL must be implemented by hand, high-level synthesis greatly reduces the time spent on hardware implementation, leaving more time for architecture exploration at the start of the design process.
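The abstract states that a convolution can be rewritten as an ordinary matrix multiplication once the input data is arranged in the right order before entering the network. A minimal sketch of that rearrangement, commonly called an im2col layout, is shown below; the NumPy implementation and the function names are illustrative assumptions, not code from the thesis:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll every kh x kw patch of a 2-D input into one row of a matrix."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1          # "valid" output size
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_as_matmul(x, k):
    """2-D valid convolution (CNN-style cross-correlation) as one matrix multiply."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return (im2col(x, kh, kw) @ k.ravel()).reshape(oh, ow)

def conv2d_direct(x, k):
    """Direct sliding-window reference implementation for comparison."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3))
assert np.allclose(conv2d_as_matmul(x, k), conv2d_direct(x, k))
```

Once the input is in this layout, the convolution is a single matrix product, which is exactly the operation the proposed hardware architecture accelerates.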