| Graduate student: | 王鈺堡 Wang, Yu-Bao |
|---|---|
| Thesis title: | 矩陣相乘硬體架構設計空間探索 (Matrix Multiplication Hardware Architecture Design Space Exploration) |
| Advisor: | 周哲民 Jou, Jer-Min |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | Chinese |
| Number of pages: | 55 |
| Keywords (Chinese): | matrix multiplication algorithms, high-level synthesis, hardware design, data-flow scheduling |
| Keywords (English): | Matrix Multiplication Algorithms, High-Level Synthesis, Hardware Design, Data Flow Scheduling |
| Hits / downloads: | 86 / 14 |
Neural networks in machine learning have been widely applied across many fields. Supervised learning requires the training data to be labeled in advance; unsupervised learning, by contrast, does not rely on labeled training data and instead discovers the latent patterns in the training set. Either way, training a supervised or unsupervised network consists of an enormous number of arithmetic operations. A common neural-network operation such as convolution can be recast as ordinary matrix multiplication: by arranging the input data in the proper order before it enters the network, the convolution of a convolutional neural network becomes a plain matrix product. This transformation alone, however, does not solve the time cost incurred by the massive amount of computation. Building on a hardware design for matrix multiplication algorithms, this thesis therefore proposes a data-flow optimization design method that effectively maximizes hardware utilization over time, ultimately yielding an optimized overall data-flow schedule. The design is synthesized to RTL with high-level synthesis software (Vivado High-Level Synthesis, Vivado HLS) and the resulting hardware is analyzed. Unlike traditional hardware design, in which the RTL must be implemented by hand, high-level synthesis greatly reduces the time spent on hardware implementation, leaving more time for architecture exploration at the start of the design process.
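The abstract states that a convolution can be rewritten as an ordinary matrix multiplication once the input data is arranged in the right order before entering the network. A minimal sketch of that rearrangement, commonly called an im2col layout, is shown below; the NumPy implementation and the function names are illustrative assumptions, not code from the thesis:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll every kh x kw patch of a 2-D input into one row of a matrix."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1          # "valid" output size
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_as_matmul(x, k):
    """2-D valid convolution (CNN-style cross-correlation) as one matrix multiply."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return (im2col(x, kh, kw) @ k.ravel()).reshape(oh, ow)

def conv2d_direct(x, k):
    """Direct sliding-window reference implementation for comparison."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3))
assert np.allclose(conv2d_as_matmul(x, k), conv2d_direct(x, k))
```

Once the input is in this layout, the convolution is a single matrix product, which is exactly the operation the proposed hardware architecture accelerates.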