
Graduate Student: 賴翰樟 (Lai, Han-Zhang)
Thesis Title: 可動態組合之可重構DNN加速器 (Reconfigurable DNN Accelerator with Dynamic Fission)
Advisor: 周哲民 (Jou, Jer-Min)
Co-advisor: 郭致宏 (Kuo, Chih-Hung)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Publication Year: 2023
Graduation Academic Year: 111 (ROC calendar)
Language: Chinese
Number of Pages: 75
Keywords (Chinese): 深度神經網絡、資料靜態、階層式控制單元、分散式控制單元、可重構、指令集、微指令
Keywords (English): Deep Neural Network, Data Stationary, Hierarchical Control Unit, Distributed Control Unit, Reconfigurable Accelerator, ISA, Micro Program
    Abstract: In recent years, deep learning has been developed and applied ever more widely, achieving remarkable progress in speech recognition, image recognition, natural language processing, and many other fields. However, as neural network models keep growing in scale and complexity, large amounts of computing resources are needed to support training and inference, and the research community has been exploring flexible and efficient general-purpose accelerator substrates. Because hardware development cycles are long and costly, and hardware evolution may not keep pace with rapidly evolving algorithms, hardware generality is essential for an accelerator. To meet the needs of future algorithms, this thesis proposes a reconfigurable deep neural network accelerator based on a hierarchical pipeline design. To obtain good performance and avoid multi-stage pipeline imbalance, the architecture adopts three control characteristics: data-stationary, distributed, and hierarchical control. We adopt a new nested-loop pipeline control protocol that forms an efficient parallel dynamic pipeline controller. We also develop a new instruction set architecture for general neural network computation, allowing the accelerator to handle different neural network operations and resolving pipeline imbalance. In addition, we develop a new microinstruction architecture that gives the accelerator its dynamic-fission capability: its computing resources can be composed into different shapes on demand, resolving mismatches between neural network specifications and computational resources and reducing idle resources. We further implement these accelerator features in a simulator to speed up design space exploration. Experimental results show that the proposed accelerator effectively combines reconfigurable flexibility with multi-stage pipeline parallelism to accelerate deep learning workloads. As an edge inference device, it sustains efficient acceleration on compute-intensive layers of LeNet-5, AlexNet, VGG-16, SqueezeNet, and neural matrix multiplication.
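    As a rough illustration of the data-stationary nested-loop computation the abstract refers to, the C sketch below keeps one convolution weight stationary while the corresponding output plane streams past it. The loop order, dimension names, and data layout are illustrative assumptions only; they are not the accelerator's actual instruction set, schedule, or simulator code.

        /* A minimal weight-stationary convolution loop nest (illustrative only).
         * The caller must zero-initialize out[].  Dimensions: K output channels,
         * C input channels, OH x OW output size, R x S kernel, stride 1. */
        void conv_weight_stationary(const float *in,  /* [C][IH][IW], IH = OH + R - 1 */
                                    const float *wt,  /* [K][C][R][S] */
                                    float *out,       /* [K][OH][OW] */
                                    int K, int C, int OH, int OW, int R, int S)
        {
            int IH = OH + R - 1, IW = OW + S - 1;
            for (int k = 0; k < K; ++k)
                for (int c = 0; c < C; ++c)
                    for (int r = 0; r < R; ++r)
                        for (int s = 0; s < S; ++s) {
                            /* One weight stays "stationary" here; in hardware this
                             * corresponds to a PE reusing the same weight register
                             * while inputs and partial sums stream through it. */
                            float w = wt[((k * C + c) * R + r) * S + s];
                            for (int oh = 0; oh < OH; ++oh)
                                for (int ow = 0; ow < OW; ++ow)
                                    out[(k * OH + oh) * OW + ow] +=
                                        w * in[(c * IH + (oh + r)) * IW + (ow + s)];
                        }
        }

    In the accelerator described above, the levels of such a loop nest are presumably what the hierarchical, distributed controllers pipeline in hardware, rather than being executed as software loops.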

    Summary: In recent years, the development and application of deep learning have become increasingly widespread, and the research community has been exploring flexible and efficient general-purpose accelerator architectures. To achieve high performance and avoid multi-level pipeline imbalance, our accelerator architecture incorporates three control characteristics: data-stationary, distributed, and hierarchical control. We adopt a novel nested-loop pipeline control protocol, forming an efficient parallel dynamic pipeline controller. We have developed a new instruction set architecture that enables general neural network computations, allowing the accelerator to handle different neural network operations. Furthermore, we have designed a new microinstruction architecture that enables dynamic fission of the accelerator, allowing its computation resources to be configured according to specific requirements and addressing mismatches between neural network specifications and computational resources, thereby reducing resource idle time. We have also developed a simulator based on these accelerator features to expedite hardware architecture exploration. Experimental results demonstrate that the proposed accelerator effectively combines reconfigurable flexibility and multi-level pipeline parallelism to accelerate deep learning workloads.
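    The dynamic-fission idea can be sketched with a small, stand-alone C example: when a layer maps onto fewer processing-element rows than the physical array provides, the array is split into identical sub-arrays that process different tiles in parallel, raising utilization. The 16x16 array size, the 6-row mapping, and the utilization arithmetic below are hypothetical illustrations, not the thesis's microinstruction format or actual configuration procedure.

        #include <stdio.h>

        /* Hypothetical numbers: a 16x16 PE array and a layer whose tile occupies
         * only 6 of the 16 rows.  Without fission the unused rows sit idle; with
         * fission the array is split into identical sub-arrays working in parallel. */
        int main(void)
        {
            const int pe_rows = 16, pe_cols = 16;    /* physical PE array             */
            const int map_rows = 6, map_cols = 16;   /* rows/cols one layer tile uses */

            /* Monolithic mapping: one tile at a time, the rest of the array idles. */
            double util_mono = (double)(map_rows * map_cols) / (pe_rows * pe_cols);

            /* Fissioned mapping: stack as many tiles along the row dimension as fit. */
            int copies = pe_rows / map_rows;         /* 2 sub-arrays of 6 rows each   */
            double util_fiss = (double)(copies * map_rows * map_cols) / (pe_rows * pe_cols);

            printf("PE utilization: monolithic %.1f%%, fissioned %.1f%%\n",
                   100.0 * util_mono, 100.0 * util_fiss);
            return 0;
        }

    Under these assumed numbers, fissioning the array into two 6x16 sub-arrays raises utilization from 37.5% to 75%, which is the kind of idle-resource reduction the abstract attributes to dynamic composition.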

    Table of Contents:
    Abstract I
    SUMMARY II
    METHOD AND ARCHITECTURE III
    RESULT AND DISCUSSION VI
    CONCLUSION IX
    Acknowledgements X
    Table of Contents XI
    List of Tables XIII
    List of Figures XIV
    Chapter 1 Introduction 1
      1.1 Preface 1
      1.2 Research Motivation and Objectives 2
      1.3 Research Contributions 2
      1.4 Thesis Organization 3
    Chapter 2 Background and Related Work 4
      2.1 Deep Neural Networks (DNN) 4
      2.2 Convolutional Neural Networks (CNN) 6
      2.3 Systolic Arrays (SA) 12
      2.4 Review of Hardware Simulator Literature 14
    Chapter 3 Design of the Reconfigurable DNN Accelerator with Dynamic Fission 18
      3.1 Accelerator Design Philosophy 19
      3.2 Accelerator Architecture Overview 25
      3.3 Reconfigurability and Composability of the Processing Elements 27
      3.4 Control Unit Design 30
      3.5 Accelerator Execution Flow 40
    Chapter 4 Cycle-Accurate Hardware Simulator Design 41
      4.1 Inputs and Outputs of the C Hardware Simulator 41
      4.2 Internal Architecture of the C Hardware Simulator 43
      4.3 System Verification Flow 46
    Chapter 5 Experimental Results and Discussion 47
      5.1 Development Platform and Hardware Specifications 47
      5.2 Benchmarks 48
      5.3 Reliability and Efficiency of the Hardware Simulator 54
      5.4 Impact of Reconfigurability and Composability on the Accelerator 56
      5.5 Impact of Hardware Characteristics on Dataflow 63
    Chapter 6 Conclusion and Future Work 72
    References 73


    Full text released: on campus 2024-08-01; off campus 2024-08-01