
Graduate Student: Chang, Yu-Lun (張育倫)
Thesis Title: Accelerator Design for Co-design of Neural Network Model and HW Architecture (協同設計神經網路模型和硬體架構的加速器設計)
Advisor: Jou, Jer-Min (周哲民)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110 (ROC calendar)
Language: Chinese
Number of Pages: 64
Keywords: Co-design, Neural Network Model Design, Hardware Architecture Design, Hardware Mapping Design, Accelerator
Abstract:
    Deep neural network (DNN) applications are now widely deployed and demand for them keeps growing. However, DNN algorithms are becoming increasingly complex to design, and acceleration hardware well matched to a given network is even harder to obtain. Integrating the two so that they can be jointly optimized and rapidly designed is therefore a highly challenging problem. This thesis proposes a hardware architecture that addresses this challenge by automatically deriving an optimal DNN algorithm together with its execution architecture. The proposed accelerator coordinates the DNN algorithm search, the exploration of its execution-architecture design space, and the optimal mapping of computations onto hardware resources, so as to derive the best DNN algorithm along with its execution architecture and mapping strategy. The accelerator can also be used for inference and training of general neural networks.
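    To make the co-design loop summarized above more concrete, the short Python sketch below runs a toy joint search over a model space, a hardware space, and a mapping space, driven by a single scalarized objective. It is only an illustration under assumed design spaces and an invented analytical cost model (MODEL_SPACE, HW_SPACE, MAP_SPACE, evaluate, and the objective weights are all hypothetical); it is not the search procedure or the accelerator implemented in the thesis.

```python
# Illustrative sketch only: a toy joint search over a DNN model space, a
# hardware (PE-array) space, and a mapping space, scored by a made-up
# analytical cost model. All knobs, values, and weights are hypothetical
# and do not come from the thesis.
import random

random.seed(0)

# Hypothetical design spaces (stand-ins for the thesis's design spaces).
MODEL_SPACE = {"depth": [4, 8, 12], "width": [16, 32, 64]}
HW_SPACE = {"pe_rows": [8, 16], "pe_cols": [8, 16], "sram_kb": [64, 128]}
MAP_SPACE = {"tile": [4, 8, 16], "dataflow": ["weight_stationary", "output_stationary"]}


def sample(space):
    """Draw one random design point from a {knob: choices} space."""
    return {k: random.choice(v) for k, v in space.items()}


def evaluate(model, hw, mapping):
    """Made-up evaluator returning (accuracy proxy, latency proxy, energy proxy)."""
    macs = model["depth"] * model["width"] ** 2          # rough compute count
    pes = hw["pe_rows"] * hw["pe_cols"]
    util = min(1.0, mapping["tile"] ** 2 / pes)          # crude PE utilization
    latency = macs / (pes * max(util, 1e-3))
    dram_penalty = 1.0 + 64.0 / hw["sram_kb"]            # less SRAM -> more off-chip traffic
    energy = macs * dram_penalty * (1.2 if mapping["dataflow"] == "output_stationary" else 1.0)
    acc = 1.0 - 1.0 / (model["depth"] * model["width"])  # stand-in for a network evaluator
    return acc, latency, energy


def objective(point):
    """Scalarized co-design objective; the weights are purely illustrative."""
    acc, lat, eng = evaluate(*point)
    return acc - 1e-4 * lat - 1e-7 * eng


# Plain random co-search over the joint space, standing in for a coordinated
# model / mapping / hardware search.
best, best_score = None, float("-inf")
for _ in range(200):
    point = (sample(MODEL_SPACE), sample(HW_SPACE), sample(MAP_SPACE))
    score = objective(point)
    if score > best_score:
        best, best_score = point, score

print("best (model, hardware, mapping):", best)
print("score:", round(best_score, 4))
```

    A real co-search would replace the random sampler with a coordinated model, mapping, and hardware search and the toy evaluator with a trained network evaluator plus a hardware cost model, which is the role the thesis assigns to its accelerator.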

Table of Contents:
    Abstract (Chinese)
    Summary (English)
        Our Proposed Design
        Experiments
        Conclusion
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation and Objectives
        1.3 Thesis Organization
    Chapter 2 Background and Related Work
        2.1 DNN Model Architecture Design Search
        2.2 DNN Hardware Accelerator Architecture Design Search
        2.3 DNN Mapping Strategy Design Search
        2.4 Search Algorithms for DNN Model, Hardware, and Mapping Design
        2.5 Network Evaluator
        2.6 Formulation of the Optimization Problem
    Chapter 3 Exploring the Deep Neural Network Design Space
        3.1 Design Space of DNN Models
        3.2 DNN Hardware Design Space
        3.3 Mapping Strategy Design Space
    Chapter 4 Co-design of DNN Model, Hardware, and Mapping
        4.1 Objective for Co-designing the Neural Network Model and Hardware Architecture
        4.2 DNN Model Design Search
        4.3 DNN Mapping Strategy Design Search
        4.4 DNN Hardware Design Search
        4.5 Joint Search over DNN Model, Mapping Strategy, and Hardware Design
        4.6 Accelerator Implementation for Co-design of the Neural Network Model and Hardware Architecture
    Chapter 5 Experimental Results and Discussion
        5.1 Experimental Environment and Methodology
        5.2 Neural Network Model Search Results
        5.3 Hardware Search Results
        5.4 Mapping Strategy Search Results
        5.5 Integrated Search Results
        5.6 Results for the Accelerator Co-designing the Neural Network Model, Hardware, and Mapping Strategy
    Chapter 6 Conclusion and Future Work
    References

    Full-text availability: on campus from 2027-08-09; off campus from 2027-08-09.
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.