| Author: | 李冠賢 Lee, Kuan-Hsien |
|---|---|
| Thesis title: | 可重新配置與虛擬化之類神經網路處理器 A Reconfigurable and Virtualizable Neural Net Processor |
| Advisor: | 陳中和 Chen, Chung-Ho |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 (2012-2013) |
| Language: | English |
| Number of pages: | 58 |
| Chinese keywords: | 類神經網路 (neural network), 硬體加速神經網路訓練 (hardware-accelerated neural network training), 虛擬化 (virtualization), 多層感知器神經網路 (multilayer perceptron neural network), 霍普菲爾網路 (Hopfield network), 倒傳遞演算法 (error-backpropagation algorithm), 虛擬機器 (virtual machine) |
| English keywords: | neural network, on-chip training, virtualization, multi-layer perceptron, Hopfield neural network, error-backpropagation, virtual machine |
We present a reconfigurable and virtualizable neural network accelerator and describe its hardware architecture, the software-hardware communication, and the hardware virtualization mechanism. The proposed hardware architecture allows neural network applications of various topologies, including multilayer perceptron (MLP) networks and Hopfield networks, to be accelerated on the same hardware. Error backpropagation is the most common and practical learning algorithm for MLP networks, but its training process can be very time-consuming, and accelerating it with parallel hardware runs into a memory-access bottleneck that limits performance. We therefore propose a memory layout for the network weights that removes this bottleneck and improves the performance of hardware-accelerated backpropagation training.
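For reference, the standard textbook form of the error-backpropagation update is reproduced below; the notation (learning rate $\eta$, activation function $\varphi$, error term $\delta$, output $o$, target $t$) is introduced here and is not taken from the thesis. The sum over $k$ in the hidden-layer term reads the next layer's weight matrix column-wise, i.e. transposed with respect to the forward pass, which is exactly the access pattern behind the memory bottleneck described above.

$$
\delta_j^{(L)} = \varphi'\!\left(net_j^{(L)}\right)\left(o_j^{(L)} - t_j\right), \qquad
\delta_j^{(l)} = \varphi'\!\left(net_j^{(l)}\right)\sum_{k} w_{kj}^{(l+1)}\,\delta_k^{(l+1)}, \qquad
\Delta w_{ji}^{(l)} = -\eta\,\delta_j^{(l)}\,o_i^{(l-1)}
$$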
In addition, we propose a virtualization architecture built on the CASL hypervisor that combines emulated virtual hardware with direct access; under this architecture the neural network accelerator achieves workload balance among virtual machines without degrading overall performance. Finally, we add virtualization-support units to the hardware so that the hypervisor can manage the hardware resources fairly and efficiently.
In this thesis, we present a reconfigurable and virtualizable neural net processor and introduce its hardware architecture, the software-hardware co-design, and the device virtualization mechanism. The proposed hardware architecture is reconfigurable to accommodate different neural network topologies and applications, including multilayer perceptron (MLP) and Hopfield networks, on a single chip. To eliminate the memory bottleneck of the error-backpropagation training algorithm, we propose a weight memory management method that interleaves the memory access order and thereby improves the performance of on-chip training.
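To illustrate why interleaving the weight accesses helps, the C sketch below maps each weight w[j][i] to one of several memory banks with a skewed (diagonal) scheme, so that both the forward pass (one row of the weight matrix) and the backward pass (one column) spread their accesses evenly across the banks. The thesis does not disclose its exact layout here, so the mapping function, bank count, and layer sizes are illustrative assumptions only.

```c
/*
 * Illustrative sketch only: the skewed bank mapping, the bank count, and
 * the layer sizes below are assumptions, not the thesis's actual layout.
 * Weight w[j][i] connects input neuron i to output neuron j.
 */
#include <stdio.h>

#define NUM_IN    4                      /* neurons in the previous layer */
#define NUM_OUT   4                      /* neurons in the current layer  */
#define NUM_BANKS 2                      /* independent weight memories   */

static float bank[NUM_BANKS][(NUM_IN * NUM_OUT) / NUM_BANKS];

/* Skewed (diagonal) mapping: bank = (i + j) mod NUM_BANKS.  A whole row
 * (forward pass) or a whole column (backward pass) of the weight matrix
 * then visits every bank in turn, so neither direction serializes on a
 * single memory. */
static void weight_addr(int j, int i, int *b, int *off)
{
    *b   = (i + j) % NUM_BANKS;
    *off = j * (NUM_IN / NUM_BANKS) + i / NUM_BANKS;
}

int main(void)
{
    int b, off;

    /* Forward pass for output neuron j = 0: reads row w[0][*]. */
    for (int i = 0; i < NUM_IN; i++) {
        weight_addr(0, i, &b, &off);
        printf("forward  w[0][%d] -> bank %d, offset %d (%.1f)\n",
               i, b, off, bank[b][off]);
    }

    /* Backward pass for input neuron i = 1: reads column w[*][1], the
     * transposed order that a naively banked row-major layout would
     * funnel into a single bank. */
    for (int j = 0; j < NUM_OUT; j++) {
        weight_addr(j, 1, &b, &off);
        printf("backward w[%d][1] -> bank %d, offset %d (%.1f)\n",
               j, b, off, bank[b][off]);
    }
    return 0;
}
```

With two banks, the forward loop alternates bank 0, 1, 0, 1 and the backward loop alternates 1, 0, 1, 0, so a datapath with two weight memories can fetch two weights per cycle in either direction.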
In addition, we propose a hybrid device-emulation and direct-access virtualization mechanism based on the CASL hypervisor to achieve low performance overhead and workload balance among multiple virtual machines. With the dual-bank weight memory design and other hardware support, the hypervisor can manage the hardware resources effectively and fairly.
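The hybrid mechanism can be pictured as a split of the accelerator's MMIO space: control and status registers are trapped and emulated per virtual machine, while the weight-memory window is mapped directly into the guest so that bulk data transfers bypass the hypervisor. The sketch below is a minimal outline of that emulate-versus-pass-through decision; the register offsets, structure, and function names are hypothetical and are not taken from the CASL hypervisor source.

```c
/*
 * Hypothetical sketch of a hybrid device-emulation / direct-access split.
 * All names and offsets are invented for illustration; only the overall
 * idea (emulate control, pass data through) follows the abstract.
 */
#include <stdint.h>
#include <stdbool.h>

#define NNP_REG_CTRL    0x000u   /* start/stop, topology select (emulated) */
#define NNP_REG_STATUS  0x004u   /* busy/done flags (emulated)             */
#define NNP_WEIGHT_WIN  0x100u   /* weight-memory bank (direct-mapped)     */

struct nnp_vdev {
    int      vm_id;         /* owning virtual machine                      */
    uint32_t shadow_ctrl;   /* per-VM copy of the control register         */
    bool     job_pending;   /* queued until the scheduler grants the core  */
};

/* Called when a guest store hits the device's MMIO range.  Returns true
 * if the access was emulated, false if it falls inside the direct-access
 * window and should reach the hardware untouched. */
bool nnp_mmio_write(struct nnp_vdev *vd, uint32_t offset, uint32_t value)
{
    if (offset >= NNP_WEIGHT_WIN)
        return false;                /* direct access: no hypervisor work  */

    switch (offset) {
    case NNP_REG_CTRL:
        /* Record the request; the real start command is issued later by
         * the hypervisor's scheduler so that VMs share the core fairly. */
        vd->shadow_ctrl = value;
        vd->job_pending = true;
        return true;
    case NNP_REG_STATUS:
        return true;                 /* guest writes to status are ignored */
    default:
        return true;                 /* unknown control offsets: drop      */
    }
}
```

Under this split only the infrequent control accesses pay a trap cost, which is one plausible way such a design keeps the virtualization overhead low while the hypervisor retains control over scheduling.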