| Graduate Student: | 陳識宇 (Chen, Shih-Yu) |
|---|---|
| Thesis Title: | 基於解析架構之可重組態卷積神經網路 / Reconfigurable Convolutional Neural Network via Analytics Architecture |
| Advisor: | 李國君 (Lee, Gwo Giun) |
| Degree: | Master |
| Department: | 電機資訊學院 電機工程學系 (Department of Electrical Engineering) |
| Publication Year: | 2020 |
| Academic Year of Graduation: | 108 (ROC calendar) |
| Language: | English |
| Pages: | 140 |
| Keywords (Chinese): | 卷積神經網路、可重組態、平台獨立、主成分分析、高斯─約當消去法 |
| Keywords (English): | Convolutional neural network (CNN), Reconfigurable, Platform independent, Principal component analysis (PCA), Gauss-Jordan elimination (GJE) |
Recently, as artificial intelligence (AI) algorithms demanding high accuracy have become exceedingly complex, and as the volume of data generated by edge devices and the Internet of Things (IoT) continues to grow, this thesis proposes a flexible, reconfigurable convolutional neural network (CNN). The algorithmic convolutions of a CNN are either (1) eigen-transformed into matrix operations with higher symmetry via principal component analysis (PCA), or (2) moved onto sparser bases through Gauss-Jordan elimination (GJE). Both transforms facilitate a significant reduction in the number of operations, a lower data transfer rate, smaller storage, and shorter execution time when synthesizing or reconfiguring the transformed filters and bases. Both methods are reconfigurable and platform independent: they apply to convolution layers with different numbers or sizes of kernels and can be implemented or realized on any platform. In addition, quantization is introduced to further reduce data storage and lower the data transfer rate. Finally, the proposed methods are evaluated on four different CNN models, including two well-known models, ResNet-20 and ResNet-50; the results show that the proposed methods are indeed reconfigurable and improve the efficiency of CNNs.
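To make the first transform concrete, below is a minimal NumPy/SciPy sketch, not the architecture developed in the thesis: a bank of K convolution kernels is replaced by M principal-component "eigenfilters" (plus the mean kernel), so an input is convolved M + 1 times instead of K times, and the K original feature maps are recovered by a small K × M recombination, exploiting the linearity of convolution. All function names and toy sizes (K = 16, 3 × 3 kernels) are illustrative assumptions.

```python
import numpy as np
from scipy.signal import correlate2d  # "valid" cross-correlation, as in CNN inference

def pca_filter_bank(filters, M):
    """Approximate a bank of K filters (K, h, w) by M eigenfilters,
    a K x M recombination matrix, and the mean filter."""
    K, h, w = filters.shape
    F = filters.reshape(K, h * w)
    mean = F.mean(axis=0)
    # Principal components of the centered filter matrix via SVD.
    _, _, Vt = np.linalg.svd(F - mean, full_matrices=False)
    components = Vt[:M]                      # (M, h*w) eigenfilters
    coeffs = (F - mean) @ components.T       # (K, M) recombination weights
    return components.reshape(M, h, w), coeffs, mean.reshape(h, w)

def conv_via_eigenfilters(x, eigenfilters, coeffs, mean_filter):
    """Convolve with the M eigenfilters once, then mix to recover K maps."""
    eig_maps = np.stack([correlate2d(x, e, mode="valid") for e in eigenfilters])
    mean_map = correlate2d(x, mean_filter, mode="valid")
    # Linearity of convolution: K responses = mean response + coeffs @ eigen responses.
    return mean_map + np.tensordot(coeffs, eig_maps, axes=1)

rng = np.random.default_rng(0)
filters = rng.standard_normal((16, 3, 3))    # toy bank: K = 16 kernels, 3x3 each
x = rng.standard_normal((8, 8))

exact = np.stack([correlate2d(x, f, mode="valid") for f in filters])
approx = conv_via_eigenfilters(x, *pca_filter_bank(filters, M=9))
print(np.max(np.abs(exact - approx)))        # ~1e-15: 9 + 1 convolutions replace 16
```

With M + 1 < K (always achievable when K exceeds the kernel dimensionality h·w, as here), the same outputs are produced with fewer convolutions; choosing M smaller still trades accuracy for further savings, which is where the reconfigurability lies.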
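The second transform can be sketched in the same toy setting: Gauss-Jordan elimination row-reduces the flattened filter matrix F to reduced row-echelon form R, whose rows form a sparser basis (zeros above and below every pivot), so each basis convolution needs fewer multiply-accumulates; a mixing matrix T with F = T·R then restores the original responses. This is again a hedged illustration under assumed names and sizes, not the thesis's implementation.

```python
import numpy as np
from scipy.signal import correlate2d

def rref(A, tol=1e-12):
    """Reduced row-echelon form via Gauss-Jordan elimination with partial pivoting."""
    R = A.astype(float).copy()
    rows, cols = R.shape
    row = 0
    for col in range(cols):
        if row == rows:
            break
        p = row + np.argmax(np.abs(R[row:, col]))
        if abs(R[p, col]) < tol:
            continue                              # no pivot in this column
        R[[row, p]] = R[[p, row]]                 # swap pivot row into place
        R[row] /= R[row, col]                     # scale pivot to 1
        others = [r for r in range(rows) if r != row]
        R[others] -= np.outer(R[others, col], R[row])  # zero out above and below
        row += 1
    return R[:row]                                # keep the nonzero basis rows

# Toy filter bank: K = 6 kernels of size 3x3, flattened to a 6 x 9 matrix.
rng = np.random.default_rng(1)
filters = rng.integers(-2, 3, size=(6, 3, 3)).astype(float)
F = filters.reshape(6, -1)
R = rref(F)                                       # sparser basis rows
T = F @ np.linalg.pinv(R)                         # mixing matrix with F == T @ R
basis = R.reshape(-1, 3, 3)

x = rng.standard_normal((8, 8))
exact = np.stack([correlate2d(x, f, mode="valid") for f in filters])
basis_maps = np.stack([correlate2d(x, b, mode="valid") for b in basis])
recombined = np.tensordot(T, basis_maps, axes=1)
print(np.max(np.abs(exact - recombined)))         # ~1e-12: same outputs, sparser kernels
print((np.abs(R) < 1e-12).mean())                 # fraction of zeros in the RREF basis
```

Zeros in the RREF basis translate directly into skipped multiply-accumulates on any platform, which is one sense in which the transform is platform independent; per the abstract, both transforms apply layer by layer and combine with quantization for further storage and bandwidth savings.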