
Graduate Student: 李佳穎 (Li, Jia-Ying)
Thesis Title: 於FPGA實作量化神經網路特徵擷取並應用於多攝影機間的影像追蹤
Across-Camera Object Tracking with Quantized Neural Networks Feature Extraction Implemented on Embedded FPGA
Advisor: 鄭憲宗 (Cheng, Sheng-Tzong)
Degree: Master (碩士)
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2019
Academic Year of Graduation: 107 (2018-2019)
Language: English
Pages: 43
Chinese Keywords: 跨攝影機影像追蹤, 量化卷積神經網路, FPGA
Foreign Keywords: Cross-camera image tracking, quantized convolutional neural networks, FPGA
Hits: 112, Downloads: 1
    With the improvement of hardware such as CPUs and GPUs, artificial neural networks have flourished for some time, bringing large accuracy gains in fields such as image classification and speech recognition. Convolutional neural network (CNN) models, commonly used for image classification, demand substantial computing resources and memory capacity. Because large-memory GPUs offer high data throughput and enjoy mature, compatible deep learning frameworks, most model training and inference is concentrated on the server side. Nevertheless, many applications cannot tolerate waiting for network transmission, for example environments that require real-time responses, so there remains a need to port deep neural network models to edge devices.
    In recent years, for safety reasons, surveillance systems have been installed in many places to monitor on-site conditions. In most cases, however, the footage backed up on the server is only reviewed after an accident, or fragments of it are used for deep neural network training and inference. Having the server process every camera stream around the clock for real-time tracking of factory personnel is impractical. Embedded hardware architectures that support the parallelizable, streaming computation of CNNs, such as FPGAs, are therefore well suited to deep neural network processing at the endpoint.
    In this work, we move the feature extraction stage to the edge device: we propose a quantized convolutional neural network for image tracking across multiple cameras and implement it on an FPGA. Features are extracted from the video at each camera endpoint and compared for similarity to track people. We establish that the quantized, compressed network model does not unduly degrade the accuracy of matching on the extracted features, and we evaluate the resulting performance improvement.
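    The quantization step described above can be sketched as follows. This is a minimal illustration of symmetric linear quantization of a weight tensor to 8-bit integers; the thesis targets the Xilinx DNNDK/DPU toolchain, whose exact quantization scheme may differ, so the function names and the choice of scheme here are illustrative assumptions.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of a nonzero float tensor to int8.

    Illustrative sketch only: maps the largest weight magnitude to 127
    and rounds the rest; not necessarily the scheme used by the DPU.
    """
    scale = float(np.max(np.abs(w))) / 127.0   # one scale per tensor
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 values."""
    return q.astype(np.float32) * scale

# Example: a tiny weight vector round-trips with bounded error.
w = np.array([-0.5, 0.0, 0.25, 0.5], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

    The int8 values and the single per-tensor scale are what an embedded accelerator would store and compute with; accuracy is preserved as long as the rounding error stays small relative to the feature distances being compared.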

    With the increase in hardware speed of CPUs and GPUs, artificial neural networks have been flourishing for a while, and accuracy has greatly improved in fields such as image classification and speech recognition. The convolutional neural network (CNN) models often used for image classification require a large amount of computing resources and memory capacity. Since large-capacity GPUs have advantages in data throughput and their compatible deep learning frameworks are mature, the training and inference of most models are concentrated on the server side. Nevertheless, many applications cannot afford to wait for network transmission, such as environments requiring real-time responses, so it is still necessary to port deep neural network models to edge devices.
    In recent years, for safety reasons, various places have begun to set up monitoring systems to identify on-site conditions. However, it is difficult to detect accidents in time: usually the backup footage on the server is only accessed afterwards, or fragments of these videos are used for deep neural network training and inference. Moreover, it is impractical and energy-inefficient for the server to perform all the surveillance image processing around the clock. Therefore, an embedded hardware architecture that can help CNNs parallelize streaming operations, such as an FPGA, is well suited for deep neural network processing on edge devices.
    In this research, we offload the feature extraction stage to the edge device. We propose a quantized convolutional neural network for image tracking across multiple cameras and implement it on an FPGA. Features extracted from the video at each camera endpoint are compared for similarity to track people. We establish that the quantized and compressed neural network model does not unduly affect matching accuracy on the extracted features, and we evaluate the resulting performance improvement.
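    The cross-camera matching step in the abstract, comparing extracted feature vectors for similarity, can be sketched as below. The similarity measure (cosine) and the threshold value are illustrative assumptions; the abstract does not specify either, and the thesis's actual matching criterion may differ.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_across_cameras(query, gallery, threshold=0.8):
    """Find which gallery feature (from another camera) matches the query.

    Returns the index of the most similar gallery feature, or -1 if no
    similarity clears the (hypothetical) threshold.
    """
    sims = [cosine_similarity(query, g) for g in gallery]
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else -1

# Example: the query person reappears as the second gallery feature.
query = np.array([1.0, 0.0, 1.0])
gallery = [np.array([0.0, 1.0, 0.0]),   # a different person
           np.array([0.9, 0.1, 1.1])]   # same person seen by another camera
```

    Because cosine similarity only needs a dot product and two norms, it is cheap enough to run on the host after the FPGA has produced the quantized features.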

    摘要 (Chinese Abstract) I
    Abstract II
    ACKNOWLEDGEMENT III
    TABLE OF CONTENTS IV
    LIST OF TABLES V
    LIST OF FIGURES VI
    Chapter 1. Introduction & Motivation 1
      1.1 Introduction 1
      1.2 Motivation 3
      1.3 Thesis Overview 4
    Chapter 2. Background & Related Work 5
      2.1 Convolutional Neural Network 5
        2.1.1 CNN Overview 5
        2.1.2 Inference and Training 5
        2.1.3 Inference of CNNs 5
      2.2 Approximate Computing of CNN Models 7
        2.2.1 Fixed-Point Arithmetic 7
        2.2.2 Dynamic Fixed Point for CNN 8
        2.2.3 Extreme Quantization with Binary and Pseudo-Binary Nets 9
        2.2.4 Stochastic Computing 10
      2.3 Hardware Acceleration for Deep Learning 11
      2.4 Siamese Convolutional Neural Network 14
      2.5 DNNDK and DPU 15
      2.6 CRF on Camera Tracking 18
    Chapter 3. System Design and Approach 20
      3.1 Problem Description 20
      3.2 System Design 20
      3.3 Cross Camera Tracking Algorithm 21
      3.4 CNN Architecture 25
    Chapter 4. Implementation and Experiments 28
      4.1 System Implementation 28
      4.2 Experiment Environment and Settings 36
      4.3 Experiment Result 37
    Chapter 5. Conclusion & Future Work 40
    Reference 41


    Availability on campus: public from 2024-09-01
    Availability off campus: not public
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.