Graduate student: Syed Basha Maitheen Farmannulla (巴沙曼)
Thesis title: Performance Analysis of Three Modern General-Purpose GPUs using YOLO v4
Advisor: Chen, Chung-Ho (陳中和)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2021
Graduation academic year: 109 (ROC calendar)
Language: English
Number of pages: 106
Keywords: GPGPU, OpenCL, Object detection, Deep Learning
  • As artificial intelligence (AI) is widely employed in numerous fields, it is crucial to use either a GPGPU (General-Purpose Graphics Processing Unit) or an ASIC (Application-Specific Integrated Circuit) to speed up computation. We implement a virtual platform, CASLab GPU, which is a GPGPU with a SIMT (Single Instruction, Multiple Threads) architecture. Although a GPGPU can support many distinct applications through its software stack, the implementation of the software libraries has a prominent effect on GPGPU performance. In contrast, an ASIC delivers considerable performance on its particular application, yet it lacks versatility.
    In this thesis, we run a new workload on the GPGPU: YOLO v4, a deep-learning-based approach mainly used for object detection. It is regarded as the best free and open-source detector: it is very fast, provides accurate detection results, and makes it possible to train a very fast and accurate object detector. CASLab GPU is designed with a SIMT structure. Given the performance target of the GPU, this work is aimed at the non-high-end AI application field, which covers edge computing, distributed deep neural network training, and inference applications. Therefore, the CASLab GPU is compatible with OpenCL (Open Computing Language) or TensorFlow applications. According to preliminary experimental results, it reduces the average power consumption and hardware utilization. Based on this result, the hardware and software design of the GPU can be optimized, and it also provides 100% better accuracy on object detection. Furthermore, we run YOLO v4 on the NVIDIA Jetson Nano as well as the NVIDIA GeForce GTX 1080 Ti and compare the results with one another to find the best accuracy and timing performance. We also carefully analyze the performance, the obtained results, and the working speed in this work.
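    The abstract compares devices on timing performance, and the thesis outline lists GFLOPs and frames-per-second calculations. A minimal sketch of how such metrics are commonly derived from a measured per-frame inference time is given below. The class name, function names, device names, and all numeric values are hypothetical placeholders chosen for illustration; they are not results or code from the thesis.

```python
from dataclasses import dataclass


@dataclass
class DeviceRun:
    """One benchmark run. All values used below are hypothetical placeholders."""
    name: str
    seconds_per_frame: float  # measured wall-clock inference time for one image
    gflop_per_frame: float    # floating-point work of one forward pass, in GFLOP


def frames_per_second(run: DeviceRun) -> float:
    # Throughput is the reciprocal of the per-frame latency.
    return 1.0 / run.seconds_per_frame


def achieved_gflops(run: DeviceRun) -> float:
    # Sustained GFLOP/s = work per frame divided by time per frame.
    return run.gflop_per_frame / run.seconds_per_frame


# Hypothetical measurements for two unnamed devices running the same network.
runs = [
    DeviceRun("device_a", seconds_per_frame=0.5, gflop_per_frame=64.0),
    DeviceRun("device_b", seconds_per_frame=0.25, gflop_per_frame=64.0),
]

for run in runs:
    print(f"{run.name}: {frames_per_second(run):.1f} FPS, "
          f"{achieved_gflops(run):.1f} GFLOP/s")
```

    Because both runs execute the same network, the work per frame is identical, so the ratio of achieved GFLOP/s between two devices equals the ratio of their frame rates.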

    Abstract
    Acknowledgement
    Table of contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Background
      2.1 YOLO v4
        2.1.1 Architecture
        2.1.2 Backbone
        2.1.3 CSPDarknet53
        2.1.4 Bag of Freebies
        2.1.5 Bag of Specials
        2.1.6 Neck
        2.1.7 Head (Dense prediction)
      2.2 Working of YOLO v4
    Chapter 3 Related work
      3.1 Object detection and recognition using Deep learning devices
      3.2 Implementation of Raspberry Pi 3 CPU and 3+ Movidius NCS
      3.3 Experiment and Evaluation using PYNQ Z2 board and Intel Movidius NCS
      3.4 Performance of Intel Movidius NCS and Xilinx PYNQ Z2 boards
    Chapter 4 Experiments and Evaluation
      4.1 NVIDIA Jetson Nano
        4.1.1 Features
        4.1.2 Storage and Connectivity
        4.1.3 Multi-Stream Video Analytics
      4.2 Reason for Choosing Jetson Nano
        4.2.1 The Intel Movidius Neural Compute Stick (NCS) with Raspberry Pi
        4.2.2 Features
      4.3 AAEON BOXER-8221AI
        4.3.1 Implementing YOLO v4 in AAEON BOXER-8221AI
      4.4 NVIDIA GPU GEFORCE GTX 1080 Titanium
      4.5 YOLO v4 Prerequisites
        4.5.1 OpenCV
        4.5.2 CUDA
        4.5.3 CUDNN
        4.5.4 CMAKE
        4.5.5 Make
      4.6 CASLAB GPGPU
        4.6.1 OpenCL
        4.6.2 OpenCL Platform Model
        4.6.3 OpenCL Execution model
        4.6.4 Simulation Platform
        4.6.5 Instruction Set Architecture
      4.7 Runtime System
        4.7.1 OpenCL Runtime
        4.7.2 HSA Runtime
        4.7.3 The connection between OpenCL and HSA Runtime
        4.7.4 Software Toolchain
        4.7.5 Hardware Design of GPU
        4.7.6 Implementation of GPU Simulation Platform
      4.8 YOLO v4 Result in our CASLAB GPGPU
      4.9 Performance Analysis
        4.9.1 Calculation of GFLOPs from YOLO v4
        4.9.2 Performance Analysis of GFLOPs in NVIDIA Jetson Nano, NVIDIA GEFORCE GTX 1080 Ti and CASLab GPU
        4.9.3 Architecture of NVIDIA GEFORCE GTX 1080 Ti, NVIDIA Jetson NANO, and CASLab GPU Software Stack
        4.9.4 Calculation of Frames per Second (FPS) in CASLab GPGPU
    Chapter 5 Conclusion and Future work
      5.1 Conclusion
      5.2 Future work
    References


    On campus: publicly available from 2023-08-01
    Off campus: publicly available from 2023-08-01