| Graduate Student: | 巴沙曼 SYED BASHA MAITHEEN FARMANNULLA |
|---|---|
| Thesis Title: | Performance Analysis of Three Modern General-Purpose GPUs using YOLO v4 |
| Advisor: | 陳中和 Chen, Chung-Ho |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of Publication: | 2021 |
| Academic Year of Graduation: | 109 (ROC calendar) |
| Language: | English |
| Number of Pages: | 106 |
| Keywords: | GPGPU, OpenCL, Object Detection, Deep Learning |
As Artificial Intelligence (AI) is widely employed in numerous fields, it is crucial to use either a GPGPU (General-Purpose Graphics Processing Unit) or an ASIC (Application-Specific Integrated Circuit) to speed up computation. We implement a virtual platform, CASLab GPU, which is a GPGPU with a SIMT (Single Instruction, Multiple Threads) architecture. Although a GPGPU can support many distinct applications through its software stack, the implementation of the software libraries has a prominent effect on its performance. In contrast, an ASIC delivers considerable performance on its particular application, yet it lacks versatility.
In this thesis, we implement a new application on the GPGPU, YOLO v4, which is a deep-learning-based approach mainly used for object detection. It is considered to be the best free and open-source detector: it is very fast, provides accurate detection results, and makes it possible to train a fast and accurate object detector. CASLab GPU is designed with a SIMT structure. Based on the performance target of the GPU, this work is positioned in the non-high-end AI application field, which involves edge computing, distributed deep neural network training, and inference applications. Therefore, the CASLab GPU supports OpenCL (Open Computing Language) or TensorFlow applications. Preliminary experimental results indicate that it reduces average power consumption and hardware utilization. Based on these results, the hardware and software design of the GPU can be optimized, and it also provides 100% better accuracy on object detection. Furthermore, we run YOLO v4 on the NVIDIA Jetson Nano and the NVIDIA GeForce GTX 1080 Ti and compare the results with one another to identify the best accuracy and timing performance. This work also presents a careful analysis of the performance, the obtained results, and the working speed.