| Author: | Carlo D. Pastoral (潘特羅) |
|---|---|
| Thesis Title: | Development of a Lightweight High-Resolution Segmentation Convolutional Neural Network (輕量級高解析度分割卷積類神經網路之發展) |
| Advisor: | Jyh-Ching Juang (莊智清) |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2021 |
| Academic Year of Graduation: | 109 (ROC calendar) |
| Language: | English |
| Number of Pages: | 49 |
| Keywords: | Neural Network, Segmentation Neural Network, Abstraction, Memory Consumption, High-Resolution, Satellite |
Recent state-of-the-art end-to-end neural networks for image segmentation have achieved great advances in the field of computer vision. However, most models do not take device constraints into consideration, especially for edge computing and many embedded-system devices. There is a gap between models developed in the laboratory on high-end machines and the real-world deployment machines, which can be highly constrained in terms of power consumption, computing resources, storage, and processing speed. In this paper, we review recent state-of-the-art models and identify the pain points that cause memory traffic and computing bottlenecks. We propose a combination of a network abstraction that reduces memory traffic and floating-point operations and a high-resolution convolutional upsampling method that produces sharp segmentation edges with no or minimal checkerboard artifacts. We compare the results with state-of-the-art image segmentation models trained on benchmark datasets. To show the efficacy of our model on a resource-constrained device, we deploy the model on an AMD GX412HC SoC and an AMD Ryzen Embedded R1505G.
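The abstract does not spell out the upsampling method itself; one common way to get convolutional upsampling without checkerboard artifacts is sub-pixel convolution (Shi et al., 2016) with ICNR initialization (Aitken et al., 2017). The following is a minimal PyTorch sketch under that assumption; the class name, channel sizes, and scale factor are illustrative, not the thesis's actual architecture:

```python
import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """Sub-pixel convolutional upsampling: a conv expands the channel
    count by scale**2, then PixelShuffle rearranges those channels into
    a feature map that is scale times larger in each spatial dimension."""
    def __init__(self, in_channels, out_channels, scale=2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels * scale ** 2,
                              kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self._icnr_init(self.conv.weight, scale)

    @staticmethod
    def _icnr_init(weight, scale):
        # ICNR: initialize each consecutive group of scale**2 output
        # channels with an identical sub-kernel, so every sub-pixel
        # position starts with the same value and the upsampled map
        # is free of checkerboard artifacts at initialization.
        out_ch, in_ch, kh, kw = weight.shape
        sub = torch.empty(out_ch // scale ** 2, in_ch, kh, kw)
        nn.init.kaiming_normal_(sub)
        with torch.no_grad():
            weight.copy_(sub.repeat_interleave(scale ** 2, dim=0))

    def forward(self, x):
        return self.shuffle(self.conv(x))

x = torch.randn(1, 64, 32, 32)
up = SubPixelUpsample(64, 32, scale=2)
print(up(x).shape)  # torch.Size([1, 32, 64, 64])
```

Compared with transposed convolution, this keeps all computation at the low resolution and avoids the overlapping-kernel pattern that causes checkerboard artifacts in the first place.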
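The abstract also claims reduced floating-point operations but does not describe the measurement setup. As a rough sketch of how such numbers can be tallied, the open-source ptflops counter (sovrasov/flops-counter.pytorch) reports multiply-accumulate operations and parameter counts for a PyTorch model; MobileNetV2 here is only a stand-in for the network under test:

```python
import torchvision.models as models
from ptflops import get_model_complexity_info

# Count multiply-accumulate operations (MACs) and parameters for a
# candidate backbone at a fixed input resolution.
model = models.mobilenet_v2()
macs, params = get_model_complexity_info(model, (3, 224, 224),
                                         as_strings=True,
                                         print_per_layer_stat=False)
print(f"MACs: {macs}, Params: {params}")
```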