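| Graduate student: | 陳奕瑋 Chen, I-Wei |
|---|---|
| Thesis title: | 應用於有限資源邊緣裝置之低複雜度單次多框架構人臉偵測器 Low Complexity SSD-based Face Detector for Limited-resource Edge Devices |
| Advisor: | 謝明得 Shieh, Ming-Der |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 (ROC calendar) |
| Language: | English |
| Number of pages: | 72 |
| Keywords (Chinese): | 人臉偵測 (face detection), 低複雜度 (low complexity), 單次多框架構 (single-shot multibox architecture), 深度學習 (deep learning), 電腦視覺 (computer vision) |
| Keywords (English): | face detection, low complexity, single shot, deep learning, computer vision |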
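Although methods based on convolutional neural networks (CNNs) have made substantial progress in face detection, detectors designed specifically for low complexity still leave clear room for improvement. We propose an ultra-low-complexity face detector that nevertheless remains competitive in performance; to address memory constraints, it is designed specifically for limited-resource edge devices. In this era of deep learning, many efficient networks have been proposed, yet most of them still have more than one million parameters, which prevents them from running on limited-resource edge devices. The central idea behind our low-complexity detector is to replace the computation-intensive network in a generic detector with a lightweight one. In our current implementation, the detector has only about 150 thousand parameters in total, an order of magnitude fewer than today's efficient networks. In addition, one of the biggest problems in face detection is large scale variation. To handle scale variation without redundant computation, we chose not to use an image pyramid. To minimize the computation of our detector, we built a customized SSD-based detector with only seven convolutional layers and four prediction branches. We also exploit the characteristics of the face detection problem to optimize the number of filters and the anchor box configuration, and we validate these choices through extensive experiments. On the most challenging face detection benchmark, the proposed detector is highly competitive with other low-complexity face detectors.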
Although remarkable progress has been made in CNN-based face detection, designing detectors of extremely low complexity remains an open challenge. To address the memory limitations of edge devices, we present a low-complexity yet competitive face detector. Many networks today focus on high efficiency. However, most of these efficient networks still have more than one million parameters, which makes them infeasible for limited-resource devices. The main idea behind our low-complexity detector is to replace the computation-intensive backbone network of generic object detectors with a lightweight backbone. In our current implementation, the proposed detector has only 153k parameters in total, which is one order of magnitude fewer than existing efficient networks. In addition, one of the main challenges in face detection is the large variation in scale. To handle the multi-scale problem without redundant computation, we avoid using an image pyramid. To minimize computation, we build a customized SSD-based face detector with only seven convolutional layers and four predictors. Based on the characteristics of face detection, we optimize the number of filters and the configuration of anchor boxes, and we perform extensive experiments to evaluate these choices. Compared with other low-complexity face detectors, our detector shows competitive performance on all challenging face detection benchmarks.
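The parameter budget discussed in the abstract can be made concrete with a small calculation. The following Python sketch counts the learnable parameters of a stack of standard convolutional layers. The layer widths in `HYPOTHETICAL_LAYERS` are illustrative assumptions, not the configuration used in the thesis, and the helper names `conv2d_params` and `total_params` are introduced here only for illustration. The count also ignores the prediction branches and any normalization layers.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a standard 2-D convolution with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Hypothetical seven-layer backbone as (in_channels, out_channels, kernel_size).
# These widths are assumptions for illustration, NOT the thesis's architecture.
HYPOTHETICAL_LAYERS = [
    (3, 16, 3),
    (16, 32, 3),
    (32, 32, 3),
    (32, 64, 3),
    (64, 64, 3),
    (64, 64, 3),
    (64, 64, 3),
]


def total_params(layers):
    """Sum the parameter counts of all convolution layers in the list."""
    return sum(conv2d_params(i, o, k) for i, o, k in layers)


if __name__ == "__main__":
    total = total_params(HYPOTHETICAL_LAYERS)
    print(f"hypothetical backbone parameters: {total:,}")
    print("below one million:", total < 1_000_000)
```

Because the weight count of each layer scales with the product of its input and output channels, keeping the number of filters small is the most direct way to stay well below the one-million-parameter range that the abstract attributes to most efficient networks.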