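Graduate student: | 王羽合 Wang, Yu-He |
---|---|
Thesis title: | 非局部注意力細化機制網路用於臉部特徵點偵測 NRNet: Non-local Refinement Network for Facial Landmark Detection |
Advisor: | 陳奇業 Chen, Chi-Yeh |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2022 |
Academic year of graduation: | 110 |
Language: | English |
Number of pages: | 45 |
Chinese keywords: | 人臉關鍵點偵測 (facial landmark detection), 注意力機制 (attention mechanism), 高效偵測器 (efficient detector) |
English keywords: | Facial landmark detection, Attention mechanism, Efficient architecture |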
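In recent years, with the development of deep learning, face-related imaging technologies have advanced rapidly. Many of them take facial landmarks as input, for example face recognition, facial expression detection, and drowsy-driving detection. Many recent facial landmark detection methods convert landmark coordinates into heatmaps (probability response maps) as additional training information, or generate heatmaps to assist coordinate prediction.
Although this approach achieves high accuracy, it is slow because of the large amount of additional information it has to process.
For this reason, this thesis proposes an efficient facial landmark detector that predicts coordinates directly, without heatmaps.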
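To make the cost difference concrete, the following is a minimal sketch of heatmap encoding for a single landmark. The 64 x 64 map size, the Gaussian width `sigma`, and the function names are illustrative assumptions, not values taken from the thesis.

```python
import math

def gaussian_heatmap(x, y, size=64, sigma=1.5):
    """Render one landmark at pixel (x, y) as a size x size Gaussian response map."""
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2 * sigma ** 2))
         for col in range(size)]
        for row in range(size)
    ]

def decode_argmax(heatmap):
    """Recover (x, y) as the location of the strongest response."""
    best_row, best_col, best_val = 0, 0, -1.0
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best_val:
                best_row, best_col, best_val = r, c, v
    return best_col, best_row

landmark = (20, 37)  # hypothetical landmark position in pixels
hm = gaussian_heatmap(*landmark)

# A heatmap target stores size * size values per landmark;
# a direct coordinate target stores only two.
values_per_landmark_heatmap = len(hm) * len(hm[0])
values_per_landmark_coord = 2
decoded = decode_argmax(hm)
```

Under these assumptions each landmark needs 4096 target values as a heatmap against 2 as a coordinate pair, which is the overhead that direct coordinate regression avoids.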
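This thesis proposes an attention-based refinement module that is deployed at the output of the original architecture. The module uses a non-local attention mechanism to learn the relationships among landmarks and uses this information to fine-tune the initial output into a more precise one.
In addition, this thesis integrates several designs that make the network more compact, such as using differentiable neural architecture search to find the backbone network for feature learning, and using deep supervision to regress from the backbone early. To achieve fast inference, this thesis introduces the latency term proposed by FBNet into the differentiable neural architecture search; this term constrains inference time and so raises inference speed. Deep supervision assists feature learning and helps the network converge. By integrating these techniques, the proposed method achieves relatively high speed with competitive accuracy.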
This thesis proposes a lightweight attention mechanism for facial landmark detection, called the non-local refinement module, which refines the original output to make the prediction more accurate. The module uses non-local attention to learn the relations among all landmark points and uses this information to refine the initial prediction.
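The idea of refining each landmark from its relations to all other landmarks can be sketched as below. This is only an illustration under stated assumptions: it attends over raw 2-D coordinates, whereas the actual module presumably works on learned features, and the projection matrices `theta`, `phi`, `g`, and `w_out` are hypothetical names whose layout is not given in the abstract.

```python
import math

def matvec(m, v):
    """Multiply a small matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def non_local_refine(points, theta, phi, g, w_out):
    """Add to each landmark an attention-weighted summary of all landmarks."""
    queries = [matvec(theta, p) for p in points]
    keys = [matvec(phi, p) for p in points]
    values = [matvec(g, p) for p in points]
    refined = []
    for p, q in zip(points, queries):
        # Pairwise affinity between this landmark and every landmark.
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        context = [sum(w * v[d] for w, v in zip(weights, values))
                   for d in range(len(p))]
        offset = matvec(w_out, context)
        # Residual connection: the initial prediction is adjusted, not replaced.
        refined.append([pi + oi for pi, oi in zip(p, offset)])
    return refined

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
initial = [[0.30, 0.40], [0.70, 0.40], [0.50, 0.75]]  # hypothetical normalized landmarks

unchanged = non_local_refine(initial, identity, identity, identity, zeros)
shifted = non_local_refine(initial, identity, identity, identity,
                           [[0.1, 0.0], [0.0, 0.1]])
```

With a zero output projection the sketch returns the initial prediction unchanged, so the refinement only moves landmarks once `w_out` has learned a non-zero mapping.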
To improve both efficiency and accuracy, this thesis also integrates techniques from previous studies. It uses differentiable neural architecture search (DNAS) to search for the backbone network. To obtain an efficient backbone, it adopts the latency term used in FBNet, which measures the latency of each operator; constraining this latency loss reduces the inference time of the model. To capture information at different scales of the feature maps, this thesis implements a feature fusion method, which improves the quality of feature learning by fusing multi-scale features.
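A latency-constrained search objective of this kind can be sketched as follows. The multiplicative form is modeled on the loss reported for FBNet; the exact weighting used in this thesis is not stated in the abstract, so `alpha`, `beta`, the latency table, and the use of a plain softmax (instead of a sampled relaxation) are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_latency(arch_logits, latency_table):
    """Sum over layers of the probability-weighted latency of each candidate op."""
    total = 0.0
    for logits, latencies in zip(arch_logits, latency_table):
        probs = softmax(logits)
        total += sum(p * lat for p, lat in zip(probs, latencies))
    return total

def latency_aware_loss(task_loss, latency_ms, alpha=0.2, beta=0.6):
    """Scale the task loss by alpha * log(latency) ** beta.

    Assumes latency_ms > 1 so that the logarithm is positive.
    """
    return task_loss * alpha * math.log(latency_ms) ** beta

# Hypothetical lookup table: per-layer latency in milliseconds of three candidate ops.
latency_table = [[2.0, 5.0, 9.0], [3.0, 6.0, 12.0]]
prefers_fast = [[4.0, 0.0, 0.0], [4.0, 0.0, 0.0]]  # architecture weights favoring cheap ops
prefers_slow = [[0.0, 0.0, 4.0], [0.0, 0.0, 4.0]]  # architecture weights favoring costly ops

fast_ms = expected_latency(prefers_fast, latency_table)
slow_ms = expected_latency(prefers_slow, latency_table)
fast_loss = latency_aware_loss(1.0, fast_ms)
slow_loss = latency_aware_loss(1.0, slow_ms)
```

Because the expected latency is a differentiable function of the architecture weights, gradient descent on such a loss pushes the search toward operators that are both accurate and fast.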
To address the slow convergence of the backbone network, this thesis adopts deep supervision: the output of an intermediate hidden layer of the backbone is regressed directly to landmark coordinates, and supervising this early output speeds up convergence.
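In loss terms, deep supervision amounts to adding an auxiliary term computed on the early output. The sketch below assumes a mean Euclidean distance as the landmark loss and an auxiliary weight of 0.5; both are placeholders, since the abstract does not specify the loss function or its weighting.

```python
def mean_l2(pred, target):
    """Mean Euclidean distance between predicted and ground-truth landmarks."""
    dists = [((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
             for (px, py), (tx, ty) in zip(pred, target)]
    return sum(dists) / len(dists)

def deeply_supervised_loss(final_pred, early_pred, target, aux_weight=0.5):
    """Final-output loss plus a weighted loss on the intermediate-layer regression."""
    return mean_l2(final_pred, target) + aux_weight * mean_l2(early_pred, target)

# Hypothetical two-landmark example.
target = [(0.0, 0.0), (1.0, 0.0)]
final_pred = [(0.0, 0.0), (1.0, 0.0)]   # final head is already exact
early_pred = [(3.0, 4.0), (1.0, 0.0)]   # early head is off by 5 on the first landmark
loss = deeply_supervised_loss(final_pred, early_pred, target)
```

Even when the final head is exact, the auxiliary term still sends a gradient into the early layers, which is how the extra supervision helps the backbone converge.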
Finally, extensive experiments demonstrate the effectiveness of the method, which achieves competitive accuracy with high inference speed on several benchmarks.
D. Yi, Z. Lei, S. Liao, and S. Z. Li, “Learning face representation from scratch,” arXiv preprint arXiv:1411.7923, 2014.
A. Mollahosseini, B. Hasani, and M. H. Mahoor, “AffectNet: A database for facial expression, valence, and arousal computing in the wild,” IEEE Transactions on Affective Computing, vol. 10, no. 1, pp. 18–31, 2017.
Y. Zhang, S. Zhang, Y. He, C. Li, C. C. Loy, and Z. Liu, “One-shot face reenactment,” arXiv preprint arXiv:1908.03251, 2019.
M. Kowalski, J. Naruniec, and T. Trzcinski, “Deep alignment network: A convolutional neural network for robust face alignment,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 88–97.
W. Wu, C. Qian, S. Yang, Q. Wang, Y. Cai, and Q. Zhou, “Look at boundary: A boundary-aware face alignment algorithm,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2129–2138.
J. Su, Z. Wang, C. Liao, and H. Ling, “Efficient and accurate face alignment by global regression and cascaded local refinement,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7132–7141.
S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3–19.
X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7794–7803.
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520.
N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “ShuffleNet V2: Practical guidelines for efficient CNN architecture design,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 116–131.
M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in International Conference on Machine Learning. PMLR, 2019, pp. 6105–6114.
L. Wang, C.-Y. Lee, Z. Tu, and S. Lazebnik, “Training deeper convolutional networks with deep supervision,” arXiv preprint arXiv:1505.02496, 2015.
Y. Sun, X. Wang, and X. Tang, “Deep convolutional network cascade for facial point detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 3476–3483.
Z. Zhang, P. Luo, C. C. Loy, and X. Tang, “Facial landmark detection by deep multi-task learning,” in European conference on computer vision. Springer, 2014, pp. 94–108.
Z.-H. Feng, J. Kittler, M. Awais, P. Huber, and X.-J. Wu, “Wing loss for robust facial landmark localisation with convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2235–2245.
G. Trigeorgis, P. Snape, M. A. Nicolaou, E. Antonakos, and S. Zafeiriou, “Mnemonic descent method: A recurrent process applied for end-to-end face alignment,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4177–4187.
X. Wang, L. Bo, and L. Fuxin, “Adaptive wing loss for robust face alignment via heatmap regression,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 6971–6981.
Y. Xiong, Z. Zhou, Y. Dou, and Z. Su, “Gaussian vector: An efficient solution for facial landmark detection,” in Proceedings of the Asian Conference on Computer Vision, 2020.
Y. Huang, H. Yang, C. Li, J. Kim, and F. Wei, “ADNet: Leveraging error-bias towards normal direction in face alignment,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3080–3090.
X. Zhang, X. Zhou, M. Lin, and J. Sun, “ShuffleNet: An extremely efficient convolutional neural network for mobile devices,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 6848–6856.
B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” arXiv preprint arXiv:1611.01578, 2016.
C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy, “Progressive neural architecture search,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 19–34.
H. Liu, K. Simonyan, and Y. Yang, “DARTS: Differentiable architecture search,” arXiv preprint arXiv:1806.09055, 2018.
B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, Y. Tian, P. Vajda, Y. Jia, and K. Keutzer, “FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 10734–10742.
Q. Hou, D. Zhou, and J. Feng, “Coordinate attention for efficient mobile network design,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13713–13722.
C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “300 faces in-the-wild challenge: The first facial landmark localization challenge,” in Proceedings of the IEEE international conference on computer vision workshops, 2013, pp. 397–403.
X. Zhu and D. Ramanan, “Face detection, pose estimation, and landmark localization in the wild,” in 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012, pp. 2879–2886.
P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar, “Localizing parts of faces using a consensus of exemplars,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 12, pp. 2930–2940, 2013.
V. Le, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, “Interactive facial feature localization,” in European conference on computer vision. Springer, 2012, pp. 679–692.
S. Yang, P. Luo, C.-C. Loy, and X. Tang, “WIDER FACE: A face detection benchmark,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 5525–5533.
R. Valle, J. M. Buenaposada, A. Valdes, and L. Baumela, “A deeply-initialized coarse-to-fine ensemble of regression trees for face alignment,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 585–601.
X. Cao, Y. Wei, F. Wen, and J. Sun, “Face alignment by explicit shape regression,” International journal of computer vision, vol. 107, no. 2, pp. 177–190, 2014.
X. Xiong and F. De la Torre, “Supervised descent method and its applications to face alignment,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 532–539.
S. Zhu, C. Li, C. Change Loy, and X. Tang, “Face alignment by coarse-to-fine shape searching,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 4998–5006.
W. Wu and S. Yang, “Leveraging intra and inter-dataset variations for robust face alignment,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 150–159.