
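Graduate student: Wang, Yu-He (王羽合)
Thesis title: NRNet: Non-local Refinement Network for Facial Landmark Detection (非局部注意力細化機制網路用於臉部特徵點偵測)
Advisor: Chen, Chi-Yeh (陳奇業)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2022
Academic year of graduation: 110
Language: English
Number of pages: 45
Keywords (Chinese): facial landmark detection, attention mechanism, efficient detector
Keywords (English): Facial landmark detection, Attention mechanism, Efficient architecture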
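  • In recent years, with the development of deep learning, face-related imaging technologies have advanced rapidly. Many of them take facial landmarks as input, for example face recognition, facial expression detection, and drowsy-driving detection. Many recent facial landmark detection methods convert landmark coordinates into heatmaps (probability response maps) as additional training information, or generate heatmaps to assist coordinate prediction.
    Although this approach achieves high accuracy, it is slow because of the large amount of additional information that must be processed.
    This thesis therefore proposes an efficient facial landmark detector that predicts coordinates directly, without heatmaps.
    This thesis proposes an attention-based refinement module that is deployed at the output of the original architecture. The module learns the relationships between points through a non-local attention mechanism and uses this information to fine-tune the initial output, producing a more precise result.
    In addition, this thesis integrates several designs that make the network more compact, such as using differentiable neural architecture search to find the backbone network for feature learning and using deep supervision to regress from the backbone at an early stage. To achieve fast inference, the latency term proposed in FBNet is introduced into the differentiable neural architecture search; this term constrains inference time and thereby increases inference speed. Deep supervision assists feature learning and helps the network converge. By integrating these techniques, the proposed method achieves relatively high speed with competitive accuracy.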

    This thesis proposes a lightweight attention mechanism for facial landmark detection, called the non-local refinement module, which refines the original output to make the prediction more accurate. The module uses non-local attention to learn the relations among all landmark points and then uses this information to refine the original prediction.
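The refinement idea in this paragraph can be sketched as follows. This is a minimal illustration, not the thesis's module: it assumes an embedded-Gaussian form of non-local attention (softmax over dot products) applied directly to (x, y) coordinates with a residual update. The function name `non_local_refine` and the four projection matrices are hypothetical; in a trained model they would be learned, and the actual module may operate on richer features or use a different activation.

```python
import math


def _matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def _softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_refine(points, W_theta, W_phi, W_g, W_out):
    """Refine initial landmark predictions with non-local attention.

    points: list of [x, y] initial predictions (assumed input format).
    W_theta, W_phi, W_g, W_out: hypothetical projection matrices.
    Returns points plus an attention-derived offset (residual update).
    """
    theta = [_matvec(W_theta, p) for p in points]  # queries
    phi = [_matvec(W_phi, p) for p in points]      # keys
    g = [_matvec(W_g, p) for p in points]          # values
    dim = len(g[0])
    refined = []
    for i, p in enumerate(points):
        # Attention of landmark i over every landmark j (including itself).
        scores = [sum(a * b for a, b in zip(theta[i], phi_j)) for phi_j in phi]
        weights = _softmax(scores)
        # Weighted sum of value vectors: information gathered from all points.
        agg = [sum(w * g_j[d] for w, g_j in zip(weights, g)) for d in range(dim)]
        offset = _matvec(W_out, agg)
        refined.append([c + o for c, o in zip(p, offset)])
    return refined


# Toy usage with hand-picked (not learned) matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]
small = [[0.1, 0.0], [0.0, 0.1]]
initial = [[0.30, 0.40], [0.70, 0.40], [0.50, 0.75]]
refined = non_local_refine(initial, identity, identity, identity, small)
```

Because the update is residual, setting `W_out` to zeros returns the initial prediction unchanged, so the module can only adjust the output of the base network rather than replace it.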
    This thesis also integrates techniques from previous studies to improve the efficiency and accuracy of the model. It uses differentiable neural architecture search (DNAS) to search for the backbone network. To obtain an efficient backbone, it adopts the latency term used in FBNet, which measures the latency of each operator; constraining this latency loss reduces the inference time of the model. To capture information at different scales, it applies feature fusion, which improves the quality of feature learning by fusing multi-scale feature maps.
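The latency term can be illustrated with a small sketch. It assumes the loss form published for FBNet, in which the task loss is multiplied by `alpha * log(latency) ** beta` and the latency of a candidate architecture is the sum of per-operator latencies read from a lookup table. Plain softmax stands in here for the Gumbel-softmax relaxation used in FBNet, and the function names, the default `alpha` and `beta`, and the toy latency table are illustrative assumptions, not values from the thesis.

```python
import math


def expected_latency(arch_logits, latency_table):
    """Softmax-weighted sum of per-operator latencies over all layers.

    arch_logits[l][i]: architecture parameter of candidate operator i in layer l.
    latency_table[l][i]: measured latency of that operator (assumed lookup table).
    """
    total = 0.0
    for logits, latencies in zip(arch_logits, latency_table):
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        norm = sum(exps)
        total += sum(e / norm * lat for e, lat in zip(exps, latencies))
    return total


def latency_aware_loss(task_loss, latency, alpha=0.2, beta=0.6):
    """FBNet-style loss: task_loss * alpha * log(latency) ** beta.

    Requires latency > 1 (in the chosen unit) so that log(latency) is positive.
    """
    if latency <= 1.0:
        raise ValueError("latency must be greater than 1 in the chosen unit")
    return task_loss * alpha * math.log(latency) ** beta


# Toy usage: two layers, two candidate operators each (made-up latencies).
logits = [[0.0, 1.0], [2.0, 0.0]]
table = [[3.0, 9.0], [4.0, 12.0]]
loss = latency_aware_loss(task_loss=0.8, latency=expected_latency(logits, table))
```

Since the expected latency is differentiable with respect to the architecture parameters, gradient descent on this loss pushes the search toward faster operators whenever accuracy allows.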
    To address the slow convergence of the backbone network, this thesis adopts deep supervision: the output of an intermediate hidden layer of the backbone is regressed early to speed up convergence.
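A minimal sketch of how such a deeply supervised objective is commonly assembled is shown below. It assumes that each intermediate output is regressed to the same landmark targets as the final output and that the auxiliary losses are added with a fixed weight. The L1 loss, the name `deeply_supervised_loss`, and the `aux_weight` value are placeholders, since the abstract does not state the loss function actually used.

```python
def l1_loss(pred, target):
    """Mean absolute error between two flat coordinate lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def deeply_supervised_loss(final_pred, aux_preds, target, aux_weight=0.5):
    """Final-output loss plus weighted losses on early (intermediate) outputs.

    final_pred: flat list of predicted coordinates from the last layer.
    aux_preds: list of flat predictions regressed from intermediate layers.
    aux_weight: hypothetical weight on each auxiliary loss.
    """
    loss = l1_loss(final_pred, target)
    for aux in aux_preds:
        loss += aux_weight * l1_loss(aux, target)
    return loss


# Toy usage: one auxiliary head, two landmarks flattened to [x1, y1, x2, y2].
target = [0.30, 0.40, 0.70, 0.40]
final_pred = [0.32, 0.41, 0.69, 0.38]
aux_pred = [0.40, 0.45, 0.60, 0.35]
total = deeply_supervised_loss(final_pred, [aux_pred], target)
```

The auxiliary term gives the early layers a direct gradient signal, which is the mechanism by which deep supervision is expected to speed up convergence.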
    Finally, extensive experiments demonstrate the effectiveness of the proposed method, which achieves strong results with high inference speed on different benchmarks.

    Abstract (Chinese) i
    Abstract ii
    Acknowledgements iii
    Table of Contents iv
    List of Tables vi
    List of Figures vii
    Chapter 1. Introduction 1
      1.1. Background 1
      1.2. Motivation 3
      1.3. Contribution 3
    Chapter 2. Related Work 4
      2.1. Coordinate-Based Method and Heatmap-Based Method 4
        2.1.1. Coordinate-Based Method 4
        2.1.2. Heatmap-Based Method 5
      2.2. Efficient Architecture 6
      2.3. Neural Architecture Search 8
        2.3.1. Neural Architecture Search with Reinforcement Learning 8
        2.3.2. Differentiable NAS 9
      2.4. Attention Mechanism 9
    Chapter 3. Method 12
      3.1. Backbone 13
      3.2. Deep Supervision 17
      3.3. Non-local Refinement Module 18
        3.3.1. Detail of non-local refinement 18
        3.3.2. Activation function 26
      3.4. Loss Function 28
      3.5. Algorithm 29
    Chapter 4. Experiments 30
      4.1. Dataset 30
        4.1.1. 300W 30
        4.1.2. WFLW 32
      4.2. Evaluation Metric 33
      4.3. Implementation details 34
      4.4. Evaluation on Benchmark 34
      4.5. Visualize Result 36
      4.6. Ablation Study 38
    Chapter 5. Conclusion 42
    References 43


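    On campus: full text available from 2027-08-17
    Off campus: full text available from 2027-08-17
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.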