
Student: Kao, Shih-Chun (高士鈞)
Thesis title: Multi-Task Meta-Learning for Lightweight Face Detection (original Chinese title: 應用於輕量化人臉偵測之多任務元學習演算法)
Advisor: Shieh, Ming-Der (謝明得)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2022
Graduation academic year: 110 (ROC calendar, i.e., 2021-2022)
Language: English
Pages: 68
Keywords: face detection, meta-learning, few-shot learning, machine learning, computer vision
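    As machine learning has matured in computer vision, many machine learning algorithms targeting low-cost, low-power edge devices have appeared in recent years. This thesis focuses on image-related applications: drawing on object detection networks that have been widely discussed over the years, we develop a low-complexity face detection system that meets a predefined accuracy requirement.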
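    In addition, we require the system to maintain its performance across different environments. To this end, we introduce the concept of Model-Agnostic Meta-Learning (MAML) from the meta-learning field to strengthen the model's adaptability: training focuses on finding better initial parameters from which the model can be transferred quickly to new tasks. This mechanism allows the model to adjust itself to different environmental conditions, producing a dedicated model customized for each problem. Building on the basic idea of MAML, we further incorporate multi-task learning theory and propose Multi-Task Meta-Learning (MTML). By adding secondary tasks to the meta-learning process, the model learns not only the features of the primary task but also a diverse set of features that serve as important material for adapting to new tasks later, which further strengthens its adaptability. Within our system, a model trained this way handles the non-ideal samples that a general face detector cannot process; through the cooperation of these two kinds of models, we build a comprehensive face detection system.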
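As a reference point, the standard MAML update (Finn et al., 2017) can be written as follows, with inner (task-level) step size α, meta step size β, and tasks \(\mathcal{T}_i\) drawn from a task distribution \(p(\mathcal{T})\). The last line is only an assumed form of an MTML-style objective, namely a primary loss plus auxiliary losses weighted by a balance factor λ; the thesis's actual loss function (Section 3.2.2) may differ.

```latex
\begin{aligned}
\theta_i' &= \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)
  && \text{(inner loop: adapt to task } \mathcal{T}_i\text{)} \\
\theta &\leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})}
  \mathcal{L}_{\mathcal{T}_i}\bigl(f_{\theta_i'}\bigr)
  && \text{(outer loop: update the shared initialization)} \\
\mathcal{L}^{\mathrm{MTML}}_{\mathcal{T}_i} &= \mathcal{L}^{\mathrm{main}}_{\mathcal{T}_i}
  + \lambda \sum_{k} \mathcal{L}^{\mathrm{aux},k}_{\mathcal{T}_i}
  && \text{(assumed form, not taken from the thesis)}
\end{aligned}
```

Because the outer gradient is taken through the adapted parameters \(\theta_i'\), the meta-update favors initializations from which one or a few inner steps already perform well on each task.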
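    Finally, this thesis evaluates the method in two parts. First, we use the CIFAR-10 dataset to test the feasibility of MTML on general image classification; compared with the reference algorithm MAML, MTML does strengthen the model's grasp of the problem. Then, we deploy MTML in the proposed lightweight face detection system and show that it performs better across various non-ideal scenarios. Compared with other existing algorithms, our system has fewer parameters and a higher recall rate, making it better suited to real-world front-end always-on applications.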

    As machine learning has matured in computer vision, many algorithms targeting low-cost, low-power edge devices have emerged in recent years. In this thesis, we draw on object detection networks that have been actively studied over the years to develop a low-complexity face detection system that meets predefined accuracy and recall-rate constraints.
    We also aim for a system that maintains its performance across various application environments. To enhance the model's adaptability, we introduce model-agnostic meta-learning (MAML): training focuses on finding initial parameters from which the model can be transferred quickly to new tasks. This mechanism lets the model adjust itself to different environmental conditions, yielding a customized model for each problem. Building on MAML and on multi-task learning theory, we propose multi-task meta-learning (MTML) to deal with special operating conditions. By adding auxiliary regularization tasks to the meta-learning process, the model learns the features of the main task while also acquiring diverse features that serve as material for adapting to future tasks, which further improves its adaptability. The resulting model is then adapted with only a few samples (few-shot) to handle the non-ideal samples on which the original face detector in our target system fails. Through the cooperation of these two models, the general face detector and the MTML-adapted model, we build a comprehensive face detection system.
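The following is a minimal first-order sketch, in NumPy, of how an auxiliary task could enter a MAML-style meta-update. Everything concrete here is an assumption for illustration: a linear shared feature map `W` with two linear heads (`v_main` for the primary task, `v_aux` for one auxiliary task), mean-squared-error losses, toy random-regression tasks from the hypothetical `sample_task`, the balance factor `lam`, and the use of the combined loss in both the inner and outer loops. It uses the first-order approximation (no second derivatives through the inner loop), so it is not the thesis's MTML implementation.

```python
import numpy as np

# Hypothetical sketch: first-order MAML-style meta-learning with one auxiliary
# task. Model, tasks, and the balance factor `lam` are illustrative assumptions.


def mse_and_grads(params, X, y, head):
    """MSE of X @ W @ params[head] against y, with gradients w.r.t. W and that head."""
    n = X.shape[0]
    H = X @ params["W"]                  # shared features, shape (n, h)
    err = H @ params[head] - y           # residuals, shape (n,)
    loss = float(np.mean(err ** 2))
    g_head = (2.0 / n) * (H.T @ err)                        # shape (h,)
    g_W = (2.0 / n) * (X.T @ np.outer(err, params[head]))   # shape (d, h)
    return loss, {"W": g_W, head: g_head}


def multitask_loss_and_grads(params, batch, lam):
    """Primary MSE + lam * auxiliary MSE, with gradients for every parameter."""
    X, y_main, y_aux = batch
    l_main, g_main = mse_and_grads(params, X, y_main, "v_main")
    l_aux, g_aux = mse_and_grads(params, X, y_aux, "v_aux")
    grads = {k: np.zeros_like(p) for k, p in params.items()}
    for k, g in g_main.items():
        grads[k] += g
    for k, g in g_aux.items():
        grads[k] += lam * g              # auxiliary task weighted by lam
    return l_main + lam * l_aux, grads


def meta_step(params, tasks, alpha=0.01, beta=0.01, lam=0.5, inner_steps=1):
    """One first-order meta-update of the shared initialization over (support, query) tasks."""
    meta_grad = {k: np.zeros_like(p) for k, p in params.items()}
    for support, query in tasks:
        fast = {k: p.copy() for k, p in params.items()}
        for _ in range(inner_steps):     # inner loop: adapt a copy to this task
            _, g = multitask_loss_and_grads(fast, support, lam)
            fast = {k: fast[k] - alpha * g[k] for k in fast}
        # First-order approximation: use query gradients at the adapted params.
        _, g_q = multitask_loss_and_grads(fast, query, lam)
        for k in meta_grad:
            meta_grad[k] += g_q[k] / len(tasks)
    return {k: params[k] - beta * meta_grad[k] for k in params}


def sample_task(rng, d=8, n=20):
    """Hypothetical toy task: random linear targets for the primary and auxiliary heads."""
    a_main, a_aux = rng.normal(size=d), rng.normal(size=d)

    def batch():
        X = rng.normal(size=(n, d))
        return X, X @ a_main, X @ a_aux

    return batch(), batch()              # (support set, query set)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = {
        "W": 0.1 * rng.normal(size=(8, 4)),
        "v_main": 0.1 * rng.normal(size=4),
        "v_aux": 0.1 * rng.normal(size=4),
    }
    for _ in range(100):
        params = meta_step(params, [sample_task(rng) for _ in range(4)])
    print({k: v.shape for k, v in params.items()})
```

Setting `lam=0` reduces the update to first-order MAML on the primary task alone, since the auxiliary head then receives no gradient; larger values push the shared features `W` to also serve the auxiliary task.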
    Finally, we evaluate the proposed method in two parts. First, we use the CIFAR-10 dataset to test the feasibility of MTML on general image classification; compared with the baseline algorithm MAML, MTML does strengthen the model's grasp of the problem. Then, we implement MTML in the previously proposed lightweight face detection system and show that the MTML-trained model performs better across different non-ideal scenarios. Compared with related algorithms, the developed system has fewer parameters and a higher recall rate, which makes it suitable for front-end always-on applications.
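The two metrics in this comparison reduce to simple arithmetic. The Python sketch below computes recall as TP / (TP + FN) and counts the parameters of convolutional and fully connected layers; the helper names and the small layer configuration are hypothetical and only illustrate the arithmetic, not the thesis's detector or its reported numbers.

```python
# Hypothetical helpers illustrating the two metrics named above.
# The layer sizes below are made up; they are NOT the thesis's network.


def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): the fraction of ground-truth faces that are detected."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def conv2d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """A k x k convolution has k*k*in_ch*out_ch weights plus one bias per output channel."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)


def dense_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """A fully connected layer has in*out weights plus one bias per output."""
    return in_features * out_features + (out_features if bias else 0)


# Hypothetical tiny face/non-face classifier: two 3x3 conv layers, then
# (assumed) global average pooling to 16 features and a 2-way dense head.
layer_params = [conv2d_params(1, 8, 3), conv2d_params(8, 16, 3), dense_params(16, 2)]
total_params = sum(layer_params)

print(layer_params, total_params)   # [80, 1168, 34] 1282
print(recall(90, 10))               # 0.9
```

Because a k×k convolution's weight count scales with in_ch × out_ch, narrowing channel widths reduces the parameter count roughly quadratically, which is why channel width is a natural lever for lightweight detectors.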

Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Motivation
  1.2 Thesis Overview
  1.3 Thesis Organization
Chapter 2 Background
  2.1 Iconic Face Detectors
    2.1.1 Traditional Cascade Face Detectors
    2.1.2 CNN-based Cascade Face Detectors
  2.2 Meta-Learning
    2.2.1 Metric-based Meta-Learning Methods
    2.2.2 Optimization-based Meta-Learning Methods
Chapter 3 Proposed Multi-Task Meta-Learning for Lightweight Face Detection
  3.1 Proposed Lightweight Face Detector
    3.1.1 System Structure
    3.1.2 Network Architecture
    3.1.3 Meta-Learning the Face Detector
  3.2 Proposed Multi-Task Meta-Learning (MTML)
    3.2.1 Concept and Training Framework of MTML
    3.2.2 Loss Function of MTML
    3.2.3 Other Implementation Details
Chapter 4 Experimental Evaluation and Results Comparison
  4.1 Experimental Evaluation of MTML Fundamentals
    4.1.1 Experiment Setup
    4.1.2 Generality Test – Illumination Differences
    4.1.3 Generality Test – Other Differences
    4.1.4 Comparison with Other MAML Research
    4.1.5 Ablation Studies
  4.2 Experimental Evaluation of Face Detection with MTML
    4.2.1 System Setup
    4.2.2 Evaluation Metrics
    4.2.3 Comparison of Different Task Balance Factors
    4.2.4 Comparison of Different Primary Tasks
  4.3 System Implementation and Comparison
    4.3.1 Performance of Complete Framework
    4.3.2 System Implementation on PC-based System
    4.3.3 System Implementation on Ultra-low Power Embedded System
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References

    Full-text availability: on campus from 2025-09-12; off campus from 2025-09-12.