| Graduate Student: | 陳珏安 Chen, Jyue-An |
|---|---|
| Thesis Title: | 基於師生模型之間的訊息差異進行域外偵測 Based on Information Discrepancy between Teacher-student Model for Out-of-domain Detection |
| Advisor: | 蔣榮先 Chiang, Jung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2022 |
| Academic Year of Graduation: | 110 (2021-2022) |
| Language: | English |
| Number of Pages: | 46 |
| Chinese Keywords: | 師生模型、域外偵測、可信賴人工智慧系統 |
| Foreign Keywords: | Teacher-student model, Out-of-domain detection, Reliable AI system |
Out-of-domain detection is an important issue in building a trustworthy AI system. Current AI classification models learn the features of each predefined class from a batch of training data; at test time, the model predicts which predefined class an input belongs to. Because the model cannot perceive data from undefined classes, when an input belongs to none of the predefined classes the model may still, with great confidence, predict one of them. This is an undesirable phenomenon, especially when the model is to be deployed in highly sensitive domains such as healthcare and finance. Because the model's predictions on out-of-domain data cannot be anticipated, this raises concerns about its safety.
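This overconfidence problem can be made concrete with the common maximum-softmax-probability confidence score. The sketch below is illustrative only; the logits are hypothetical, and the snippet is not part of the thesis itself:

```python
import torch
import torch.nn.functional as F

def max_softmax_probability(logits: torch.Tensor) -> torch.Tensor:
    """Confidence of the predicted class: the largest softmax output."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

# A closed-set classifier emits a full softmax distribution for ANY
# input, so an image from an undefined class can still receive a
# near-certain prediction for one of the predefined classes.
logits = torch.tensor([[9.2, 0.4, -1.3, 0.1]])  # hypothetical logits for an OOD input
print(max_softmax_probability(logits))  # ~0.9997: confident, yet no class applies
```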
This research proposes an in-domain feature-aware teacher-student framework to detect whether an input is out-of-domain data. The teacher model is a widely experienced pre-trained model that knows how to extract both in-domain and out-of-domain features; the student model learns from the teacher how to extract in-domain features. Whether an input is out-of-domain is judged by comparing the difference between the features the two models extract from it. To ensure that the features the teacher teaches are in-domain features, this research attaches an attention module to the teacher model and trains the attention module with a pretext task, thereby raising the teacher's attention to in-domain features, as sketched below.
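Neither the exact attention module nor the pretext task is specified in the abstract, so the sketch below uses two common stand-ins purely for illustration: a squeeze-and-excitation style channel gate as the attention module, and rotation prediction as the pretext task. Both are assumptions, not the thesis's actual design:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel gate; a placeholder for the thesis's attention
    module (the abstract does not specify its exact form)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        w = self.gate(feats).unsqueeze(-1).unsqueeze(-1)
        return feats * w  # re-weight channels toward in-domain features

def rotate_batch(x: torch.Tensor):
    """Hypothetical pretext task: predict one of four rotations.

    Rotating only in-domain images and training a 4-way rotation
    classifier on the gated features pushes the attention module
    toward features that characterize the in-domain data.
    """
    k = torch.randint(0, 4, (x.size(0),))
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(x, k)])
    return rotated, k  # images and rotation labels
```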
Four experiments were conducted to evaluate the performance, robustness, and practicality of the proposed framework. On the standard out-of-domain detection benchmark datasets, the proposed framework outperforms the comparison methods. Because the benchmark datasets are too easy, this research additionally designed a more complex dataset in which the in-domain and out-of-domain data are highly similar, and tested the framework on it. The robustness experiment shows that the proposed framework is not easily affected by various kinds of data noise. Finally, to verify that the framework can be applied in real-world settings, an experiment was conducted to detect whether handwritten Chinese characters are out-of-domain data, inspired by the financial industry's need to automatically record large volumes of diverse receipts and forms. In all four experiments, the proposed model substantially outperforms prior work.
This research addresses the out-of-domain (OOD) detection problem in images. Out-of-domain detection is an essential issue for reliable AI systems. Current deep learning models lack the ability to know the unknown: when fed out-of-domain data that belongs to none of the training categories, the model still predicts one of the training categories, and its prediction cannot be anticipated. This unpredictable behavior on out-of-domain data raises concerns about model reliability, and the shortcoming makes people hesitant to apply deep learning-based solutions in the real world, especially in sensitive fields such as medical diagnosis or financial applications.
This research proposes an in-domain feature-aware teacher-student framework. The teacher model characterizes the complete feature space, while the student model is a pure in-domain feature extractor. The OODness score is defined as the difference between the internal activations extracted by the two models. To ensure that the features transferred by the teacher model are in-domain features, this research attaches an attention module to the teacher model and trains it with a pretext task, thereby sharpening the teacher's attention to in-domain features.
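A minimal sketch of this OODness score follows, assuming both models share the same backbone and expose intermediate activation maps; the tiny two-block backbone and the mean-squared reduction are illustrative stand-ins, and the attention module described above is omitted:

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in backbone exposing two intermediate activation maps."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        return [f1, f2]  # internal activations compared across models

def oodness_score(teacher: nn.Module, student: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Score a batch: larger activation discrepancy -> more likely OOD.

    The student imitates the teacher only on in-domain data, so the two
    models should agree on in-domain inputs and disagree on OOD ones.
    """
    with torch.no_grad():
        t_feats, s_feats = teacher(x), student(x)
    return sum(
        ((t - s) ** 2).mean(dim=(1, 2, 3))  # per-sample mean squared gap per layer
        for t, s in zip(t_feats, s_feats)
    )

# Usage: scores above a validation-chosen threshold are flagged as OOD.
teacher, student = TinyBackbone().eval(), TinyBackbone().eval()
scores = oodness_score(teacher, student, torch.randn(4, 3, 32, 32))
```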
This research designs four experiments to evaluate (1) performance, (2) robustness, and (3) practicality. Overall, the proposed framework outperforms the baseline methods and shows positive results in both the robustness and the practicality experiments. Notably, it achieves a substantial performance improvement of about 10% in the hardest experiment, which evaluates performance on a complex dataset characterized by high similarity between in-domain and out-of-domain data. The robustness experiment further demonstrates that the proposed framework remains robust under various types of data corruption.