| Author: | 蔡仕宸 Tsai, Shih-Chen |
|---|---|
| Thesis Title: | 吞嚥困難的早期檢測:基於噪聲標籤的聲音分析 Early detection of dysphagia through sound analysis with noisy labels |
| Advisor: | 藍崑展 Lan, Kun-Chan |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of Publication: | 2024 |
| Graduation Academic Year: | 112 |
| Language: | English |
| Number of Pages: | 196 |
| Chinese Keywords: | 吞嚥困難、語音分析、深度學習、噪聲標籤 |
| Foreign Keywords: | Dysphagia, Audio Analysis, Deep Learning, Noisy Labels |
| Access Count: | Views: 89, Downloads: 8 |
Studies have shown that dysphagia is a complication of many age-related diseases, such as stroke, neurodegenerative diseases, and muscular atrophy. As the population ages, dysphagia is becoming increasingly common among the elderly: a meta-analysis covering multiple countries reported that it currently affects about 30% of community-dwelling older adults and nearly 50% of elderly patients. The diagnosis and treatment of dysphagia are therefore key issues in geriatric care.
Current clinical methods for diagnosing dysphagia have significant drawbacks: they are invasive, time-consuming, and require assessment by trained medical professionals, which makes the diagnostic process relatively complex. To address this, previous work has adopted a deep learning-based diagnostic approach using audio. Its advantages are that it is non-invasive, simple to administer, and has the potential for early detection.
However, earlier deep learning studies on audio-based diagnosis suffered from low accuracy. To investigate this, we examined existing swallowing-sound datasets for anomalies and found that some samples from patients sound only partly dysphagic, and that samples from healthy subjects show the same ambiguity. Because each sample is labeled solely according to whether the subject is a dysphagia patient, these ambiguous samples can end up with the wrong label. Based on these observations, we hypothesized that the low accuracy of previous studies was caused by noisy labels in the dataset.
In summary, the main contribution of this thesis is to verify and address the noisy-label hypothesis. We study the application of noisy-label learning to audio-based dysphagia diagnosis and, to determine which noisy-label algorithm can effectively improve model accuracy, we selected five representative algorithms from the existing literature and carefully compared their results. We found that the effectiveness of a noisy-label algorithm comes down to two factors: whether it correctly identifies the noisy samples, and whether it correctly removes or relabels them.
Research has shown that dysphagia is a complication of many geriatric diseases, such as stroke, neurodegenerative diseases, and muscular atrophy. With an aging population, dysphagia is becoming increasingly prevalent among the elderly, affecting about 30% of community-dwelling elderly and nearly 50% of elderly patients. Therefore, diagnosis and treatment of dysphagia are critical in elderly healthcare.
Current clinical methods for diagnosing dysphagia are invasive, time-consuming, and require professional medical personnel, making the diagnostic process complex. To address this, previous research adopted a deep learning-based audio diagnostic method, which is non-invasive, easy to operate, and potentially allows for early detection.
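As a concrete illustration of what such an audio-based diagnostic pipeline might look like, here is a minimal sketch assuming log-mel spectrogram features fed to a small CNN classifier; librosa, PyTorch, the sample rate, clip length, file name, and network shape are illustrative assumptions, not the configuration used in this thesis.

```python
# Minimal sketch of a deep-learning swallowing-sound classifier.
# Assumptions (not from the thesis): librosa for feature extraction,
# a small PyTorch CNN, 16 kHz audio, 2-second clips, binary output.
import librosa
import numpy as np
import torch
import torch.nn as nn

def to_logmel(wav_path, sr=16000, n_mels=64, duration=2.0):
    """Load a swallowing-sound clip and convert it to a log-mel spectrogram."""
    y, _ = librosa.load(wav_path, sr=sr, duration=duration)
    y = np.pad(y, (0, max(0, int(sr * duration) - len(y))))  # pad short clips
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return torch.tensor(librosa.power_to_db(mel), dtype=torch.float32)

class SwallowCNN(nn.Module):
    """Tiny CNN: two conv blocks, global average pooling, binary logits."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # normal vs. dysphagic swallow

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        return self.head(self.features(x).flatten(1))

# Hypothetical usage:
# spec = to_logmel("swallow_001.wav")
# logits = SwallowCNN()(spec[None, None])
```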
However, previous deep learning-based audio diagnostic studies have shown low accuracy. We observed anomalies in existing datasets of swallowing sounds, where samples from patients and healthy individuals sometimes appeared ambiguous. During labeling, samples were categorized based only on whether the subject had dysphagia, leading to potential mislabeling. We hypothesized that these noisy labels caused the low accuracy in previous studies.
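To make the labeling issue concrete, the sketch below shows how a subject-level diagnosis is copied onto every one of that subject's clips, which is where the hypothesized label noise enters; the field names and file names are hypothetical.

```python
# Sketch of how subject-level labels propagate to every audio sample.
# Field names ("subject_id", "has_dysphagia", "clips") are illustrative assumptions.
subjects = [
    {"subject_id": "P01", "has_dysphagia": True,  "clips": ["p01_a.wav", "p01_b.wav"]},
    {"subject_id": "H07", "has_dysphagia": False, "clips": ["h07_a.wav"]},
]

samples, labels = [], []
for s in subjects:
    for clip in s["clips"]:
        samples.append(clip)
        # Every clip inherits the subject's diagnosis, even if this particular
        # swallow sounds ambiguous -- this is where noisy labels can arise.
        labels.append(1 if s["has_dysphagia"] else 0)

print(list(zip(samples, labels)))
# [('p01_a.wav', 1), ('p01_b.wav', 1), ('h07_a.wav', 0)]
```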
Our research focuses on verifying and addressing the issue of noisy labels. We explored the application of noisy-label algorithms to audio-based diagnosis of dysphagia, comparing five representative algorithms. We found that the effectiveness of these algorithms depends on correctly identifying noisy samples and then either deleting or relabeling them accurately.
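As one common instance of the "identify, then delete or relabel" pattern described above, the sketch below trusts small-loss samples, relabels confidently misclassified ones, and drops the rest; the thresholds, probabilities, and helper function are illustrative assumptions, not any of the five algorithms compared in the thesis.

```python
# Sketch of the generic "identify noisy samples, then delete or relabel" pattern.
# The thresholds and toy data are illustrative assumptions, not the thesis's method.
import numpy as np

def clean_labels(probs, labels, drop_loss=0.7, relabel_conf=0.9):
    """probs: (N, 2) predicted class probabilities; labels: (N,) given labels."""
    losses = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)  # per-sample CE
    keep, new_labels = [], labels.copy()
    for i, loss in enumerate(losses):
        if loss <= drop_loss:
            keep.append(i)                      # small loss: label trusted as-is
        elif probs[i].max() >= relabel_conf:
            new_labels[i] = probs[i].argmax()   # confident disagreement: relabel
            keep.append(i)
        # otherwise: likely noisy and uncertain -> drop the sample
    return np.array(keep), new_labels

# Usage with toy predictions for three samples:
probs = np.array([[0.9, 0.1], [0.05, 0.95], [0.6, 0.4]])
labels = np.array([0, 0, 1])
keep_idx, fixed = clean_labels(probs, labels)
print(keep_idx, fixed[keep_idx])  # sample 1 is relabeled to class 1; sample 2 is dropped
```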