| Graduate Student: | 陳綺嬅 Chen, Chi-Hua |
|---|---|
| Thesis Title: | 結合機器聽覺與深度學習的貓咪疾病系統應用 Cat Disease Identification Using Machine Audition and Deep Learning |
| Advisor: | 陳牧言 Chen, Mu-Yen |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science (工學院 工程科學系) |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 113 |
| Language: | Chinese |
| Number of Pages: | 87 |
| Keywords (Chinese): | 梅爾頻譜, 深度學習, 資料增生, 系統應用, 視覺轉換器 |
| Keywords (English): | Mel spectrogram, deep learning, data augmentation, system application, Vision Transformer |
This study explores and applies machine audition and deep learning techniques to the development of a cat vocalization identification system. Reports from Taiwan and abroad in recent years show that the number of cat-owning households keeps growing, yet cats tend to conceal physical discomfort and emotional distress, making it difficult for owners to judge their condition accurately from their calls and delaying veterinary care. To address this problem, the study converts audio into images by computing Mel spectrograms and then trains deep learning models on them, including convolutional neural networks (CNN) and the Vision Transformer (ViT); multiple models were trained on the sound files after they had been converted to Mel spectrogram images and labeled. A total of 1,325 audio recordings were collected, and the cat vocalizations were divided into five types for identification.
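The conversion step described above can be sketched with the librosa Python library, a common choice for this task. This is a minimal sketch; the file name, 22,050 Hz sample rate, and 128 Mel bands are illustrative assumptions that the abstract does not specify.

```python
# Minimal sketch: turn an audio clip into a Mel spectrogram image.
# (File names and parameters are illustrative, not taken from the thesis.)
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load a cat vocalization clip; librosa resamples to the requested rate.
y, sr = librosa.load("meow.wav", sr=22050)

# Compute a Mel-scaled power spectrogram and convert it to decibels,
# the usual representation fed to image-based classifiers.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                   hop_length=512, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# Render the spectrogram as an image file a CNN/ViT pipeline can consume.
fig, ax = plt.subplots(figsize=(3, 3))
librosa.display.specshow(S_db, sr=sr, hop_length=512, ax=ax)
ax.set_axis_off()
fig.savefig("meow_melspec.png", bbox_inches="tight", pad_inches=0)
plt.close(fig)
```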
Experimental results show that data augmentation significantly improved model accuracy from 84.91% to 92.66% without causing overfitting. For the final model, EfficientNet-B0 was selected; its model file, parameters included, is only about 5.3 MB. The model runs inference on a central processing unit (CPU) alone, with no dedicated graphics processing unit (GPU) required, striking a balance between model size and computational efficiency. These results can serve as a diagnostic reference for cat owners and veterinarians, and the study also examines the feasibility and practical value of the system for cat health monitoring and behavioral training.
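As a rough illustration of the deployment claim, the sketch below loads a torchvision EfficientNet-B0, swaps its 1000-class head for the five vocalization classes, and runs inference on CPU. The preprocessing size, file names, and use of ImageNet weights are assumptions for illustration, not details taken from the thesis.

```python
# Minimal sketch: 5-class EfficientNet-B0 inference on CPU only.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

device = torch.device("cpu")  # no GPU required

# EfficientNet-B0 backbone; replace the 1000-class ImageNet head
# with a 5-class head for the five vocalization types.
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 5)
# In practice the fine-tuned checkpoint would be loaded here, e.g.
# model.load_state_dict(torch.load("cat5.pt"))  # hypothetical path
model.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

img = Image.open("meow_melspec.png").convert("RGB")
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0).to(device))
    pred = logits.argmax(dim=1).item()
print(f"predicted class index: {pred}")
```

EfficientNet-B0 is the smallest member of the EfficientNet family, which is what keeps the parameter file small and makes CPU-only inference practical.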
This study employs the Mel spectrogram method to transform audio into images, which are then used to train deep learning models, specifically convolutional neural networks (CNN) and the Vision Transformer (ViT). Multiple deep learning models were trained on the Mel spectrograms generated from the labeled sound files. A total of 1,325 audio samples were collected, with cat vocalizations categorized into five distinct types for identification. The resulting identification output can serve as a valuable diagnostic reference for pet owners and veterinarians. Furthermore, this research explores the feasibility of using the system for health monitoring and behavioral training of cats.
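The abstracts report a large accuracy gain from data augmentation but do not describe the augmentation operations themselves. One widely used option for Mel-spectrogram inputs is SpecAugment-style time and frequency masking; the numpy sketch below is an assumed example of that technique, not the thesis's documented method.

```python
# Assumed example: SpecAugment-style masking on a (n_mels, frames) array.
import numpy as np

def spec_augment(mel_db, freq_width=12, time_width=20, rng=None):
    """Mask one random frequency band and one random time span."""
    rng = rng or np.random.default_rng()
    out = mel_db.copy()
    n_mels, n_frames = out.shape
    f0 = rng.integers(0, max(1, n_mels - freq_width))
    t0 = rng.integers(0, max(1, n_frames - time_width))
    fill = out.mean()  # mask with the mean level rather than hard zeros
    out[f0:f0 + freq_width, :] = fill   # frequency mask
    out[:, t0:t0 + time_width] = fill   # time mask
    return out
```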
Model performance was evaluated using the F1-score, macro-averaged recall, accuracy, and confusion matrices, with the validation-loss curve used to monitor for overfitting. Ultimately, EfficientNet-B0 was selected for its superior performance after data augmentation: in a CPU environment it achieved an accuracy of over 84% with a compact size of approximately 5.3 MB, demonstrating a strong balance between model size and computational efficiency. Experimental results indicate that data augmentation effectively enhances model accuracy without causing overfitting; specifically, accuracy improved significantly from 84.91% to 92.66%, demonstrating the method's value for smaller datasets. The experiments also confirmed that model training can be performed entirely on a central processing unit (CPU), opening avenues for future deployment on embedded hardware; further adjustments could tailor the model to such hardware's constraints, improving efficiency and extending where inference can run.
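The metrics named above map directly onto scikit-learn helpers. This sketch uses placeholder label arrays rather than the thesis's actual predictions, purely to show how each reported figure would be computed.

```python
# Minimal sketch of the reported evaluation metrics (placeholder data).
from sklearn.metrics import (accuracy_score, f1_score,
                             recall_score, confusion_matrix)

y_true = [0, 1, 2, 3, 4, 0, 1, 2]   # ground-truth class indices (placeholder)
y_pred = [0, 1, 2, 3, 3, 0, 1, 2]   # model predictions (placeholder)

print("accuracy:    ", accuracy_score(y_true, y_pred))
print("macro F1:    ", f1_score(y_true, y_pred, average="macro"))
print("macro recall:", recall_score(y_true, y_pred, average="macro"))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```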
On-campus access: publicly available from 2030-08-08.