
Author: 陳綺嬅 (Chen, Chi-Hua)
Title: 結合機器聽覺與深度學習的貓咪疾病系統應用 (Cat Disease Identification Using Machine Audition and Deep Learning)
Advisor: 陳牧言 (Chen, Mu-Yen)
Degree: Master's
Department: College of Engineering, Department of Engineering Science
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar)
Language: Chinese
Pages: 87
Keywords (Chinese): 梅爾頻譜 (Mel spectrogram), 深度學習 (deep learning), 資料增生 (data augmentation), 系統應用 (system application), 視覺轉換器 (Vision Transformer)
Keywords (English): Mel spectrogram, deep learning, data augmentation, system application, Vision Transformer
Abstract (Chinese, translated):
This study explores and applies machine audition and deep learning techniques to develop a cat vocalization recognition system. Reports from Taiwan and abroad in recent years show that the number of cat-owning households keeps rising, yet cats tend to conceal physical discomfort or emotional distress, making it difficult for owners to judge a cat's condition accurately from its calls and risking delayed veterinary care. To address this problem, the study converts audio into Mel spectrogram images and trains deep learning models on them, adopting convolutional neural network (CNN) and Vision Transformer (ViT) architectures; several such models were trained on the labeled sound files after conversion to Mel spectrograms. In total, 1,325 audio recordings were collected, and the cat vocalizations were divided into five types for recognition.
Experimental results show that data augmentation raised model accuracy markedly, from 84.91% to 92.66%, without any sign of overfitting. For model selection, EfficientNet-B0 was ultimately chosen: its model file, parameters included, is only about 5.3 MB. The model can run inference in a central processing unit (CPU) environment with no dedicated graphics processing unit (GPU), balancing model size against computational efficiency. These results can serve as a diagnostic reference for owners and veterinarians, and the study further examines the feasibility and practical value of the system for cat health monitoring and behavior training.
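As a concrete illustration of the audio-to-image step described in the abstract, below is a minimal sketch using the librosa library (cited in the references); the file names and spectrogram parameters (n_fft, hop_length, n_mels) are illustrative assumptions rather than the thesis's actual settings.

    import librosa
    import librosa.display
    import matplotlib.pyplot as plt
    import numpy as np

    # Load a cat vocalization clip (hypothetical file name), resampled to 22.05 kHz.
    y, sr = librosa.load("meow.wav", sr=22050)

    # Compute a Mel-scaled spectrogram and convert power to decibels, the usual
    # representation fed to image-based classifiers such as CNNs and ViTs.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # Save the spectrogram as an axis-free image suitable for model training.
    fig, ax = plt.subplots(figsize=(3, 3))
    librosa.display.specshow(mel_db, sr=sr, hop_length=512, ax=ax)
    ax.set_axis_off()
    fig.savefig("meow_mel.png", bbox_inches="tight", pad_inches=0)
    plt.close(fig)

Saving the spectrogram as a plain image is what lets standard image classifiers consume the audio directly.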

Abstract (English):
This study employs the Mel spectrogram method to transform audio into images, which are then used to train deep learning models, specifically convolutional neural network (CNN) and Vision Transformer (ViT) architectures. Multiple deep learning models were trained on the Mel spectrograms of the classified sound files. A total of 1,325 audio samples were collected, with cat vocalizations categorized into five distinct types for identification. The resulting identifications can serve as a valuable diagnostic reference for pet owners and veterinarians. Furthermore, this research explores the feasibility of using the system for health monitoring and behavioral training of cats.
Model performance was evaluated using accuracy, F1-score, macro-average recall, and confusion matrices, and the validation loss curve was monitored for overfitting. EfficientNet-B0 was ultimately selected for its superior performance after data augmentation. In a CPU environment, this model achieved an accuracy of over 84% with a compact size of approximately 5.3 MB, demonstrating a good balance between model size and computational efficiency. Experimental results indicate that data augmentation effectively enhances model accuracy without causing overfitting: accuracy improved significantly from 84.91% to 92.66%, confirming the efficacy of this method on smaller datasets. The experiments also showed that model training can be performed on a central processing unit (CPU), opening avenues for future deployment on embedded hardware. Further model adjustments could accommodate the constraints of embedded devices, improving efficiency and expanding inference capabilities.
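The evaluation protocol named above (accuracy, macro-average recall, F1-score, confusion matrix) maps directly onto standard library calls. A minimal sketch assuming scikit-learn, with hypothetical label arrays standing in for the five vocalization classes:

    from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                 recall_score)

    # Hypothetical ground-truth and predicted class indices (0-4 = five call types).
    y_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
    y_pred = [0, 1, 2, 3, 3, 0, 1, 2, 3, 4]

    print("Accuracy:            ", accuracy_score(y_true, y_pred))
    print("Macro-average recall:", recall_score(y_true, y_pred, average="macro"))
    print("Macro F1-score:      ", f1_score(y_true, y_pred, average="macro"))
    print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))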
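The four augmentation rounds themselves are detailed in Chapter 4 of the thesis and not reproduced here. As one hedged illustration of spectrogram-level augmentation, the sketch below applies SpecAugment-style frequency and time masking (Park et al., 2019, listed in the references); the mask counts and widths are illustrative assumptions, not the thesis's settings.

    import numpy as np

    def spec_augment(mel_db, num_masks=2, freq_width=12, time_width=20, rng=None):
        """Randomly mask frequency bands and time steps of a (n_mels, n_frames) array."""
        if rng is None:
            rng = np.random.default_rng()
        out = mel_db.copy()
        n_mels, n_frames = out.shape
        fill = out.min()  # mask with the minimum (silence-like) dB value
        for _ in range(num_masks):
            f0 = int(rng.integers(0, max(1, n_mels - freq_width)))
            out[f0:f0 + freq_width, :] = fill   # frequency mask
            t0 = int(rng.integers(0, max(1, n_frames - time_width)))
            out[:, t0:t0 + time_width] = fill   # time mask
        return out

    # Example: augmented = spec_augment(mel_db), using the array from the earlier sketch.

Each masked copy is saved as an additional training image, which is how augmentation can grow a 1,325-sample dataset without new recordings.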
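Finally, the CPU-only deployment claim can be sketched with torchvision's EfficientNet-B0; the five-class head, checkpoint path, and preprocessing are hypothetical stand-ins for the thesis's trained model, which is not public.

    import torch
    from PIL import Image
    from torchvision import models, transforms

    device = torch.device("cpu")  # no dedicated GPU required

    # Randomly initialized backbone; a real run would load the trained weights.
    model = models.efficientnet_b0(weights=None)
    # Replace the 1000-class ImageNet head with a 5-way vocalization classifier.
    model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, 5)
    # model.load_state_dict(torch.load("cat_vocal_effnet_b0.pt", map_location=device))
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),  # EfficientNet-B0's nominal input size
        transforms.ToTensor(),          # a real pipeline would also normalize
    ])

    img = Image.open("meow_mel.png").convert("RGB")  # spectrogram image from above
    with torch.no_grad():  # inference only, keeps CPU memory low
        logits = model(preprocess(img).unsqueeze(0).to(device))
    print("Predicted class index:", logits.argmax(dim=1).item())

Running under torch.no_grad() avoids storing gradients, which is what makes single-image inference cheap enough for a CPU-only or embedded deployment.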

Table of Contents:
Abstract; Acknowledgements; Table of Contents; List of Tables; List of Figures
Chapter 1 Introduction: 1.1 Research Background and Motivation; 1.2 Research Objectives; 1.3 Thesis Organization
Chapter 2 Literature Review: 2.1 Evolution of MobileNet from V1 to V4; 2.2 Other Lightweight Models (2.2.1 Lightweight CNN Architectures; 2.2.2 Hybrid CNN-Transformer Architectures; 2.2.3 Applications of Time-Series Models (LSTM, GRU, BiLSTM)); 2.3 Spectrogram Analysis Techniques (2.3.1 Spectrum; 2.3.2 Spectrogram; 2.3.3 Mel Spectrogram; 2.3.4 Empirical Studies Applying Spectrograms to Animal Health Sound Analysis); 2.4 Machine Learning Model Improvements
Chapter 3 Research Methods: 3.1 System Architecture and Design (3.1.1 Research Purpose and Design; 3.1.2 System Requirements Analysis; 3.1.3 Overall Research Flowchart); 3.2 Data Collection; 3.3 Data Preprocessing (3.3.1 Audio Data Preprocessing; 3.3.2 Audio Data Cleaning and Denoising; 3.3.3 Data Augmentation Methods; 3.3.4 Feature Extraction Process and Parameter Settings); 3.4 Model Training; 3.5 Model Deployment (3.5.1 Data Collection Flowchart; 3.5.2 User Flowchart); 3.6 System Inference Results (3.6.1 Comparison of Training and Validation Results across Models; 3.6.2 Analysis and Discussion of Experimental Results)
Chapter 4 System Design and Development: 4.1 Development Environment Configuration; 4.2 Data Preprocessing Implementation (4.2.1 Downloading Audio Files with a Web Crawler; 4.2.2 Audio File Preprocessing; 4.2.3 Converting Files to Mel Spectrograms); 4.3 CNN Model Training (4.3.1 EfficientNet-B0 Training Results; 4.3.2 MobileNetV3-Small Training Results; 4.3.3 LeViT-128S Training Results; 4.3.4 MobileViT-XS Training Results; 4.3.5 ResNet-18 Training Results); 4.4 Data Augmentation Implementation (4.4.1 First Augmentation Round; 4.4.2 Second Augmentation Round; 4.4.3 Third Augmentation Round; 4.4.4 Fourth Augmentation Round)
Chapter 5 System Construction and Application: 5.1 User Interface Design and Development; 5.2 Web Deployment Implementation; 5.3 System Implementation; 5.4 System Analysis and Comparison; 5.5 Comprehensive Discussion of Results (5.5.1 Baseline Performance Comparison of Models and Rationale for Selecting EfficientNet-B0; 5.5.2 Significant Impact and Effect Analysis of Data Augmentation; 5.5.3 Academic Contributions)
Chapter 6 Conclusions and Future Work: 6.1 Conclusions; 6.2 Research Limitations; 6.3 Future Outlook
References

References:
Ministry of Agriculture (農業部). (2024). Latest estimated numbers of domestic dogs and cats nationwide [in Chinese]. Agricultural News. Retrieved from https://www.moa.gov.tw/theme_data.php?theme=news&sub_theme=agri&id=9418
Central News Agency (中央通訊社). (2024). Taiwan's low birth rate: 2023 cat and dog registrations outnumbered newborns by nearly 100,000 [in Chinese]. Retrieved from https://www.cna.com.tw/news/ahel/202410160068.aspx
Ministry of Agriculture (農業部). (2024). Pet registration extended to cats, effective January 1 next year [in Chinese]. Agricultural News. Retrieved from https://www.moa.gov.tw/theme_data.php?theme=news&sub_theme=agri&id=9637
Government Open Data Platform (政府資料開放平台). (2025). List of contracted veterinary hospitals serving as pet registration stations [in Chinese]. Retrieved from https://data.gov.tw/dataset/124583
Ministry of Agriculture (農業部). (2025). Animal adoption [in Chinese]. Retrieved from https://data.moa.gov.tw/open_detail.aspx?id=QcbUEzN6E6DL
American Pet Products Association. (2024). Industry trends and stats. APPA Research & Insights. Retrieved June 5, 2025, from https://americanpetproducts.org/industry-trends-and-stats
Alajlan, N. N., & Ibrahim, D. M. (2022). TinyML: Enabling of inference deep learning models on ultra-low-power IoT edge devices for AI applications. Micromachines, 13(6), 851. https://doi.org/10.3390/mi13060851
Andrew, M. (2019). Cat Meow Classification [Dataset]. Kaggle. https://www.kaggle.com/datasets/andrewmvd/cat-meow-classification
Baowaly, M. K., Sarkar, B. C., Walid, M. A. A., Ahamad, M. M., Singh, B. C., Alvarado, E. S., Ashraf, I., & Samad, M. A. (2024). Deep transfer learning-based bird species classification using mel spectrogram images. PLOS ONE, 19(8), e0305708. https://doi.org/10.1371/journal.pone.0305708
Chittepu, S., Martha, S., & Banik, D. (2025). Empowering voice assistants with TinyML for user-centric innovations and real-world applications. Scientific Reports. https://doi.org/10.1038/s41598-025-96588-1
Chu, H.-C., Zhang, Y.-L., & Chiang, H.-C. (2023). A CNN sound classification mechanism using data augmentation. Sensors, 23(15), 6972. https://doi.org/10.3390/s23156972
Chen, Y., Zhu, Y., Yan, Z., Shen, J., Ren, Z., & Huang, Y. (2023). Data augmentation for environmental sound classification using diffusion probabilistic model with top-k selection discriminator. In Y.-M. Cheung & D. S. Huang (Eds.), Advanced Intelligent Computing Technology and Applications: Proceedings of ICIC 2023 (Lecture Notes in Computer Science, Vol. 14087, pp. 283–295). Springer. https://doi.org/10.1007/978-981-99-4742-3_23
Dewmini, H., Meedeniya, D., & Perera, C. (2025). Elephant sound classification using deep learning optimization. Sensors, 25(2), 352. https://doi.org/10.3390/s25020352
Fischbach, L., Kleen, C., Flek, L., & Lameli, A. (2025, March). Does preprocessing matter? An analysis of acoustic feature importance in deep learning for dialect classification. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025) (pp. 159–169). University of Tartu Library. https://aclanthology.org/2025.nodalida-1.14
Furbo 360° Cat Camera. (n.d.). See, talk, and play with your cat even when you're away! Retrieved from https://furbo.com/us/products/furbo-360-cat-camera
Graham, B., El-Nouby, A., Touvron, H., Stock, P., Joulin, A., Jégou, H., & Douze, M. (2021). LeViT: A vision transformer in ConvNet's clothing for faster inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 12259–12269). IEEE. https://doi.org/10.1109/ICCV48922.2021.01204
Gupta, S., Chavan, A. S., Deepak, A., Kumar, A., Pundir, S., Bajaj, R., & Shrivastava, A. (2024). Speech emotion recognition of animal vocals using deep learning. International Journal of Intelligent Systems and Applications in Engineering, 12(13s), 129–136. https://ijisae.org/index.php/IJISAE/article/view/4578
Hugging Face. (2024). LeViT model documentation. Retrieved July 13, 2025, from https://huggingface.co/docs/transformers/model_doc/levit
Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., et al. (2019). Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2019) (pp. 1314–1324).
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv. https://arxiv.org/abs/1704.04861
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90
Jiang, X., & Zhang, Y. (2024). Sounding the alarm: Functionally referential signaling in Azure-winged Magpie. Avian Research, 15, 100164. https://doi.org/10.1016/j.avrs.2024.100164
Jurafsky, D., & Martin, J. H. (2023). Speech and language processing (3rd ed.). Pearson. Retrieved from https://web.stanford.edu/~jurafsky/slp3/
Kim, E., Moon, J., Shim, J., & Hwang, E. (2023). DualDiscWaveGAN-based data augmentation scheme for animal sound classification. Sensors, 23(4), 2024. https://doi.org/10.3390/s23042024
Kahl, S., Wood, C. M., Eibl, M., & Klinck, H. (2021). BirdNET: A deep learning solution for bird sound detection and classification. Ecological Informatics, 61, 101236. https://doi.org/10.1016/j.ecoinf.2021.101236
librosa development team. (n.d.). librosa: Python library for audio and music analysis. Retrieved June 7, 2025, from https://github.com/librosa/librosa
Lefèvre, R. A., Sypherd, C. C. R., & Briefer, É. F. (2025). Machine learning algorithms can predict emotional valence across ungulate vocalizations. iScience, 28(2), 111834. https://doi.org/10.1016/j.isci.2025.111834
Licciardi, A., & Carbone, D. (2024). WhaleNet: A novel deep learning architecture for marine mammals vocalizations on Watkins Marine Mammal Sound Database. IEEE Access, 12, 154182–154194. https://doi.org/10.1109/ACCESS.2024.3482117
Lee, S., Kim, H., & Park, J. (2025). Detection of abnormal symptoms using acoustic-spectrogram-based deep learning. Applied Sciences, 15(9), 4679. https://doi.org/10.3390/app15094679
Ludovico, L. A., Ntalampiras, S., Presti, G., Cannas, S., Battini, M., & Mattiello, S. (2020). CatMeows: A publicly-available dataset of cat vocalizations [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.4008297
MacPhail, A. G., Stowell, D., Sait, S. M., & Banks-Leite, C. (2023). Audio data compression affects acoustic indices and reduces detections of birds by human listening and automated recognisers. Bioacoustics, 32(2), 123–145. https://doi.org/10.1080/09524622.2023.2290718
Mehta, S., & Rastegari, M. (2022). MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. In Proceedings of the International Conference on Learning Representations (ICLR 2022). Retrieved July 14, 2025, from https://openreview.net/forum?id=vh-0sUt8HlG
MeowTalk. (n.d.). MeowTalk: Get fluent in "Meow". Retrieved from https://www.meowtalk.app/
Mutanu, L., Gohil, J., Gupta, K., Wagio, P., & Kotonya, G. (2022). A review of automated bioacoustics and general acoustics classification research. Sensors, 22(21), 8361. https://doi.org/10.3390/s22218361
National Geographic. (2024, November). What is your cat telling you? New technology deciphers meows. Retrieved July 13, 2025, from https://www.nationalgeographic.com/animals/article/cats-pets-communication-app-technology
Nielsen, J. (2020). 10 usability heuristics for user interface design. Nielsen Norman Group. Retrieved from https://www.nngroup.com/articles/ten-usability-heuristics/
O'Shaughnessy, D. (2023). Review of methods for coding of speech signals. EURASIP Journal on Audio, Speech, and Music Processing, 2023, Article 8. https://doi.org/10.1186/s13636-023-00274-x
Pan, W., Jiao, J., Zhou, X., Xu, Z., Gu, L., & Zhu, C. (2024). Underdetermined blind source separation of audio signals for group-reared pigs based on sparse component analysis. Sensors, 24(16), 5173. https://doi.org/10.3390/s24165173
Park, D. S., Chan, W., Zhang, Y., Chiu, C.-C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A simple data augmentation method for automatic speech recognition. In Interspeech 2019 (pp. 2613–2617).
Qin, D., Leichner, C., Delakis, M., Fornoni, M., Luo, S., Yang, F., … Howard, A. (2024). MobileNetV4: Universal models for the mobile ecosystem. In Z. Liu et al. (Eds.), Computer Vision – ECCV 2024 (Lecture Notes in Computer Science, Vol. 15098, pp. 78–96). Springer. https://doi.org/10.1007/978-3-031-73661-2_5
Ranmal, D., Ranasinghe, P., Paranayapa, T., Meedeniya, D. B., & Perera, C. (2024). ESC-NAS: Environment sound classification using hardware-aware neural architecture search for the edge. Sensors, 24(12), 3749. https://doi.org/10.3390/s24123749
Rosero-Pena, D. (2024, July 22). Food to fuel global pet economy's climb to $500 billion. Bloomberg Intelligence. https://www.bloomberg.com/professional/insights/markets/food-to-fuel-global-pet-economys-climb-to-500-billion/
Rushibalajiputthewad, R. (2022). Sound Classification of Animal Voice [Dataset]. Kaggle. https://www.kaggle.com/datasets/rushibalajiputthewad/sound-classification-of-animal-voice
Schötz, S., van de Weijer, J., & Eklund, R. (2024). Context effects on duration, fundamental frequency, and intonation in human-directed domestic cat meows. Applied Animal Behaviour Science, 270, 106146. https://doi.org/10.1016/j.applanim.2023.106146
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 4510–4520). https://doi.org/10.1109/CVPR.2018.00474
Tang, L., Hu, S., Yang, C., Deng, R., Chen, A., & Zhou, G. (2024). JL-TFMSFNet: A domestic cat sound emotion recognition method based on jointly learning the time–frequency domain and multi-scale features. Expert Systems with Applications, 255, Article 124620. https://doi.org/10.1016/j.eswa.2024.124620
Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning (Vol. 97, pp. 6105–6114). PMLR. Retrieved from https://proceedings.mlr.press/v97/tan19a.html
Wang, L. (2023). A new approach and guideline for loudness in game audio: Developing specific loudness standards for each section of game audio (Master's thesis, University of Skövde). DiVA Portal. https://his.diva-portal.org/smash/get/diva2:1768699/FULLTEXT01.pdf
Weisslinger, M. (2022). A purring cat is a happy cat, TRUE or FALSE? A cat can purr for pleasure, for attention or food, or even for stress and pain! VetAgro Sup – Chaire BEA. https://doi.org/10.5281/zenodo.13709857
Xu, M. Y. (2024). Analysis of cat's communication style and cognitive ability. International Journal of Molecular Zoology, 14(1), 1–8. https://doi.org/10.5376/ijmz.2024.14.0001

Full text available on campus: 2030-08-08
Full text available off campus: 2030-08-08
The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.