Graduate student: | 徐嘉昊 Hsu, Jia-Hao |
---|---|
Thesis title: | 情緒辨識關鍵特徵提取與缺失數據補值於多模態數位足跡躁鬱症評估之研究 An Investigation into the Key Feature Extraction for Emotion Recognition and Imputation of Multi-modal Missing Data in Digital Phenotyping for Bipolar Disorder Assessment |
Advisor: | 吳宗憲 Wu, Chung-Hsien |
Degree: | 博士 Doctor |
Department: | 電機資訊學院 - 資訊工程學系 College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2024 |
Academic year of graduation: | 112 |
Language: | English |
Number of pages: | 110 |
Keywords (Chinese): | 數位足跡、雙相症量表分數預測、段級別注意力機制、多模態情緒辨識、多關係KNN補值 |
Keywords (English): | Digital phenotyping, bipolar disorder scale score prediction, segment-level attention, multi-modal emotion recognition, multi-correlation KNN imputation |
Bipolar disorder is an increasingly prevalent mental illness, and its treatment often requires substantial healthcare resources. Monitoring and tracking patients' psychological state and daily life has been shown to benefit the treatment of bipolar disorder. This study develops a bipolar disorder assessment system that predicts patients' weekly scores on two mental health scales, the Hamilton Depression Rating Scale (HAM-D) and the Young Mania Rating Scale (YMRS), in order to reduce the manpower needed for clinical interviews. The system aims both to provide patient information to clinicians and to ease the burden on healthcare resources. Many mental health applications and studies already exist, and digital phenotyping has been shown to provide useful patient information. However, most of these applications lack evidence of clinical effectiveness, and many are no longer maintained. Affective computing models for digital phenotyping usually do not extract features from the key signals within each modality, and they do not account for differences in importance among temporal segments. In addition, the missing-data problem that commonly arises when collecting digital phenotyping data has not been adequately addressed. This study proposes improved methods for each of these problems.
To monitor and track patients' daily lives, this study developed a smartphone app designed with user experience in mind. App usage statistics show that monthly usage frequency rose gradually, which indicates an improved user experience. Moreover, when the data collected in each successive year are used to predict patients' mental health scale scores, prediction accuracy improves year over year, which supports the clinical validity and quality of the collected data.
For multimodal affective computing in digital phenotyping, this study proposes feature extraction models tailored to the characteristics of each modality. A Vector-Quantized Variational AutoEncoder (VQ-VAE) based on prosodic phrases, combined with the temporal modeling capability of Audio ALBERT (AALBERT), is proposed as the speech emotion feature extraction model. A saliency-map-based key term extraction method is proposed as the text emotion feature extraction model. Results on the IEMOCAP dataset show that the proposed feature extraction methods are comparable to strong existing models such as wav2vec 2.0 and RoBERTa. In the emotion recognition stage, this study proposes using a neural tensor network to compute the emotional consistency between modalities within each segment, and the consistency score serves as the segment attention. During recognition, in addition to the segment attention weights, the model is trained jointly with segment-level and utterance-level labels. Experimental results show that the proposed methods outperform other existing emotion recognition systems on BAUM-1 and CMU-MOSEI.
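The idea of scoring cross-modal consistency per segment and using it as attention can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: it assumes two modalities (here called `audio` and `visual`), the standard neural tensor network scoring form `u · tanh(aᵀ W v + V [a; v] + b)`, random untrained parameters, and a softmax over segments. All names, shapes, and the final pooling step are hypothetical choices made for illustration.

```python
import numpy as np

def ntn_scores(audio, visual, W, V, b, u):
    """Neural-tensor-network consistency score for each segment.

    audio, visual: (S, d) per-segment features of two modalities (assumed).
    W: (k, d, d) bilinear tensor, V: (k, 2d), b: (k,), u: (k,).
    """
    # a^T W_k v for every segment s and tensor slice k
    bilinear = np.einsum("sd,kde,se->sk", audio, W, visual)
    # Standard linear term V [a; v]
    linear = np.concatenate([audio, visual], axis=1) @ V.T
    return np.tanh(bilinear + linear + b) @ u  # shape (S,)

def segment_attention(scores):
    """Softmax over segments so the attention weights sum to 1."""
    z = np.exp(scores - scores.max())
    return z / z.sum()

# Toy, untrained parameters purely for demonstration.
rng = np.random.default_rng(0)
S, d, k = 5, 8, 4
audio = rng.normal(size=(S, d))
visual = rng.normal(size=(S, d))
W = rng.normal(scale=0.1, size=(k, d, d))
V = rng.normal(scale=0.1, size=(k, 2 * d))
b = np.zeros(k)
u = rng.normal(size=k)

weights = segment_attention(ntn_scores(audio, visual, W, V, b, u))
# Attention-weighted pooling of segment features into one utterance vector
utterance_vec = weights @ np.concatenate([audio, visual], axis=1)
```

In a trained model the parameters would be learned jointly with the classifier; the sketch only shows how a per-segment consistency score can be turned into normalized segment weights.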
Finally, this study proposes a multi-correlation K-nearest-neighbor (KNN) imputation method to handle the missing data that arise when collecting digital phenotyping data. A Lasso Multi-Layer Perceptron (Lasso-MLP) model is also proposed for scale score prediction: its nonlinear design improves robustness, and the feature selection ability of Lasso helps the model cope with a low-resource corpus. Experimental results show that both the proposed model and the imputation method improve scale score prediction. Together, the proposed methods address several problems in bipolar disorder assessment systems and improve the accuracy of patient assessment.
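For readers unfamiliar with KNN imputation, the sketch below shows the plain baseline technique only, not the multi-correlation variant proposed in the thesis, whose details are not given in this abstract. It assumes a numeric matrix with `NaN` marking missing entries, a root-mean-square distance over the features two rows share, and the mean of the `k` nearest donors as the fill value. The function name `knn_impute` and the toy data are hypothetical.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in each row with the mean of the k nearest donor rows."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        observed = ~missing[i]
        neighbours = []
        for j in range(len(X)):
            if j == i:
                continue
            # Compare only on features observed in both rows
            shared = observed & ~missing[j]
            if not shared.any():
                continue
            dist = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
            neighbours.append((dist, j))
        neighbours.sort()
        for f in np.where(missing[i])[0]:
            # Nearest rows that actually have a value for feature f
            donors = [X[j, f] for _, j in neighbours if not missing[j, f]][:k]
            if donors:
                filled[i, f] = np.mean(donors)
    return filled

# Toy data: row 1 is missing its third feature.
data = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, np.nan],
    [5.0, 6.0, 7.0],
    [1.2, 1.9, 3.2],
])
imputed = knn_impute(data, k=2)
```

With `k=2`, the missing value is filled from the two rows closest to row 1 on the shared features, so the distant third row does not contribute.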