| Graduate Student: | 朱姳蓁 Chu, Ming-Chen |
|---|---|
| Thesis Title: | 結合母音空間進行嚴重程度保留之健康至病態語音轉換 (Healthy-to-Pathological Voice Conversion with Vowel Space-Based Severity Preservation) |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of Publication: | 2025 |
| Academic Year: | 113 (ROC calendar) |
| Language: | English |
| Pages: | 106 |
| Keywords: | pathological voice conversion, severity preservation, vowel space, dysarthria, source-filter model, speech synthesis |
This study proposes a framework for healthy-to-pathological voice conversion that preserves and controls severity-related characteristics in dysarthric speech. Pathological speech exhibits high acoustic variability and atypical articulatory patterns, and the scarcity of pathological speech data further complicates model development. To address these issues, we introduce a phonetics-based feature derived from vowel space compression, an established acoustic marker of articulatory degradation, and incorporate it into a source-filter-based synthesis architecture.

We construct a vowel space representation from corner vowel formant distributions and integrate it into the model to guide severity-dependent variation. This enables controllable generation of pathological speech that reproduces articulatory and phonatory abnormalities across severity levels. Experiments on the UASpeech dataset show that the generated speech closely matches real dysarthric speech in intelligibility, speaker identity, severity progression, and vowel space structure. Ablation studies further confirm that the vowel space feature plays a key role in modeling speech degradation. To our knowledge, this is the first work to incorporate vowel space compression into the pathological speech modeling pipeline, offering a controllable and interpretable framework for future research in speech disorder analysis.
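The vowel space feature above is built from corner vowel formant distributions. As an illustrative sketch only (not the thesis's actual implementation), the conventional vowel space area (VSA) metric can be computed from mean F1/F2 values of the corner vowels via the shoelace polygon formula; the formant values below are hypothetical examples, not measurements from UASpeech:

```python
# Illustrative sketch: vowel space area (VSA) from corner vowel formants.
# Formant values below are hypothetical, chosen only to show the computation.

def vowel_space_area(formants):
    """Shoelace (polygon) area over (F1, F2) points of corner vowels.

    formants: list of (F1, F2) tuples in Hz, ordered around the polygon.
    Returns area in Hz^2; a smaller area indicates vowel space compression.
    """
    n = len(formants)
    area = 0.0
    for i in range(n):
        f1_i, f2_i = formants[i]
        f1_j, f2_j = formants[(i + 1) % n]  # next vertex, wrapping around
        area += f1_i * f2_j - f1_j * f2_i
    return abs(area) / 2.0

# Hypothetical mean formants for corner vowels /i/, /a/, /u/ of a healthy speaker.
healthy = [(300, 2300), (750, 1200), (350, 800)]
# A centralized (compressed) vowel triangle, as reported for dysarthric speech.
compressed = [(400, 1900), (650, 1300), (450, 1000)]

print(vowel_space_area(healthy))     # larger triangle
print(vowel_space_area(compressed))  # smaller triangle
```

In practice the formant values would be estimated from aligned corner vowel segments (e.g. with a forced aligner and a formant tracker), and severity would correlate with the shrinkage of this area toward the center of the F1-F2 plane.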