| Graduate student: | 許銘峻 Hsu, Ming-Chun |
|---|---|
| Thesis title: | 建立強健性台語聲調音素資料庫以發展文字轉語音系統 Design a Robust Taiwanese Tonal Phoneme Database for Taiwanese Text-To-Speech System |
| Advisor: | 鍾高基 Chung, Kao-Chi |
| Degree: | Master |
| Department: | 工學院 - 醫學工程研究所 College of Engineering - Institute of Biomedical Engineering |
| Year of publication: | 2009 |
| Academic year of graduation: | 97 |
| Language: | Chinese |
| Number of pages: | 116 |
| Keywords (Chinese): | 台語 (Taiwanese), 聲調音素 (tonal phoneme), 文字轉語音系統 (text-to-speech system), 隱藏式馬可夫模型 (hidden Markov model) |
| Keywords (English): | Tonal Phoneme, HMM, Taiwanese, Text-To-Speech (TTS) System |
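Taiwanese is one of the main languages in everyday use in Taiwan; middle-aged and elderly people and residents of central and southern Taiwan in particular generally communicate in it. Yet medical services, clinical education and training, and medical multimedia applications in Taiwan rely mainly on Mandarin speech, so patients and medical staff often have great difficulty communicating, which causes problems in seeking care. Meanwhile, the number of people with hearing and speech impairments and the size of the elderly population in Taiwan are growing year by year, and aging itself often brings hearing and speech disorders, which highlights the care-service needs of both groups. Hearing and speech communication aids play an important role in connecting people with such impairments to others; the text-to-speech component of an augmentative and alternative communication system, which produces natural, fluent speech on a person's behalf, is one way for them to communicate with the outside world. However, resources for these aids are concentrated on Mandarin and on Western intonation languages, and cannot fully meet the needs of Taiwan's native tonal languages. Over the past decade or so, relatively little speech-technology research in Taiwan has addressed Taiwanese, leaving its research and development needs unmet, so developing speech technology for Taiwanese is necessary.
The purpose of this study is to build a robust Taiwanese tonal phoneme database and to apply hidden Markov models to develop a Taiwanese text-to-speech system. The specific aims are: (1) to design a training algorithm for Taiwanese balanced sentences and build a speech database of balanced sentences in Modern Literal Taiwanese (台羅現代文, MLT); (2) to analyze the robustness of the conceptual tonal phoneme units by statistical methods, in order to validate and establish a robust Taiwanese tonal phoneme database; (3) to develop a Taiwanese text-to-speech system by applying an HMM-based speech synthesis system together with the Taiwanese tonal phoneme database.
This study builds on Taiwanese phonetics, phonology, statistical methods, and advances in digital signal processing, and uses the MLT system as the basis for constructing the related databases. The overall research framework has four phases. In the first phase, the subsyllable unit structure of MLT is analyzed in depth and combined with Taiwanese tone changes to establish a conceptual Taiwanese tonal phoneme model. In the second phase, a training algorithm for Taiwanese balanced sentences is designed and used to train on a Taiwanese text corpus, and the balanced-sentence speech database is then completed through speech recording. In the third phase, the HTK recognizer is used to recognize the tonal phoneme sequences of the speech database, and the recognition results are analyzed with sensitivity, specificity, and ROC curves to validate a robust set of tonal phoneme units. In the fourth phase, the robust tonal phoneme models developed in this study are combined with an HMM-based speech synthesis system to develop a Taiwanese text-to-speech system.
This study collected a corpus of continuous MLT text comprising about one hundred thousand syllables in 8,905 sentences. A balanced-sentence training and analysis system was developed with Windows programming; training on the continuous text corpus yielded 869 balanced sentences and 218 carrier sentences for rare units, which after recording became a Taiwanese balanced-sentence speech database available for local clinical speech technology. Using the recognition results from the HTK speech recognizer, the robustness analysis of the tonal phoneme unit set was validated by statistical methods, giving a robust set of 156 Taiwanese tonal phoneme units (excluding silence segments). Finally, an HMM-based Taiwanese text-to-speech system was successfully built on both Linux and Windows; its synthetic speech received a MOS of 4, which meets the usual requirements for synthetic speech and raises the level of naturalness. Taken together, the results of this study provide a basis for research and development in local clinical speech technology and Taiwanese computational linguistics, and meet the expected goals of the study.
This study has achieved its expected results, which can serve as a basis for research and development in Taiwanese information processing and computational linguistics, with a view to broad and effective application in medical services, clinical education and training, medical multimedia, and assistive and rehabilitation technology for speech communication. Topics for future work include: (1) using the Taiwanese balanced-sentence speech database to analyze and compare the fundamental frequency trends of the tones; (2) drawing on the idea of greedy algorithms to select balanced sentences more effectively in the training algorithm; (3) recording female speech for the balanced-sentence speech database to support deeper analysis and wider application; (4) using Windows programming to build an integrated training and recognition workflow for the speech recognizer; (5) adding a Chinese-character input interface to the Taiwanese text-to-speech system so that the general public can use it more widely; (6) building a larger database of Taiwanese grammatical rules to determine synthesis prosody and obtain more colloquial synthetic speech.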
Taiwanese is one of the most commonly used languages in Taiwan, especially among middle-aged people, senior citizens, and residents of central and southern Taiwan. However, medical services, clinical education and training, and medical multimedia applications in Taiwan mainly use Mandarin speech. Taiwanese-speaking patients often have great difficulty communicating with medical staff, which leads to problems in diagnosis and treatment. Moreover, the number of elderly people and of people with hearing and/or speech impairments in Taiwan has been increasing year by year, and the elderly are at particularly high risk of hearing impairment. Augmentative and Alternative Communication (AAC) systems and hearing aids, as assistive technology, play an important role in helping people with disabilities interact with the outside world. Text-To-Speech (TTS) AAC systems can produce natural, fluent speech on a person's behalf. However, most research and development on communication and hearing assistive technology in Taiwan has focused on Mandarin and Western language systems, which has led to a lack of such technology for Taiwanese. It is therefore important to develop Taiwanese speech technology for use in medical services, medical multimedia, and assistive technology.
The purpose of this research is to design and establish a robust Taiwanese tonal phoneme database and then to develop a Taiwanese TTS system. More specifically, this research aims to: (1) design an algorithm to train Taiwanese balanced sentences and establish a speech database of Modern Literal Taiwanese (MLT) balanced sentences, (2) analyze the robustness of tonal phoneme models by statistical methods and then establish a robust Taiwanese tonal phoneme database, and (3) develop a Taiwanese TTS system by applying an HMM-based TTS system and the Taiwanese tonal phoneme database.
The materials and methods are: (1) analyzing MLT subsyllables through Taiwanese phonetics and phonology to establish tonal phoneme models, (2) designing a training algorithm for the Taiwanese balanced speech database, (3) applying the HMM Toolkit (HTK) to recognize tonal phonemes and validating the robust Taiwanese tonal phoneme set through a Bayes screening test, and (4) applying the robust tonal phoneme models and an HMM-based TTS system to develop the Taiwanese TTS system.
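As an illustration of the validation idea in step (3), the following minimal sketch computes one-vs-rest sensitivity and specificity for each tonal phoneme label from a pair of label sequences. It is not the thesis's implementation: the function name `sensitivity_specificity`, the toy labels, and the assumption that the recognizer output has already been aligned one-to-one with the reference transcription are all hypothetical.

```python
from collections import Counter


def sensitivity_specificity(reference, recognized):
    """Return {unit: (sensitivity, specificity)} using one-vs-rest counts.

    Assumes (hypothetically) that both sequences are already aligned,
    so position i in `recognized` is the recognizer's label for
    position i in `reference`.
    """
    if len(reference) != len(recognized):
        raise ValueError("sequences must be aligned and equal in length")

    pair_counts = Counter(zip(reference, recognized))
    ref_counts = Counter(reference)
    hyp_counts = Counter(recognized)
    total = len(reference)

    scores = {}
    for unit in sorted(set(reference) | set(recognized)):
        tp = pair_counts[(unit, unit)]   # unit recognized as itself
        fn = ref_counts[unit] - tp       # unit present but missed
        fp = hyp_counts[unit] - tp       # another unit mistaken for it
        tn = total - tp - fn - fp        # everything else
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        scores[unit] = (sens, spec)
    return scores


# Hypothetical toy labels, not taken from the thesis data.
reference = ["a1", "a1", "a2", "i1", "i1", "a2"]
recognized = ["a1", "a2", "a2", "i1", "i1", "a2"]
scores = sensitivity_specificity(reference, recognized)
for unit, (sens, spec) in scores.items():
    print(f"{unit}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Each unit's pair (1 - specificity, sensitivity) corresponds to one point in ROC space. How such points were turned into a robustness decision in the thesis is not stated in the abstract, so no threshold is assumed here.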
The collected text corpus consists of 8,905 MLT sentences, about one hundred thousand syllables, taken from MLT books. A Taiwanese balanced-sentence speech database of 869 MLT sentences was established with a training and analysis system developed using Windows programming, and another 218 sentences carrying rare phoneme units were generated and included in the database. A phonetic set of 156 Taiwanese tonal phonemes was derived from the HTK recognition results, and its robustness was validated statistically through sensitivity, specificity, and receiver operating characteristic (ROC) curve analysis. The HMM-based Taiwanese TTS system was successfully developed on both Linux and Windows operating systems; the synthetic speech received a mean opinion score (MOS) of 4, indicating an improved level of naturalness. The results of this research provide fundamental information and techniques for the development of local clinical speech technology and Taiwanese computational linguistics.
The outcomes of this study are expected to be applied in fields such as medical services in Taiwan, clinical education and training, medical multimedia, augmentative and alternative communication (AAC), and rehabilitation technology. Recommended further research includes: (1) analyzing and comparing the fundamental frequency of the Taiwanese tones using the Taiwanese balanced-sentence database, (2) investigating the effect of a greedy algorithm on balanced-sentence selection, (3) recording female speech to broaden the range of applications of the balanced-sentence database, (4) developing a consistent training and recognition protocol using Windows programming, (5) adding a Chinese-character input interface to the Taiwanese TTS system for general and community use, and (6) constructing more syntax rules for Taiwanese sentences to determine synthesis prosody and so obtain more colloquial synthetic speech.
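The greedy selection idea raised in item (2) can be sketched as a set-cover style procedure: repeatedly pick the sentence that adds the most phoneme units not yet covered. This is only one possible reading of the suggestion, not the thesis's balanced-sentence training algorithm. The function name `greedy_select`, the `max_sentences` parameter, and the toy corpus are hypothetical.

```python
def greedy_select(sentences, max_sentences=None):
    """Greedy set-cover selection of sentences.

    sentences: dict mapping a sentence id to an iterable of unit labels.
    Returns (selected ids in pick order, units still uncovered).
    """
    remaining = {sid: set(units) for sid, units in sentences.items()}
    uncovered = set().union(*remaining.values())
    selected = []

    while uncovered and remaining:
        if max_sentences is not None and len(selected) >= max_sentences:
            break
        # Pick the sentence covering the most uncovered units;
        # iterating over sorted ids makes ties go to the smallest id.
        best = max(sorted(remaining),
                   key=lambda sid: len(remaining[sid] & uncovered))
        uncovered -= remaining.pop(best)
        selected.append(best)

    return selected, uncovered


# Hypothetical toy corpus: sentence id -> tonal phoneme units it contains.
corpus = {
    "s1": ["a1", "a2", "i1"],
    "s2": ["a1", "i1"],
    "s3": ["u3", "a2"],
    "s4": ["u3"],
}
selected, uncovered = greedy_select(corpus)
print(selected, uncovered)
```

This sketch only maximizes coverage of distinct units. A balanced corpus would usually also weigh how often each unit occurs, which is left out here for brevity.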