| Graduate student: | 鄭如妘 Cheng, Ju-Yun |
|---|---|
| Thesis title: | 合成單元與問題集之定義於隱藏式馬可夫模型中文歌聲合成系統之建立 Synthesis Unit and Question Set Definition for Mandarin HMM-based Singing Voice Synthesis |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 (ROC calendar) |
| Language: | English |
| Number of pages: | 55 |
| Keywords: | Mandarin Singing Voice Synthesis (中文歌聲合成), Hidden Markov Models (隱藏馬可夫模型) |
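In a singing voice synthesis system, the fluency and continuity of the singing voice are very important. To give the synthesized singing voice smooth and continuous characteristics, we chose a hidden Markov model (HMM)-based synthesis approach to build a Mandarin singing voice synthesis system. The system can generate Mandarin songs with arbitrary lyrics within a specific pitch range. This thesis first records a singing voice database based on the coverage of Mandarin speech, for use in training the voice models and in singing voice synthesis, and uses the STRAIGHT algorithm for parameter extraction to obtain better synthesized singing.
This thesis describes how to build a Mandarin singing voice synthesis system, in which the most important parts are the definition of the synthesis units and the construction of the question set, so as to meet the specifications for building such a system. In addition, we added pitch-shifted pseudo data and applied a post-processing technique to simulate vibrato in singing, so that the synthesized singing voice sounds more natural. The experimental results verify that adjusting the question set improves the quality of the synthesized singing voice and increases its intelligibility, and that pitch-shifted pseudo data and vibrato post-processing successfully improve its quality and naturalness.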
Fluency and continuity are essential in singing voice synthesis. To synthesize a smooth and continuous singing voice, we adopt a hidden Markov model (HMM)-based synthesis approach to build a Mandarin singing voice synthesis system. The system is designed to generate Mandarin songs with arbitrary lyrics and with melodies within a certain pitch range. We also build a singing voice database for system training and synthesis, designed on the basis of the phonetic coverage of Mandarin speech. In addition, features are extracted with the STRAIGHT algorithm to obtain a satisfactory vocoded singing voice.
This thesis describes the construction of a Mandarin singing voice synthesis system, focusing on model definition and question set modification. In addition, we implement two techniques, pitch-shifted pseudo data and vibrato post-processing, to make the synthesized singing voice more natural.
The experimental results show that question set modification improves the quality and intelligibility of the synthesized singing voice, and that pitch-shifted pseudo data and vibrato post-processing improve its quality and naturalness.
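As a rough illustration of the two techniques named in the abstract, the sketch below transposes an F0 contour by a number of semitones (the core operation behind pitch-shifted pseudo data) and superimposes a sinusoidal vibrato on voiced frames. It is not the thesis's implementation: the function names, the 5 ms frame shift, the 5.5 Hz vibrato rate, the 50-cent depth, and the convention that 0.0 marks an unvoiced frame are all assumptions chosen for illustration. A real pseudo-data pipeline would presumably also transpose the note pitches in the score labels so that they stay consistent with the shifted audio.

```python
import math


def shift_f0(f0_hz, semitones):
    """Transpose an F0 contour (Hz) by `semitones`; 0.0 frames stay unvoiced."""
    ratio = 2.0 ** (semitones / 12.0)  # equal-tempered frequency ratio
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


def add_vibrato(f0_hz, frame_shift_s=0.005, rate_hz=5.5, depth_cents=50.0):
    """Superimpose a sinusoidal vibrato on the voiced frames of an F0 contour.

    All default values are hypothetical, not taken from the thesis.
    """
    out = []
    for i, f in enumerate(f0_hz):
        if f <= 0:
            out.append(0.0)  # leave unvoiced frames untouched
            continue
        t = i * frame_shift_s  # time of frame i in seconds
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
        out.append(f * 2.0 ** (cents / 1200.0))  # convert cents to a ratio
    return out


# Usage: raise a flat A4 contour by two semitones, then add vibrato.
contour = [440.0] * 200 + [0.0] * 20
pseudo = shift_f0(contour, 2)
with_vibrato = add_vibrato(pseudo)
```

Expressing the vibrato depth in cents rather than hertz keeps the perceived modulation width the same across the pitch range, which is why the deviation is applied as a multiplicative ratio.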