| Graduate Student: | Li, Yun-Chang (李昀璋) |
|---|---|
| Thesis Title: | Speaker Conversion System Based on HMM-Based Speech Synthesis System and Regression-Tree-Based MGC and F0 Conversion with Backtracking Mechanism |
| Advisor: | Wang, Jhing-Fa (王駿發) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Publication Year: | 2012 |
| Academic Year of Graduation: | 100 (ROC calendar) |
| Language: | English |
| Pages: | 57 |
| Chinese Keywords: | speech synthesis, speaker conversion |
| English Keywords: | HTS, speech synthesis, speaker adaptation, speaker conversion |
| Access Count: | Views: 144; Downloads: 2 |
This thesis implements a speaker conversion system based on HTS with regression-tree-based MGC and F0 conversion and a backtracking mechanism. In HTS there are three major acoustic features: mel-cepstral coefficients (MGC), fundamental frequency (F0), and state duration. To synthesize the target speaker's speech, these three features are converted by the methods proposed in this thesis. In the training phase, parallel corpora are required to train the decision trees; owing to the proposed system architecture, the target speaker's corpus can be chosen arbitrarily. The decision tree for state duration and the regression trees for MGC and F0 are built through different mechanisms. In the synthesis phase, based on the contextual labels produced by the text analyzer, the target speaker's state-duration sequence is first predicted from the decision tree, and the optimal conversion functions are selected from the regression trees; the frame-level parameters produced by the parameter generation process are then converted into the target speaker's parameters by these functions, and finally the speech is synthesized with the MLSA filter.
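The abstract does not spell out how the backtracking mechanism interacts with the regression trees. One plausible reading, sketched below purely as a hypothetical illustration (the `Node` structure, the `min_samples` threshold, and the fallback rule are all assumptions, not the thesis's actual design): descend the tree by context questions, then back off to an ancestor whenever the reached leaf holds too few training frames to support a reliable conversion function.

```python
# Hypothetical sketch of regression-tree lookup with backtracking:
# descend by context questions, then back off to an ancestor node
# whenever the reached leaf has too few training samples.

class Node:
    def __init__(self, n_samples, func=None, question=None,
                 yes=None, no=None, parent=None):
        self.n_samples = n_samples   # training frames that reached this node
        self.func = func             # conversion function stored at this node
        self.question = question     # predicate on the context label; None at leaves
        self.yes, self.no = yes, no
        self.parent = parent

def find_function(root, context, min_samples=50):
    """Descend to a leaf, then backtrack until enough data supports the node."""
    node = root
    while node.question is not None:
        node = node.yes if node.question(context) else node.no
    while node.parent is not None and node.n_samples < min_samples:
        node = node.parent           # back off to a better-trained ancestor
    return node.func
```

In this reading, backtracking trades context specificity for statistical reliability, much like the tying used elsewhere in HMM-based synthesis.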
In the experiments, objective and subjective evaluations are designed for the speaker conversion results. The objective evaluation covers MGC, F0, and state duration; since each frame's F0 is either voiced or unvoiced, the F0 evaluation is split into two tests. The subjective evaluation uses MOS scores for quality and similarity. In summary, the proposed speaker conversion system improves the conversion results, especially for MGC and state duration.
In this thesis, a new speaker conversion system is implemented using regression-tree-based MGC and F0 conversion on top of the HMM-based speech synthesis system (HTS). In HTS, there are three major acoustic features in the synthesis phase: MGC, F0, and duration. To synthesize the target speaker's speech, these three features are transformed by the proposed methods. In the training phase, parallel corpora are required for decision-tree training, and, owing to the proposed architecture, the target speaker's corpus can be chosen arbitrarily. The decision tree for duration and the regression trees for MGC and F0 are then constructed through the proposed mechanisms. In the synthesis phase, according to the label sequence generated by the text analyzer, the target speaker's duration sequence is first predicted from the duration decision tree, and the conversion functions for MGC and F0 are determined from the respective regression trees. Next, the frame-level MGC and F0 features are generated by the parameter generation process and converted by those conversion functions. Finally, the target speaker's speech is synthesized by the MLSA vocoder from the converted features.
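The conversion step in the synthesis phase can be illustrated with two common function forms, assumed here for concreteness rather than taken from the thesis: a linear transform y = Ax + b for an MGC frame, and a mean/variance mapping in the log-F0 domain that leaves unvoiced frames untouched.

```python
import math

# Hedged sketch of the synthesis-phase conversion step: once the regression
# trees have supplied per-class conversion functions, each generated frame is
# mapped to the target speaker. The linear MGC transform and the mean/variance
# log-F0 mapping below are common choices, assumed here rather than taken
# from the thesis.

def convert_mgc(frame, A, b):
    """Linear conversion y = A x + b for one MGC frame (lists of floats)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, frame)) + b_i
            for row, b_i in zip(A, b)]

def convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    """Mean/variance mapping in the log-F0 domain; 0.0 marks unvoiced frames."""
    if f0 <= 0.0:                      # unvoiced: nothing to convert
        return 0.0
    z = (math.log(f0) - src_mean) / src_std
    return math.exp(tgt_mean + tgt_std * z)
```

In a full pipeline these functions would be applied frame by frame to the output of the parameter generation process before the MLSA vocoder resynthesizes the waveform.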
In the experiments, objective and subjective evaluation tests are designed to compare the speaker conversion results. In the objective evaluation, three types of tests are carried out, for MGC, F0, and duration. Since each frame's F0 is either voiced or unvoiced, two tests are designed for F0. In the subjective evaluation, two types of MOS are used to assess the conversion results: quality and similarity. In summary, the proposed speaker conversion system improves the conversion performance, especially for MGC and duration.
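For concreteness, objective tests of this kind are often realized as mel-cepstral distortion for MGC, RMSE of F0 over frames voiced in both utterances, and the voiced/unvoiced decision error rate. The formulas below are these standard definitions, assumed rather than quoted from the thesis.

```python
import math

# Illustrative objective measures (standard definitions, assumed rather than
# taken verbatim from the thesis): mel-cepstral distortion for MGC, RMSE of
# F0 over frames voiced in both tracks, and the voiced/unvoiced error rate.

def mel_cepstral_distortion(c1, c2):
    """MCD in dB between two MGC frames, skipping the 0th (energy) term."""
    d2 = sum((a - b) ** 2 for a, b in zip(c1[1:], c2[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * d2)

def f0_rmse_and_vuv_error(f0_a, f0_b):
    """RMSE over frames voiced in both tracks, plus fraction of V/UV mismatches."""
    both = [(a, b) for a, b in zip(f0_a, f0_b) if a > 0 and b > 0]
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in both) / len(both)) if both else 0.0
    mismatches = sum((a > 0) != (b > 0) for a, b in zip(f0_a, f0_b))
    return rmse, mismatches / len(f0_a)
```

Splitting the F0 evaluation into a voiced-frame RMSE and a V/UV error rate matches the abstract's observation that each frame's F0 is either voiced or unvoiced.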
[1] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,” Proc. EUROSPEECH-99, pp. 2347-2350, Sep. 1999
[2] K. Tokuda, H. Zen, J. Yamagishi, T. Masuko, S. Sako, A. Black, and T. Nose, The HMM-Based Speech Synthesis System (HTS) Version 2.2
http://hts.sp.nitech.ac.jp/
[3] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Y. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The Hidden Markov Model Toolkit (HTK) Version 3.4.1
http://htk.eng.cam.ac.uk/
[4] K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, “Multi-space probability distribution HMM,” IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455-464, Mar. 2002
[5] H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “A hidden semi-Markov model-based speech synthesis system,” IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825-834, May 2007
[6] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, “Speech parameter generation algorithms for HMM-based speech synthesis,” Proc. ICASSP, pp. 1315-1318, 2000
[7] S. Imai, K. Sumita, and C. Furuichi, “Mel-Log Spectrum Approximation (MLSA) Filter for Speech Synthesis,” Trans. IECE, vol. J66-A, pp. 122-129, Feb. 1983
[8] T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, “An Adaptive Algorithm for Mel-cepstral Analysis of Speech,” Proc. ICASSP, 1992
[9] K. Shinoda and T. Watanabe, “Speaker adaptation with autonomous model complexity control by MDL principle,” Proc. ICASSP, pp. 717-720, May 1996
[10] K. Shinoda and C. Lee, “A structural Bayes approach to speaker adaptation,” IEEE Trans. Speech Audio Process., vol. 9, pp. 276-287, Mar. 2001
[11] C. Leggetter and P. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models,” Comput. Speech Lang., vol. 9, no. 2, pp. 171-185, 1995
[12] O. Siohan, T. Myrvoll, and C. Lee, “Structural maximum a posteriori linear regression for fast HMM adaptation,” Comput. Speech Lang., vol. 16, no. 3, pp. 5-24, 2002
[13] V. Digalakis, D. Rtischev, and L. Neumeyer, “Speaker adaptation using constrained reestimation of Gaussian mixtures,” IEEE Trans. Speech Audio Process., vol. 3, no. 5, pp. 357-366, Sep. 1995
[14] M. Gales, “Maximum likelihood linear transformations for HMM-based speech recognition,” Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998
[15] Y. Nakano, M. Tachibana, J. Yamagishi, and T. Kobayashi, “Constrained Structural Maximum A Posteriori Linear Regression for Average-Voice-Based Speech Synthesis,” Proc. INTERSPEECH, pp. 2286-2289, 2006
[16] J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, “Analysis of Speaker Adaptation Algorithms for HMM-based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm,” IEEE Trans. Audio, Speech, and Language Processing, vol. 17, pp. 66-83, Jan. 2009
[17] Yu-Ting Chao and Chung-Hsien Wu, “Frame-Based Alignment and Adaptive CRF for Personalized Spectral and Prosody Conversion,” Institute of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, July 2010
[18] J. Lafferty, A. McCallum, and F. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282-289, June 28-July 01, 2001
[19] “Dijkstra’s algorithm,” Wikipedia
http://en.wikipedia.org/wiki/Dijkstra's_algorithm