
Student: Wu, Jian-Qi (吳健綺)
Title: A Study on Emotional Speech Synthesis via Conversion Function Clustering and Selection (應用轉換函式之歸群與選取於情緒語音合成之研究)
Advisor: Wu, Chung-Hsien (吳宗憲)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2006
Academic year of graduation: 94 (2005-2006)
Language: Chinese
Pages: 55
Chinese keywords: 語音合成 (speech synthesis), 聲音轉換 (voice conversion), 轉換函式 (conversion function)
English keywords: speech synthesis, emotion conversion, conversion function
"Speech will be the mainstream of the future world." For computer technology to become part of everyday life, speech is the key enabling technology. Although computer speech synthesis has advanced to the stage of emotional synthetic speech, the need for large speech corpora limits the practical application of the technology. Voice conversion can reduce this corpus requirement, but describing each conversion model with a single conversion function is insufficient; the main goal of this study is therefore to achieve better converted voice quality through multiple conversion functions.

This thesis addresses the problem of applying multiple conversion functions to emotional speech conversion through four research topics: 1) designing small balanced corpus scripts for each emotion and recording parallel corpora; 2) proposing methods to estimate the acoustic and linguistic similarity between conversion functions; 3) applying the K-means algorithm to cluster the conversion functions; and 4) integrating the result with a neutral-voice text-to-speech system to synthesize emotional speech.

In the experiments, the importance of each speech parameter to emotional characteristics is evaluated first. Then the performance of different multi-function conversion models is compared: objectively by prediction error, subjectively by mean opinion score (MOS), with the results verified by statistical tests. Under the constraint of a small corpus, the proposed method achieves better performance in both objective and subjective evaluations.
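The clustering step above (K-means over conversion functions) can be sketched as follows. This is a minimal, self-contained illustration in which each conversion function is summarized by a made-up 2-D parameter vector and compared by Euclidean distance; the thesis's actual similarity measure combines acoustic and linguistic features, which is not reproduced here.

```python
import random


def kmeans(vectors, k, iters=50, seed=0):
    """Cluster conversion-function parameter vectors with K-means.

    Each conversion function (e.g., the transform learned for one
    phonetic unit) is assumed to be summarized by a fixed-length
    parameter vector; Euclidean distance stands in for the thesis's
    acoustic/linguistic similarity measure.
    """
    rng = random.Random(seed)
    # Initialize centroids from k distinct function vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iters):
        # Assign each function to the nearest centroid.
        for i, v in enumerate(vectors):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Recompute each centroid as the mean of its assigned functions.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assignment[i] == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Toy example: six hypothetical 2-D "conversion function" parameter vectors
# forming two well-separated groups.
funcs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, _ = kmeans(funcs, k=2)
```

After clustering, one representative function per cluster can be selected at synthesis time, which is the role of the selection step in the proposed method.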

Speech technology is key to the development of computer science in the next generation. A text-to-speech (TTS) synthesis system that can express emotion can be an effective communication tool for users. However, the requirement of a large speech database obstructs the development and application of such a system.
     In this thesis, a conversion function clustering and selection method is proposed for text-to-emotional-speech synthesis. More specifically, this study focuses on: 1) designing a balanced, small-sized emotional parallel speech database; 2) proposing a similarity measure between conversion functions based on both acoustic and linguistic features; 3) adopting the K-means algorithm to cluster the functions; and 4) integrating the emotional speech conversion system as a post-processor for emotional speech synthesis.
     Several experiments with statistical hypothesis testing were conducted to evaluate the quality of the converted speech as perceived by human subjects. Compared with previous methods, the proposed method exhibits encouraging potential for expressive speech synthesis.
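The statistical verification mentioned above can be illustrated with a paired t-test on per-listener MOS scores. The scores below are invented purely for illustration; the thesis does not report these numbers, and its exact choice of test may differ.

```python
import math


def paired_t(x, y):
    """Paired t statistic for two matched score lists (e.g., per-listener
    MOS for a baseline system vs. the proposed system)."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the paired differences (n - 1 denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


# Hypothetical MOS scores from five listeners (not from the thesis).
baseline = [3.0, 3.2, 2.8, 3.1, 3.0]
proposed = [3.6, 3.9, 3.4, 3.7, 3.5]
t = paired_t(baseline, proposed)
# t well above the two-tailed critical value 2.776 (df = 4, alpha = 0.05)
# would indicate a significant improvement in perceived quality.
```
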

Abstract (Chinese)
Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1: Introduction
  1.1 Preface
    1.1.1 Motivation and objectives
    1.1.2 Research background
  1.2 Literature review
    Overview of the research method
    1.2.1 System architecture
    1.2.2 Parallel emotional balanced corpus
  1.3 Chapter overview
Chapter 2: Statistical Voice Conversion Models and Their Comparison
  2.1 Statistical voice conversion models
    2.1.1 Joint normal distribution and conversion function
    2.1.2 Gaussian mixture model
    2.1.3 Hidden Markov model
    2.1.4 Classification and regression tree
  Comparison of voice conversion models
Chapter 3: Clustering and Selection of Conversion Models
  3.1 Model clustering algorithm
  3.2 Acoustic similarity estimation
  3.3 Linguistic similarity estimation
  3.4 Conversion function selection
Chapter 4: Collection of the Emotional Parallel Corpus
  4.1 Collection method
  4.2 Corpus statistics
  4.3 Corpus characteristics
Chapter 5: Experimental Results and Discussion
  5.1 Objective evaluation
    5.1.1 Inside test
    5.1.2 Outside test
  5.2 Subjective evaluation
    5.2.1 Intelligibility evaluation
    5.2.2 Emotional expressiveness evaluation
Chapter 6: Conclusion and Future Work
References
Appendix
Author biography


Full text availability: on campus, immediate; off campus, from 2006-08-25.