| Graduate student: | 林玫君 Lin, Mai-Chun |
|---|---|
| Thesis title: | 應用迴歸式歸群於階層式韻律轉換之研究 Hierarchical Prosody Conversion by Regression based Clustering |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 (ROC calendar) |
| Language: | Chinese |
| Pages: | 48 |
| Keywords (Chinese): | 韻律轉換 (prosody conversion) |
| Keywords (English): | prosody conversion |
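As human-machine interfaces receive more attention and development, computer speech technology is becoming an important indicator. Current voice conversion techniques can relieve the large-corpus requirement imposed by corpus-based TTS. However, most existing conversion methods target the spectrum, and the converted speech still differs considerably from the target speech. This study therefore focuses on prosody and proposes hierarchical prosody conversion based on regression-based clustering, so that after conversion the prosody of the speech follows the intonation contour of the target sentence more closely.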
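In applying regression-based clustering to hierarchical prosody conversion, this thesis addresses four research topics: 1) a prosody model with a hierarchical structure; 2) a regression-based clustering algorithm that clusters the conversion functions; 3) a function-selection model built with a classification and regression tree (CART); and 4) the design and recording of small, balanced corpora for different emotions.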
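The abstract does not say how the hierarchy is layered. The sketch below assumes a three-level sentence/word/syllable hierarchy over per-syllable log-F0 values. That is an illustrative choice and not necessarily the thesis's model. Each level stores its offset from the level above, so the layers add back up to the original contour. The names `decompose` and `recompose` are hypothetical.

```python
from statistics import mean


def decompose(sentence):
    """Split per-syllable log-F0 values into three additive layers.

    `sentence` is a list of words, each a list of syllable log-F0 values
    (a hypothetical input format chosen for illustration).
    """
    syllables = [s for word in sentence for s in word]
    sent_level = mean(syllables)                                   # sentence-level mean
    word_levels = [mean(word) - sent_level for word in sentence]   # per-word offsets
    syl_levels = [[s - sent_level - w for s in word]               # per-syllable residuals
                  for word, w in zip(sentence, word_levels)]
    return sent_level, word_levels, syl_levels


def recompose(sent_level, word_levels, syl_levels):
    """Add the layers back together to rebuild the syllable contour."""
    return [[sent_level + w + r for r in residuals]
            for w, residuals in zip(word_levels, syl_levels)]
```

Under this assumption, each layer could be converted separately (for example, with its own regression function) before calling `recompose`. The abstract does not give the thesis's exact procedure.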
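Experiments on the small, balanced emotional corpora show that the proposed hierarchical regression-based clustering method does further improve the prosody of the converted speech. The results also confirm that prosody conversion has a genuine, if latent, effect, with potential for further development and room for improvement.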
For the development of human-machine interaction, speech technology is a key issue for the next generation. Voice conversion (VC) technology, which converts the spectral and prosodic features of neutral speech into those of expressive speech, has been adopted to reduce the need for a large speech database in text-to-speech (TTS) systems. Although spectral features are indispensable for expressive speech, prosodic features carry most of the expression in emotional speech.
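For context, the VC literature widely uses a simple prosody-conversion baseline, which is not the method proposed here. It maps source log-F0 so that its mean and standard deviation match those of the target speaker or emotion. The sketch below assumes frame-level F0 in Hz, with 0 marking unvoiced frames and a non-constant voiced source contour. The names `fit_stats` and `convert_f0` are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_stats(f0_values):
    """Mean and population std of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return mean(logs), pstdev(logs)


def convert_f0(f0_values, src_stats, tgt_stats):
    """Give voiced frames the target log-F0 mean/std; unvoiced frames stay 0."""
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    out = []
    for f in f0_values:
        if f <= 0:
            out.append(0.0)                       # keep unvoiced frames unvoiced
        else:
            z = (math.log(f) - mu_s) / sd_s       # standardize in the log domain
            out.append(math.exp(mu_t + z * sd_t))  # rescale to target statistics
    return out
```

A single global mapping like this cannot capture context-dependent changes in the contour. That limitation is what motivates using several conversion functions chosen per context.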
The purpose of this study is to develop a hierarchical prosody conversion method for Mandarin speech. More specifically, this research aims 1) to establish a hierarchical prosody model; 2) to construct a set of conversion functions using regression based clustering; 3) to select appropriate conversion functions by means of a classification and regression tree (CART); and 4) to design small, balanced emotional parallel speech databases.
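The abstract does not define its regression based clustering. One common form is clusterwise linear regression, and the sketch below assumes that form. Each cluster owns a linear conversion function y = a*x + b from a scalar source feature x to a target feature y. The algorithm alternates between two steps, much like k-means: it assigns each (x, y) pair to the function that predicts it best, then refits each function by least squares. The function names, the scalar features, and the residual-based initialization are all illustrative assumptions.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b; returns None for an empty cluster."""
    n = len(points)
    if n == 0:
        return None
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return (0.0, my)  # all x identical: fall back to a constant mapping
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return (a, my - a * mx)


def regression_clustering(points, k, max_iter=50):
    """Clusterwise linear regression over (source, target) feature pairs.

    Returns one (a, b) conversion function per cluster and a cluster label
    per pair. Assumes len(points) >= k.
    """
    # Deterministic init: rank pairs by residual from one global line,
    # then cut the ranking into k equal-sized chunks.
    a0, b0 = fit_line(points)
    order = sorted(range(len(points)),
                   key=lambda i: points[i][1] - (a0 * points[i][0] + b0))
    labels = [0] * len(points)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(points)

    lines = [None] * k
    for _ in range(max_iter):
        # Refit step: least-squares line per cluster (keep the old line if empty).
        for c in range(k):
            fit = fit_line([p for p, lab in zip(points, labels) if lab == c])
            if fit is not None:
                lines[c] = fit
        # Assignment step: move each pair to the line with the smallest squared error.
        active = [c for c in range(k) if lines[c] is not None]
        new_labels = [min(active, key=lambda c: (y - (lines[c][0] * x + lines[c][1])) ** 2)
                      for x, y in points]
        if new_labels == labels:
            break  # converged: no pair changed cluster
        labels = new_labels
    return lines, labels
```

In the thesis, a CART model then chooses among the clustered functions at conversion time. This sketch does not show that selection step.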
A set of phonetically balanced, small emotional parallel speech databases was designed and collected to construct the conversion functions and the CART model. Experiments with hypothesis testing were conducted to evaluate the performance of the proposed method. The results show that the proposed method has encouraging potential for emotional voice conversion.
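The abstract mentions hypothesis testing but does not name the test. The sketch below assumes a paired design, in which the same sentences are scored once for a baseline and once for the proposed method, and computes a paired t statistic. `paired_t` is a hypothetical helper. The resulting t would be compared with the critical value of Student's t distribution with `df` degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(baseline, proposed):
    """Paired t statistic on per-item differences (proposed - baseline).

    Returns (t, df). Assumes the differences are not all identical,
    otherwise their standard deviation is zero.
    """
    if len(baseline) != len(proposed) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    diffs = [p - b for b, p in zip(baseline, proposed)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1
```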