National Cheng Kung University Electronic Theses and Dissertations System


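Graduate student:	林玫君 Lin, Mai-Chun
Thesis title:	應用迴歸式歸群於階層式韻律轉換之研究 Hierarchical Prosody Conversion by Regression based Clustering
Advisor:	吳宗憲 Wu, Chung-Hsien
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2007
Academic year of graduation:	95 (ROC calendar)
Language:	Chinese
Number of pages:	48
Keywords (Chinese):	韻律轉換
Keywords (English):	prosody conversion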

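As human-machine interfaces receive more attention and continue to develop, computer speech technology is becoming an important indicator of progress. Current voice conversion techniques can relieve the large-corpus requirement imposed by corpus-based TTS, but most existing conversion work targets the spectrum, and the converted speech still differs considerably from the target speech. This study therefore focuses on prosody and proposes hierarchical prosody conversion with regression-based clustering, so that after conversion the prosody comes closer to the intonation contour of the target utterance.
The work on applying regression-based clustering to hierarchical prosody conversion has four research focuses: 1) proposing a hierarchically structured prosody model; 2) introducing a regression-based clustering algorithm to cluster the conversion functions; 3) building a function selection model with a classification and regression tree; and 4) designing the recording of a small, balanced corpus for different emotions.
In the experiments, tests were run on the small, balanced emotional corpus. The results show that the proposed hierarchical regression-based clustering method does further improve the prosody. They also confirm that prosody conversion has a latent effect, with potential for development and room for improvement.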

Speech technology is a key issue for the next generation of human-machine interaction. Voice conversion (VC) technology, which converts the spectral and prosodic features of neutral speech into those of expressive speech, has been adopted to reduce the need for a large speech database in text-to-speech (TTS) systems. Although spectral features are indispensable in speech expression, prosodic features carry the main expression in emotional speech.
The purpose of this study is to develop a hierarchical prosody conversion method for Mandarin speech. More specifically, this research aims 1) to establish a hierarchical prosody model; 2) to construct a set of conversion functions using regression-based clustering; 3) to select appropriate conversion functions by means of a classification and regression tree (CART); and 4) to design a small, balanced emotional parallel speech database.
A set of small, phonetically balanced emotional parallel speech databases was designed and collected to construct the conversion functions and the CART model. Experiments with hypothesis testing were conducted to evaluate the performance of the proposed method. The results show that the proposed method exhibits encouraging potential in emotional voice conversion.
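
The abstract names regression-based clustering of conversion functions, but this record does not describe the algorithm itself. As a rough illustration only, the sketch below shows a generic clusterwise linear regression: it alternates between fitting one linear function per cluster and reassigning each source/target pair to the function that predicts it best. Everything in it is an assumption made for illustration and not the thesis's method: the one-dimensional features, the linear form `y = slope * x + intercept`, the identity function used for clusters that cannot yet be fitted, and the names `predict`, `fit_line`, and `regression_clustering`.

```python
import random


def predict(fn, x):
    """Apply a linear conversion function fn = (slope, intercept) to x."""
    slope, intercept = fn
    return slope * x + intercept


def fit_line(points):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0.0:
        # All x values are equal: fall back to a constant function.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def regression_clustering(pairs, k, iterations=50, seed=0):
    """Cluster (source, target) pairs by which linear function fits them.

    Returns (functions, labels): k (slope, intercept) tuples and, for each
    pair, the index of the function with the smallest squared error.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in pairs]  # assumed random start
    functions = [(1.0, 0.0)] * k  # identity until a cluster can be fitted
    for _ in range(iterations):
        # Step 1: refit one regression per cluster with enough members.
        for c in range(k):
            members = [p for p, lab in zip(pairs, labels) if lab == c]
            if len(members) >= 2:
                functions[c] = fit_line(members)
        # Step 2: reassign each pair to its best-fitting function.
        new_labels = [
            min(range(k), key=lambda c: (y - predict(functions[c], x)) ** 2)
            for x, y in pairs
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return functions, labels
```

Calling `regression_clustering(pairs, k=2)` on a list of `(source, target)` value pairs returns the fitted functions and the cluster index of each pair. Like k-means, this alternation only reaches a local optimum, so the result depends on the initial assignment.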

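Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Preface
1.1.1 Research Motivation and Objectives
1.1.2 Literature Review
1.2 Overview of the Research Method
1.2.1 System Architecture
1.3 Chapter Outline
Chapter 2 Hierarchically Structured Prosody Model
2.1 Hierarchical Structure
2.2 Building the Pitch Model
2.3 Prosody Model at Each Level
2.3.1 Sentence Level
2.3.2 Word Level
2.3.3 Sub-syllable Level
Chapter 3 Clustering and Selection of Conversion Models
3.1 Comparison of Conversion Models
3.2 Conversion Models for the Sentence and Word Levels
3.3 Conversion Model for the Sub-syllable Level
3.3.1 Regression-based Clustering Conversion Model
3.3.2 Regression-based Clustering Algorithm
3.3.3 Function Selection Model (CART)
3.4 Duration Conversion Model
Chapter 4 Collection of the Emotional Parallel Corpus
3.1 Collection Method
3.2 Corpus Statistics
3.3 Corpus Characteristics
Chapter 4 Experimental Results and Analysis
4.1 Experimental Corpus Setup
4.2 Objective Evaluation
4.2.1 Comparison of GMM-based and Regression Clustering
4.2.2 Setting the Number of Clusters and the Weights
4.2.3 Comparison of Hierarchical Approaches
4.2.4 Comparison of Regression-based Clustering and GMM-based Prosody Conversion
Chapter 5 Conclusion
References
Appendix
Author's Biography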


Publicly available from 2008-08-24
