
Author: Wang, Wen-Sheng (王文生)
Thesis title: Spectral Parametric Speech Time and Pitch Scaling Algorithms and Its Application System (頻譜參數之語音變速變調演算法及其應用)
Advisor: Yang, Jar-Ferr (楊家輝)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 63
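Keywords (Chinese): 口腔模型 (vocal tract model), 語音速度調變 (speech time-scale modification), 語音音高調變 (speech pitch-scale modification), 頻譜激發訊號 (spectral excitation signal)
Keywords (English): Spectral Excited Signal, Vocal Tract Model, Pitch-Scale, Time-Scale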
  • This thesis studies time- and pitch-scaling techniques for continuous speech. Building on the vocal tract model and the spectral excitation used in speech coding, we identify the parameters best suited to the characteristics of the signal and develop a spectral-parameter algorithm for speech time and pitch scaling. We then use the melody information in a song's MIDI score to establish the correspondence between the MIDI notes and the singing voice, and apply the time- and pitch-scaling technique to adjust the pitch and rhythm of the singer's voice. Finally, with a pitch modification tailored to the characteristics of sung pitch, we complete a karaoke system that automatically corrects off-key singing.
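    The correction step described in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: it only shows, under stated assumptions, how a MIDI note could yield the pitch-scale and time-scale factors that a time- and pitch-scaling routine would then apply. The function names, the 440 Hz tuning reference, and the example values are all hypothetical.

```python
import math

A4_HZ = 440.0   # assumed tuning reference (not stated in the record)
A4_MIDI = 69    # MIDI note number of A4 in the standard numbering


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def pitch_scale_factor(sung_hz: float, target_note: int) -> float:
    """Factor that maps the sung pitch onto the pitch written in the score."""
    return midi_to_hz(target_note) / sung_hz


def time_scale_factor(sung_sec: float, target_sec: float) -> float:
    """Factor that stretches (>1) or compresses (<1) the sung note duration."""
    return target_sec / sung_sec


def cents_off(sung_hz: float, target_note: int) -> float:
    """Signed pitch error in cents; negative means the singer is flat."""
    return 1200.0 * math.log2(sung_hz / midi_to_hz(target_note))


if __name__ == "__main__":
    # Hypothetical note: the score asks for A4 held 0.50 s,
    # the singer produced 430 Hz held 0.45 s.
    print(round(cents_off(430.0, 69), 1))
    print(round(pitch_scale_factor(430.0, 69), 4))
    print(round(time_scale_factor(0.45, 0.50), 4))
```

    In this sketch the two factors are computed per note; how the thesis segments the singing voice into notes and applies the factors through its spectral-parameter synthesis is not described in this record.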

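    Table of contents

    1. Introduction
       1.1 Background and Motivation
       1.2 Thesis Outline
    2. Speech Transformation Techniques
       2.1 Introduction
       2.2 PICOLA
       2.3 Phase Vocoder
           2.3.1 Analysis Stage
           2.3.2 Synthesis Stage
           2.3.3 Modification Stage
           2.3.4 Pitch Shifting
           2.3.5 Experimental Results
    3. Speech Transformation Techniques Based on the Vocal Tract Model
       3.1 Linear Prediction Coding
       3.2 Parameterization of the Speech Vocal Tract Model
           3.2.1 Spectral Envelope Estimation
           3.2.2 Open-Loop Pitch and Fine Pitch Search
           3.2.3 Voiced/Unvoiced (V/UV) Decision
       3.3 Parameter and Speed Adjustment
           3.3.1 Temporal Parameter Interpolation
           3.3.2 Speech Speed Adjustment
       3.4 Parameter Synthesis at the Synthesizer
           3.4.1 Generating an Oversampled Waveform Spanning One Pitch Period
           3.4.2 Pitch Continuity Check
           3.4.3 Cyclic Extension and Resampling
           3.4.4 Speech Post-Processing
           3.4.5 Smoothing of Voiced/Unvoiced Transition Regions
       3.5 Performance and Complexity Evaluation of the Algorithms
    4. Karaoke System with Automatic Off-Key Correction
       4.1 Introduction to the MIDI Format
       4.2 Characteristics of the Singing Voice
       4.3 Off-Key Correction Method
           4.3.1 Pitch Adjustment
           4.3.2 Time Alignment
    5. Conclusions and Future Work
    References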


    Full text released: on campus 2009-07-18; off campus 2009-07-18