
Graduate Student: Huang, Shih-Wei (黃仕偉)
Thesis Title: Violin Bowing Techniques Synthesis (基於小提琴技法之音樂合成)
Advisor: Su, W. Y. Alvin (蘇文鈺)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2011
Graduation Academic Year: 99 (2010-2011)
Language: English
Number of Pages: 45
Chinese Keywords: source filter model, violin music synthesis, hidden Markov models (源濾波器模型、小提琴音樂合成、隱藏式馬可夫模型)
Foreign Keywords: source filter model, violin music synthesis, hidden Markov models
  • Synthesizing violin sounds that resemble a live human performance with digital signal processing techniques has long been a dream. Among existing music synthesis methods, many can produce timbres very close to that of the violin, such as Physical Modeling and Spectral Modeling Synthesis (SMS). However, most music synthesis methods do not consider how the timbre of consecutive frames connects during synthesis, which makes the synthesized sound vary in a monotonous, repetitive way.
    In this thesis, from the well-known sound production principles of the violin and an analysis of the well-known Physical Modeling Synthesis method, we find that violin music synthesis is similar to speech synthesis: both can be accomplished with the so-called source filter model.
    Here, we use a state clustering method to divide violin music into states carrying different expressions. To handle the connection between consecutive timbres, we apply the decision-tree based context clustering technique commonly used in speech synthesis, and adopt hidden Markov models (HMMs) to model each state. Next, we feed the clustered state information into a re-parameterized HMM-Based Speech Synthesis System (HTS) to obtain the HMM parameters of every state, and use the speech synthesis program (hts_engine API) to synthesize violin sounds.
    To simplify the tedious preparation work, we build an analysis and synthesis flow, and a Create State Sequence Tool can be used to obtain the sound one wishes to synthesize. Although the synthesized sounds have not been verified by experts, overall they are very close to violin music.

    Using digital signal processing technology to generate realistic violin music has long been a dream. Some synthesis algorithms and models are very good at synthesizing the timbre of violin tones, such as Physical Modeling and Spectral Modeling Synthesis (SMS). However, most synthesis methods do not consider the relationship between consecutive frames, so the synthesized sound seems monotonous and repetitive.
    In this thesis, through the well-known principles of violin sound production and an analysis of Physical Modeling technology, we found that violin music synthesis is similar to speech synthesis: both can be modeled with a source filter model.
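The source filter idea mentioned above can be illustrated with a minimal sketch (a toy example, not the system described in this thesis): a periodic impulse train stands in for the bowed-string excitation, and a standard two-pole digital resonator stands in for the instrument body's filtering. All function names and parameter values here are invented for the example.

```python
import math

def impulse_train(f0, sr, n):
    """Source: a bow-like periodic excitation at fundamental frequency f0."""
    period = int(sr / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(x, freq, bw, sr):
    """Filter: a two-pole resonator that colors the excitation's spectrum.
    Coefficients follow the standard digital resonator design:
    y[n] = g*x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = math.exp(-math.pi * bw / sr)
    a1 = -2.0 * r * math.cos(2.0 * math.pi * freq / sr)
    a2 = r * r
    g = 1.0 - r  # rough gain normalization
    y = [0.0] * len(x)
    for i in range(len(x)):
        y[i] = g * x[i]
        if i >= 1:
            y[i] -= a1 * y[i - 1]
        if i >= 2:
            y[i] -= a2 * y[i - 2]
    return y

sr = 16000
source = impulse_train(440.0, sr, 1024)       # what vibrates: the string
output = resonator(source, 880.0, 200.0, sr)  # what colors it: the body
```

Changing the resonator while keeping the same source changes timbre but not pitch, which is exactly the separation the source filter model exploits.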
    Here, we propose a state clustering method that cuts violin music into short segments with different expressions, called "states". We use the decision-tree based context clustering technique, which is often used in speech analysis and synthesis, and apply hidden Markov models (HMMs) to model each state. Furthermore, we feed the state clustering result into a modified HMM-Based Speech Synthesis System (HTS), and use the hts_engine API program to synthesize violin music.
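As a toy illustration of the state idea (not the thesis's actual method, which uses decision-tree based context clustering over HMM states), even a one-dimensional k-means over per-frame energies separates a note's attack, sustain, and release into distinct "states". The feature choice and the numbers are invented for the example.

```python
def kmeans_1d(values, k, iters=20):
    """Toy state clustering: group scalar frame features into k states."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        # assign each frame to its nearest state center
        assign = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # move each center to the mean of its member frames
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign, centers

# Frame energies of one note: quiet attack, loud sustain, decaying release.
energies = [0.1, 0.2, 0.9, 1.0, 0.95, 1.0, 0.4, 0.3, 0.1]
states, centers = kmeans_1d(energies, k=3)
```

In the real system each state would carry a full spectral/pitch feature vector and be modeled by an HMM rather than a single center.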
    We have established an analysis and synthesis flow to simplify the tedious preparation. Although the synthesized sounds have not yet been verified by an expert violinist, their timbre is very close to violin music.
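The role of a state-sequence tool in such a flow can be sketched as follows. This is a hypothetical miniature with invented state names and values; in the actual system the state sequence drives hts_engine, which generates the waveform from the trained HMM parameters.

```python
# Toy synthesis driver: expand a state sequence into per-frame parameters.
# State names, parameter means, and durations are illustrative only.
state_models = {
    "attack":  {"f0": 440.0, "energy": 0.3},
    "sustain": {"f0": 440.0, "energy": 1.0},
    "release": {"f0": 438.0, "energy": 0.2},
}

def expand(sequence):
    """Turn [(state, n_frames), ...] into a frame-by-frame parameter track,
    the kind of input a vocoder-style back end would consume."""
    track = []
    for name, n_frames in sequence:
        track.extend([state_models[name]] * n_frames)
    return track

track = expand([("attack", 2), ("sustain", 5), ("release", 3)])
```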

    Abstract (Chinese) III
    Abstract IV
    Acknowledgements V
    Contents VI
    List of Tables VIII
    List of Figures IX
    1 Introduction 1
      1.1 Background and Motivation 1
      1.2 The Approach of This Dissertation 5
      1.3 The Organization of This Dissertation 5
    2 Model-Based Synthesis Methods 6
      2.1 Physical Modeling of Musical Instruments 6
      2.2 Speech Analysis and Synthesis 12
        2.2.1 Speech Techniques 12
        2.2.2 Speech Tools 20
    3 State Segmentation Method 22
      3.1 State Segmentation 22
      3.2 State Clustering Method 24
      3.3 State Re-clustering Method 25
    4 Analysis and Synthesis Flow 26
      4.1 System Structure 26
      4.2 Data Preparation 28
        4.2.1 AuthoringTool for extracting fundamental frequency 28
        4.2.2 ReadCue tool for generating Label files 30
      4.3 HMM-based speech synthesis system parameter setting 31
      4.4 Questions (QS) and Decision tree for Violin 31
      4.5 Modification of hts_engine API 1.04 34
      4.6 CreateStateSequence for synthesis Violin 34
    5 Experimental Results and Discussion 36
      5.1 The Violin Database 36
      5.2 State Segmentation Result of Clustering Method 37
      5.3 Synthesis Result 39
      5.4 Tens of Star Variations 41
    6 Conclusions and Future Works 42
    Reference 43


    Full text released on campus: 2016-08-26; off campus: 2016-08-26