
Graduate Student: Dai, Yi-Ren (戴逸任)
Thesis Title: Efficient Synthesis of Violin Sounds Using a BLSTM Network Based Source Filter Model
Advisor: Su, Wen-Yu (蘇文鈺)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2021
Academic Year of Graduation: 109
Language: English
Number of Pages: 34
Keywords: Source filter model, LSTM, Violin, DCT
Hits: 151; Downloads: 50
    The dynamic variation of playing techniques and the noise-like sound produced by bow-string interaction make it difficult to synthesize the sounds of bowed string instruments. A previous study proposed a source filter model that combines a long short-term memory (LSTM) network predictor with self-organized granular wavetables. However, besides the drawback of frequently accessing the granular wavetable database held in system memory, that work's prediction accuracy still leaves room for improvement, and the model cannot reproduce the nuances arising from the violin's constantly changing characteristics.

    In this thesis, the discrete cosine transform (DCT) is used to represent the self-organized granular wavetables, and a new prediction method is proposed to reduce the prediction error. We also analyze the differences between original violin tones and the corresponding synthesized tones: the synthesized sounds tend to be regular and dull. To imitate the varying character of bowed string instruments, we therefore propose perturbing each tone around its corresponding pitch and adjusting the DCT coefficients, which brings the synthesized sounds closer to the real ones. Violin tones from the RWC database are used to present our results.

    The dynamic changes in playing skills and the noise-like sound generated from bow-string interaction make synthesizing bowed string instrument sounds a difficult task. In previous research, a source filter model incorporating a long short-term memory (LSTM) predictor and granular wavetables gave encouraging results. However, the prediction error is still large, and the granular wavetable database stored in the system memory has to be accessed frequently. Furthermore, the model has not captured the nuance caused by the constantly changing characteristics of a playing violin.

    In this paper, the discrete cosine transform (DCT) is used to represent the granular wavetable, and a new training strategy is proposed to reduce the prediction error. In addition, we analyze the difference between the original violin tones and the corresponding synthesized tones, and explain why the synthesis results may sound regular and dull. We therefore propose a random pitch perturbation and a DCT coefficient shaping method to imitate the changing characteristics. Violin tones from the RWC database are used to present the results.
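    The three ingredients named in the abstract (representing a wavetable grain by DCT coefficients, random pitch perturbation, and DCT coefficient shaping) can be sketched roughly as follows. This is an illustrative outline only: the coefficient count, perturbation depths, and interpolation-based resampling are assumptions for the sketch, not the actual method or parameters of the thesis.

```python
import numpy as np
from scipy.fft import dct, idct

def grain_to_dct(grain, n_coef=64):
    """Represent a wavetable grain by its leading DCT-II coefficients."""
    coefs = dct(grain, type=2, norm='ortho')
    return coefs[:n_coef]

def dct_to_grain(coefs, length):
    """Reconstruct a grain by zero-padding the coefficients and inverting the DCT."""
    full = np.zeros(length)
    full[:len(coefs)] = coefs
    return idct(full, type=2, norm='ortho')

def shape_coefficients(coefs, rng, depth=0.05):
    """Randomly scale the DCT coefficients to imitate timbral fluctuation."""
    return coefs * (1.0 + depth * rng.standard_normal(len(coefs)))

def perturb_pitch(grain, rng, max_cents=10.0):
    """Resample a single-period grain to shift its pitch by a few random cents."""
    cents = rng.uniform(-max_cents, max_cents)
    ratio = 2.0 ** (cents / 1200.0)
    n = len(grain)
    # Linear interpolation over the stretched/compressed period.
    new_x = np.linspace(0, n - 1, int(round(n / ratio)))
    return np.interp(new_x, np.arange(n), grain)

rng = np.random.default_rng(0)
period = 128
grain = np.sin(2 * np.pi * np.arange(period) / period)  # toy single-period grain

coefs = grain_to_dct(grain, n_coef=32)                  # compact grain representation
recon = dct_to_grain(shape_coefficients(coefs, rng), period)
wobbly = perturb_pitch(recon, rng)                      # grain with slight pitch drift
```

    The point of the sketch is the data flow: grains are stored and predicted as short DCT coefficient vectors rather than raw waveforms, and small random perturbations of pitch and coefficients break up the mechanical regularity of a purely deterministic resynthesis.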

    Chinese Abstract i
    Abstract ii
    Contents iii
    List of Tables v
    List of Figures vi
    1 Introduction 1
    2 Related Works 4
    2.1 Source Filter Synthesis 4
    2.2 Bidirectional LSTM Networks 5
    2.3 Discrete Cosine Transform 7
    2.4 Previous Research 8
    3 Method 12
    3.1 The Analysis Stage 14
    3.2 The Training Stage 14
    3.2.1 The Prediction Strategy 15
    3.2.2 The Granular Wavetable 16
    3.3 The Synthesis Stage 16
    3.3.1 Random Pitch Perturbation 16
    3.3.2 DCT Coefficient Shaping 17
    4 Experiments and Results 20
    4.1 Experiment Setup 20
    4.2 Synthesis Results 21
    4.2.1 Comparison 21
    4.2.2 Random Pitch Perturbation 22
    4.2.3 DCT Coefficient Shaping 23
    4.2.4 Sound Samples 24
    5 Conclusions and Future Works 28
    5.1 Conclusion 28
    5.2 Future Works 29
    References 30
