
Graduate Student: Lee, Yu-Wen (李昱玟)
Thesis Title: Source-filter-based model for wind instruments synthesis with BiLSTM and GMM (基於源濾波模型使用機器學習與高斯混合模型合成管樂器)
Advisor: Su, Wen-Yu (蘇文鈺)
Degree: Master
Department: Institute of Medical Informatics, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110
Language: English
Pages: 37
Chinese Keywords (translated): Gaussian Mixture Model, Source-Filter Model, Long Short-Term Memory, Discrete Cosine Transform, Flute
English Keywords: BiLSTM, Gaussian Mixture Model, Source-Filter Model, DCT, Flute
In woodwind instruments generally, the breath sound (also called air noise) makes the tone livelier and more varied, but adding natural-sounding breath noise to a synthesized wind instrument is not easy. A previous study proposed restoring bowed-string sounds with a source-filter model, using a bidirectional long short-term memory network (BiLSTM) to predict a self-organized wavetable transformed by the discrete cosine transform (DCT). However, that study's prediction accuracy in the high-frequency band still left room for improvement, and it also struggled to reproduce precisely the breath noise produced by wind instruments such as the flute.

This thesis proposes a new method to reduce the high-frequency error. We transform a segment of sound with the DCT and then split the coefficients into low- and high-frequency parts. The low-frequency part is further divided into ADSR segments, each learned and predicted by its own BiLSTM, while for the high-frequency part a smoothed envelope is extracted and approximated by a Gaussian Mixture Model (GMM), whose parameters are recorded.

We experiment with flute tones from the RWC database and compare against the previous study. The results show that our method performs better in the high-frequency band and renders breath noise more distinctly; moreover, because the GMM stores only a small number of parameters, memory usage also drops significantly.
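The high-frequency processing described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the frame, the split index, the smoothing window, and the small weighted-EM fitter are all illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct

def fit_gmm_1d(x, w, K, iters=50):
    """Weighted 1-D EM: approximate a positive envelope w over bins x
    with K Gaussians (illustrative stand-in for the thesis's GMM step)."""
    w = w / w.sum()
    pi = np.full(K, 1.0 / K)
    mu = np.linspace(x.min(), x.max(), K + 2)[1:-1]
    var = np.full(K, ((x.max() - x.min()) / (2.0 * K)) ** 2)
    for _ in range(iters):
        # E-step: responsibility of each component for each frequency bin
        d = x[:, None] - mu[None, :]
        pdf = pi * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = pdf / (pdf.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: envelope-weighted updates of weights, means, variances
        nk = (w[:, None] * r).sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (w[:, None] * r * x[:, None]).sum(axis=0) / nk
        dev = x[:, None] - mu[None, :]
        var = np.maximum((w[:, None] * r * dev**2).sum(axis=0) / nk, 1e-2)
    return pi, mu, var

# Illustrative frame: a harmonic tone as a stand-in for a flute note
fs = 16000
t = np.arange(1024) / fs
frame = sum(np.sin(2 * np.pi * 440 * (h + 1) * t) / (h + 1) for h in range(8))

coeffs = dct(frame, norm="ortho")
cutoff = 128                              # illustrative low/high split index
low, high = coeffs[:cutoff], coeffs[cutoff:]

# Smooth the high-frequency magnitudes into an envelope, then fit K Gaussians;
# only 3*K numbers (weights, means, variances) need to be stored.
env = np.convolve(np.abs(high), np.ones(16) / 16.0, mode="same")
x = np.arange(env.size, dtype=float)
pi, mu, var = fit_gmm_1d(x, env, K=3)
```

Storing the 3·K fitted parameters instead of the full high-frequency envelope is what gives the memory saving claimed above.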

One of the essential elements of woodwind instrument playing is breath. Computer-synthesized woodwind sounds often sound dull because they lack expression, and breath noise is another factor in sounding realistic. In this work, a source-filter model incorporating a BiLSTM and a Gaussian Mixture Model (GMM) for the synthesis of woodwind instruments is presented. The sources are magnitude-modulated, pitch-synchronous impulse signals.
The filter is represented by the corresponding discrete cosine transform (DCT) coefficients, divided into low-frequency and high-frequency parts. In the high-frequency part, the DCT coefficients are modeled with the GMM: instead of predicting the coefficients directly, the time-domain signal is converted into spectral envelopes, and GMM parameters are fitted to approximate each frame. In the training stage, the low-frequency part is divided into four sections: attack, decay, sustain, and release (ADSR). Forward-backward BiLSTM predictors are trained to predict the low-frequency DCT coefficients. Compared to the conventional Digital Waveguide Filter-based method, the proposed method synthesizes realistic, expressive tones as well as breath noise.
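The ADSR division of the low-frequency part can be sketched with simple threshold heuristics. The thresholds and the synthetic envelope below are illustrative assumptions, not the thesis's exact segmentation rule, and the BiLSTM training itself is omitted.

```python
import numpy as np

def split_adsr(env, sustain_ratio=0.7):
    """Heuristic ADSR split of a note's amplitude envelope.

    Returns the indices where the attack ends, the decay ends, and the
    release starts (illustrative thresholds, not the thesis's rule).
    """
    peak = int(env.argmax())                     # attack: onset -> peak
    sustain_level = sustain_ratio * env[peak]
    # decay ends where the envelope first settles to the sustain level
    after = np.nonzero(env[peak:] <= sustain_level)[0]
    decay_end = peak + int(after[0]) if after.size else peak
    # release starts where the envelope last leaves the sustain level
    release_start = int(np.nonzero(env >= sustain_level)[0][-1])
    return peak, decay_end, release_start

# Synthetic envelope: 100-sample attack, 100 decay, 300 sustain, 100 release
env = np.concatenate([
    np.linspace(0.0, 1.0, 100),      # attack
    np.linspace(1.0, 0.7, 100),      # decay
    np.full(300, 0.7),               # sustain
    np.linspace(0.7, 0.0, 100),      # release
])
attack_end, decay_end, release_start = split_adsr(env)
segments = [env[:attack_end], env[attack_end:decay_end],
            env[decay_end:release_start], env[release_start:]]
```

A separate forward-backward BiLSTM predictor would then be trained on the low-frequency DCT coefficients of each of the four segments.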

Abstract (Chinese) i
Abstract ii
Acknowledgements iv
Contents v
List of Tables vii
List of Figures viii
1 Introduction 1
2 Related Works 3
  2.1 Digital Waveguide 3
  2.2 Source Filter Model Synthesis 4
  2.3 Gaussian Mixture Model 6
  2.4 Discrete Cosine Transform 6
  2.5 Recurrent Neural Networks 7
  2.6 Synthesis using Bidirectional LSTM Networks 9
3 Method 13
  3.1 The Analysis Stage 15
    3.1.1 Low-Frequency Part 15
    3.1.2 High-Frequency Part 16
  3.2 The Training Stage 18
  3.3 The Synthesis Stage 20
4 Experiments and Results 21
  4.1 Experiments Setup 21
  4.2 Comparison 21
    4.2.1 Comparison of different training methods 22
    4.2.2 Comparison of different prediction strategies 23
    4.2.3 FFT comparison 24
    4.2.4 Sound Samples 25
5 Conclusions and Future Works 31
  5.1 Conclusions 31
  5.2 Future Works 32
References 33

