
Author: Yang, Hung-Chih (楊鴻志)
Title: A Novel Source Filter Model Using LSTM/K-means Machine Learning Methods for the Synthesis of Bowed-String Musical Instruments
Advisor: Su, Wen-Yu (蘇文鈺)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (2019–2020)
Language: English
Pages: 47
Keywords: Source filter model, LSTM, Violin

Abstract:
Synthesizing a realistic bowed-string instrument sound is a difficult task because of the diversified playing techniques and ever-changing dynamics, which cause rapidly varying characteristics. The noise closely related to the dynamic bow-string interaction is also regarded as an indispensable part of the musical sound. Among the instruments of a Western orchestra, computer-synthesized bowed-string instruments are considered the least satisfactory, especially when used as solo instruments. Neural networks have been applied to sound synthesis for years; a recurrent neural network (RNN) was proposed for the synthesis of plucked-string instruments, but it requires substantial computing power during synthesis. In this thesis, a source-filter synthesis model that combines a long short-term memory (LSTM) RNN predictor with a self-organized granular wavetable is proposed. The synthesized sound closely approximates the recorded tones of a target bowed-string instrument, and both the timbre and the noise are well preserved. Although the analysis/training stage requires considerable computation to generate all the parameters of the predictor and the granular wavetable, the synthesis process itself is computationally efficient, and changes of pitch and dynamics can easily be realized in real time. Violin tones from the RWC database are used to demonstrate the results.
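The thesis itself gives the full method; purely as an illustration of the "self-organized granular wavetable" idea described above — clustering recorded signal grains with K-means and resynthesizing from the resulting centroid table — the following is a minimal sketch. The grain length, hop size, cluster count, and the toy test tone are assumed values for illustration, not the thesis's actual parameters, and the LSTM predictor stage is omitted.

```python
import numpy as np

def frame_grains(signal, grain_len=256, hop=128):
    """Slice a 1-D signal into overlapping grains, one per row."""
    n = 1 + (len(signal) - grain_len) // hop
    return np.stack([signal[i * hop : i * hop + grain_len] for i in range(n)])

def kmeans(grains, k=8, iters=20, seed=0):
    """Plain K-means over grain vectors; returns the centroid wavetable
    and, for each grain, the index of its nearest centroid."""
    rng = np.random.default_rng(seed)
    centroids = grains[rng.choice(len(grains), k, replace=False)]
    for _ in range(iters):
        # Euclidean distance from every grain to every centroid.
        d = np.linalg.norm(grains[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = grains[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def resynthesize(labels, centroids, hop=128):
    """Overlap-add the centroid grain selected for each frame slot."""
    grain_len = centroids.shape[1]
    out = np.zeros(hop * (len(labels) - 1) + grain_len)
    win = np.hanning(grain_len)
    for i, lab in enumerate(labels):
        out[i * hop : i * hop + grain_len] += win * centroids[lab]
    return out

# Toy demo: a decaying 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.exp(-3 * t) * np.sin(2 * np.pi * 440 * t)

grains = frame_grains(tone)          # analysis: slice into grains
table, labels = kmeans(grains, k=8)  # training: build the grain wavetable
approx = resynthesize(labels, table) # synthesis: look up and overlap-add
```

At synthesis time only table lookups and overlap-add remain, which is consistent with the abstract's point that the expensive work (clustering, and in the thesis also LSTM training) happens once in the analysis/training stage.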

Table of Contents:
    摘要 (Chinese Abstract)
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Related Work
        2.1. Violin
        2.2. Source-Filter Synthesis
        2.3. LSTM Neural Network
    Chapter 3. Method
        3.1. The Analysis Stage
            3.1.1. Time Domain Dataset
            3.1.2. Frequency Domain Dataset
        3.2. The Training Stage
            3.2.1. The LSTM Predictor
            3.2.2. Amplitude Loss
            3.2.3. Phase Loss
            3.2.4. The Granular Wavetable
        3.3. The Synthesis Stage
    Chapter 4. Experiments and Result
        4.1. Experiments Setup
        4.2. Synthesis Result
            4.2.1. Time Domain
            4.2.2. Frequency Domain
            4.2.3. Discussion
    Chapter 5. Conclusion and Future Works
        5.1. Conclusion
        5.2. Future Works
    References

    [1] Violin. https://en.wikipedia.org/wiki/Violin.

    [2] Uwe Andresen. A new way in sound synthesis. In Audio Engineering Society Convention 62. Audio Engineering Society, 1979.

    [3] B. Atal. High-quality speech at low bit rates: Multi-pulse and stochastically excited linear predictive coders. In ICASSP '86: IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 11, pages 1681–1684. IEEE, 1986.

    [4] S. Bass and T. Goeddel. The efficient digital implementation of subtractive music synthesis. IEEE Micro, (3):24–37, 1981.

    [5] John M. Chowning. The synthesis of complex audio spectra by means of frequency modulation. Journal of the Audio Engineering Society, 21(7):526–534, 1973.

    [6] A. Cichocki and Rolf Unbehauen. Neural Networks for Optimization and Signal Processing. John Wiley & Sons, Inc., 1993.

    [7] Lothar Cremer. The Physics of the Violin. 1984.

    [8] Carlo Drioli and Davide Rocchesso. A generalized musical tone generator with application to sound compression and synthesis. In 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 431–434. IEEE, 1997.

    [9] Pavel Filonov, Andrey Lavrentyev, and Artem Vorontsov. Multivariate industrial time series with cyber-attack simulation: Fault detection using an LSTM-based predictive data model. arXiv preprint arXiv:1612.06676, 2016.

    [10] Felix A. Gers, Douglas Eck, and Jürgen Schmidhuber. Applying LSTM to time series predictable through time-window approaches. In Neural Nets WIRN Vietri-01, pages 193–200. Springer, 2002.

    [11] Masataka Goto, Hiroki Hashiguchi, Takuichi Nishimura, and Ryuichi Oka. RWC Music Database: Music genre database and musical instrument sound database. 2003.

    [12] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6645–6649. IEEE, 2013.

    [13] Henrik Hahn, Axel Röbel, Juan José Burred, and Stefan Weinzierl. Source-filter model for quasi-harmonic instruments. 2010.

    [14] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

    [15] Anssi Klapuri. Analysis of musical instrument sounds by source-filter-decay model. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), volume 1, pages I–53. IEEE, 2007.

    [16] Pei-Ching Li, Wei-Chen Chang, Tien-Min Wang, Ya-Han Kuo, and Wen-Yu Su. Source filter model for expressive gu-qin synthesis and its iOS app. In 16th International Conference on Digital Audio Effects (DAFx 2013). National University of Ireland, 2013.

    [17] Sheng-Fu Liang, Alvin W. Y. Su, and Chin-Teng Lin. Model-based synthesis of plucked string instruments by using a class of scattering recurrent networks. IEEE Transactions on Neural Networks, 11(1):171–185, 2000.

    [18] Raymond V. Migneco and Youngmoo E. Kim. Excitation modeling and synthesis for plucked guitar tones. In 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 193–196. IEEE, 2011.

    [19] James Anderson Moorer. Signal processing aspects of computer music: A survey. Proceedings of the IEEE, 65(8):1108–1137, 1977.

    [20] Jussi Pekonen. Computationally efficient music synthesis: methods and sound design. Master of Science (Technology) thesis, TKK Helsinki University of Technology, Espoo, Finland, 2007.

    [21] Henri Penttinen, Jyri Pakarinen, Vesa Välimäki, Mikael Laurson, Henbing Li, and Marc Leman. Model-based sound synthesis of the guqin. The Journal of the Acoustical Society of America, 120(6):4052–4063, 2006.

    [22] Khandakar M. Rashid and Joseph Louis. Times-series data augmentation and deep learning for construction equipment activity recognition. Advanced Engineering Informatics, 42:100944, 2019.

    [23] Xavier Serra. A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition. 1989.

    [24] Xavier Serra et al. Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122, 1997.

    [25] J. O. Smith. Acoustic modeling using digital waveguides. Musical Signal Processing, 7:221–264, 1997.

    [26] Julius O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):74–91, 1992.

    [27] Julius O. Smith. Physical modeling synthesis update. Computer Music Journal, 20(2):44–56, 1996.

    [28] Julius O. Smith. Principles of digital waveguide models of musical instruments. In Applications of Digital Signal Processing to Audio and Acoustics, pages 417–466. Springer, 2002.

    [29] Julius Orion Smith. Music Applications of Digital Waveguides. Number 39. CCRMA, Dept. of Music, Stanford University, 1987.

    [30] Alvin W. Y. Su and Liang San-Fu. Synthesis of plucked-string tones by physical modeling with recurrent neural networks. In Proceedings of First Signal Processing Society Workshop on Multimedia Signal Processing, pages 71–76. IEEE, 1997.

    [31] Vesa Välimäki. Physics-based modeling of musical instruments. Acta Acustica united with Acustica, 90(4):611–617, 2004.

    [32] Vesa Välimäki, Jyri Huopaniemi, Matti Karjalainen, and Zoltán Jánosy. Physical modeling of plucked string instruments with application to real-time sound synthesis. In Audio Engineering Society Convention 98. Audio Engineering Society, 1995.

    [33] Vesa Välimäki, Jyri Pakarinen, Cumhur Erkut, and Matti Karjalainen. Discrete-time modelling of musical instruments. Reports on Progress in Physics, 69(1):1, 2005.

    [34] Sølvi Ystad. Sound modeling applied to flute sounds. Journal of the Audio Engineering Society, 48(9):810–825, 2000.

    Full-text availability: on campus from 2025-08-31; off campus from 2025-08-31.