| Field | Value |
|---|---|
| Graduate Student | Yang, Hung-Chih (楊鴻志) |
| Thesis Title | A Novel Source Filter Model Using LSTM/K-means Machine Learning Methods for the Synthesis of Bowed-String Musical Instruments |
| Advisor | Su, Wen-Yu (蘇文鈺) |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Publication Year | 2020 |
| Academic Year | 108 |
| Language | English |
| Pages | 47 |
| Keywords | Source filter model, LSTM, Violin |
Synthesizing realistic bowed-string instrument sounds is a difficult task due to the diversified playing techniques and the ever-changing dynamics, which cause rapidly varying characteristics. The noise component, closely related to the dynamic bow-string interaction, is also regarded as an indispensable part of the musical sound. Among the instruments of a Western orchestra, computer-synthesized bowed-string instruments are considered the least satisfactory, especially when used as solo instruments. Neural networks have been applied to sound synthesis for years. A recurrent neural network (RNN) was previously proposed for the synthesis of plucked-string instruments, but it required substantial computing power at synthesis time. In this thesis, a source filter synthesis model is proposed that combines a Long Short-Term Memory (LSTM) RNN predictor with a self-organized granular wavetable. The synthesized sound closely approximates the recorded tones of a target bowed-string instrument, and both the timbre and the noise are well preserved. Although the analysis/training stage may require considerable computing power to generate all the parameters of the predictor and the granular wavetable, the synthesis process itself is computationally efficient. Changes of pitch and dynamics can also be achieved easily in real time. We use the violin tones in the RWC database to demonstrate our results.
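To make the granular-wavetable idea concrete, the sketch below shows one plausible reading of the analysis/synthesis split described above: a recorded tone is cut into short grains, the grains are clustered with K-means into a compact wavetable of centroid grains, and synthesis reduces to concatenating wavetable entries according to an index sequence (which, in the proposed model, an LSTM predictor would generate). This is a minimal illustration under assumed parameters, not the thesis implementation; all function names, the grain length, and the cluster count are hypothetical.

```python
import numpy as np

def make_grains(signal, grain_len):
    """Cut the signal into non-overlapping grains of grain_len samples."""
    n = len(signal) // grain_len
    return signal[:n * grain_len].reshape(n, grain_len)

def kmeans(grains, k, iters=50, seed=0):
    """Plain NumPy Lloyd-style K-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen grains.
    centroids = grains[rng.choice(len(grains), size=k, replace=False)]
    for _ in range(iters):
        # Assign each grain to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(grains[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each centroid to the mean of its assigned grains.
        for j in range(k):
            members = grains[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def resynthesize(wavetable, index_sequence):
    """Concatenate wavetable grains in the given order.
    In the proposed model, an LSTM would predict this index sequence."""
    return wavetable[index_sequence].reshape(-1)

# Toy demo: a decaying 220 Hz sine at 8 kHz stands in for a recorded violin tone.
sr, f0 = 8000, 220.0
t = np.arange(sr) / sr
tone = np.exp(-3.0 * t) * np.sin(2 * np.pi * f0 * t)

grains = make_grains(tone, grain_len=64)        # 125 grains of 64 samples
wavetable, labels = kmeans(grains, k=8)         # 8 centroid grains
approx = resynthesize(wavetable, labels)        # cluster-quantized reconstruction
print(grains.shape, wavetable.shape, approx.shape)
```

The wavetable replaces 125 stored grains with 8 centroids, which is where the computational and memory efficiency at synthesis time comes from; real-time pitch and dynamic changes would then amount to resampling and rescaling the selected grains.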