| Graduate Student: | 林嘉泰 Lin, Jia-Tai |
|---|---|
| Thesis Title: | Deep Neural Network Architectures for Polyphonic Pitch Detection in Violin Recordings (深度神經網路架構於小提琴演奏錄音複音音高偵測) |
| Advisor: | 蘇文鈺 Su, Wen-Yu |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2017 |
| Academic Year: | 105 (2016–2017) |
| Language: | English |
| Pages: | 75 |
| Keywords: | Deep Neural Network, Violin Duos Database, Polyphonic Pitch Detection, Octave Bands |
Multiple pitch detection is an important problem in music information retrieval (MIR): applications such as note separation, chord recognition, and automatic music transcription all rely on a robust pitch estimation algorithm. In recent years, neural networks have been applied to polyphonic pitch detection with considerable success, but there is still room for improvement in pitch estimation for bowed string instruments. In this thesis, we investigate polyphonic pitch detection in violin recordings and apply three deep neural network (DNN) architectures to mitigate the problem of harmonic (octave) interference.
Specifically, we build our training datasets from the RWC music database. The violin training data cover single notes as well as two-, three-, and four-note combinations, include the playing techniques pizzicato and vibrato, and span combinations mixed at five different amplitude levels. Based on the pitch range of the violin, we analyse the most suitable parameters and architectures for the DNNs, training them on a per-octave-band basis.
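As a concrete illustration of the octave-band scheme, the following sketch slices a constant-Q spectrogram into one-octave bands over the violin range. This is a hypothetical reconstruction, not the thesis code: librosa, the 36-bins-per-octave resolution, the four-octave span, and the G3 lower bound are all assumptions.

```python
import numpy as np
import librosa

BINS_PER_OCTAVE = 36   # assumed CQT resolution: 3 bins per semitone
N_OCTAVES = 4          # assumed span, starting at the open G string (G3)

def octave_band_features(path):
    """Compute a CQT magnitude spectrogram and slice it into
    one-octave bands, one band per sub-network (a sketch)."""
    y, sr = librosa.load(path, sr=44100, mono=True)
    cqt = np.abs(librosa.cqt(
        y, sr=sr,
        fmin=librosa.note_to_hz("G3"),   # lowest violin note
        n_bins=BINS_PER_OCTAVE * N_OCTAVES,
        bins_per_octave=BINS_PER_OCTAVE))
    # Contiguous one-octave slices along the frequency axis.
    return [cqt[i * BINS_PER_OCTAVE:(i + 1) * BINS_PER_OCTAVE]
            for i in range(N_OCTAVES)]
```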
Beyond different numbers of layers and nodes, the three architectures differ chiefly in the design of their input layers. In Architecture I (Arch-I), each octave band independently forms the input layer of its sub-network. Architecture II (Arch-II), an extension of Arch-I, merges each octave band with the band one octave above it, where the second harmonics of its pitches fall. In Architecture III (Arch-III), the input layer combines the current octave band with the estimation results of the lower octave bands.
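The three input-layer designs can be made concrete with a small sketch. The helper names below are hypothetical; `bands` is the list of per-octave feature matrices from the previous sketch, and `lower_preds` stands in for the pitch activations already estimated for the lower octave bands.

```python
import numpy as np

def arch1_input(bands, k):
    """Arch-I: octave band k alone feeds its sub-network."""
    return bands[k]

def arch2_input(bands, k):
    """Arch-II: band k stacked with the band one octave above,
    where the second harmonics of band-k pitches fall."""
    upper = bands[k + 1] if k + 1 < len(bands) else np.zeros_like(bands[k])
    return np.vstack([bands[k], upper])

def arch3_input(bands, k, lower_preds):
    """Arch-III: band k stacked with the estimation results
    (per-frame pitch activations) of all lower octave bands."""
    return np.vstack([bands[k]] + list(lower_preds))
```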
We additionally build a dedicated pitch-set DNN and apply it to Arch-I and Arch-II to filter the estimated pitches. A pitch-smoothing post-processing step follows all three architectures. Our evaluations show that Arch-III performs best on both violin solos and duos; Arch-I and Arch-II produce similar results on violin solos, but Arch-II outperforms Arch-I on violin duos.
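The pitch-smoothing step could look like the following median-filter sketch. The abstract does not specify the exact filter, so the window width, the binarization threshold, and the use of scipy here are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def smooth_pitch_activations(activations, width=5, threshold=0.5):
    """Median-filter each pitch's activation over time to suppress
    single-frame spurious detections, then binarize.

    activations: array of shape (n_pitches, n_frames) in [0, 1].
    """
    smoothed = median_filter(activations, size=(1, width))
    return smoothed >= threshold
```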