| Graduate student: | 吳尚曄 Wu, Shang-Yeh | 
|---|---|
| Thesis title: | 巴哈無伴奏小提琴演奏錄音之資料集 Dataset of Bach Sonata and Partita for Solo Violin Records | 
| Advisor: | 蘇文鈺 Su, Wen-Yu | 
| Degree: | Master | 
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering | 
| Year of publication: | 2022 | 
| Academic year of graduation: | 110 | 
| Language: | English | 
| Number of pages: | 27 | 
| Keywords (Chinese): | 巴哈 、無伴奏小提琴 、資料集 、加法合成 、動態時間規整 | 
| Keywords (English): | Bach, Solo Violin, Dataset, Additive Synthesis, Dynamic Time Warping | 
| Access counts: | Views: 78, Downloads: 10 | 
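Many areas of modern information technology rely on machine learning, which makes feature data indispensable: both training and validating models require large amounts of data. In 2020, Beijing ByteDance Technology Co., Ltd., the parent company of Douyin (TikTok), published a paper on a piano music dataset built with deep learning. Together with advances in deep learning, this led to more publications on expressive performance generation for piano music, which both accelerated the field and demonstrated the importance of datasets. No comparably complete dataset exists for violin research, so we hope that releasing this thesis and its dataset will advance research on violin music with expressive performance generation.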
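This thesis extracts features from Bach's works for unaccompanied violin and stores them as a public dataset. Because the early scores of these pieces say little about performance expression, interpretations differ greatly between musicians, and we consider these data valuable for future expression generation. Since many recorded versions exist, we collect mainly the performances of two musicians, Hilary Hahn and Henryk Szeryng, about 4 hours and 20 minutes in total. We use Dynamic Time Warping as the primary means of time-domain alignment and estimate the fundamental frequency with the CREPE deep learning model as the basis for frequency-domain analysis. The annotations provided for each note comprise seven features: onset, offset, energy envelope, fundamental frequency, pitch contour, timbre (harmonics), and noise. The first six features, that is, all except noise, are then used to resynthesize each complete piece by additive synthesis as a basis for checking and correcting the data.
Because the recordings are affected by the recording venue and by each musician's individual performance, detection in the time and frequency domains contains some errors. Taking Sonata No. 1-4 Presto as an example, of the 1622 notes in the piece, 2.7% (45 notes) require manual verification of their time-domain data, and 0.7% (17 notes) have frequency-domain details that are hard for the human ear to distinguish because of limitations of the synthesis method.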
Advances in data science now depend heavily on machine learning, and feature datasets are necessary both for training models and for verifying their correctness. In 2020, ByteDance Ltd., the parent company of TikTok, published a paper on a piano recording dataset built with machine learning. With the help of that dataset and improvements in deep learning, more papers on piano performance generation were subsequently published. These works advanced the field and demonstrated the importance of feature datasets. However, no comparable dataset exists for violin research, so we hope that releasing this paper and dataset will promote research on violin performance generation.
This paper extracts features from Bach's Sonatas and Partitas for solo violin and stores them as a public dataset. Because the early scores of these pieces contain few expression markings, musicians' interpretations differ greatly, and we believe these features are valuable for future performance generation. Because many recorded versions exist, the dataset mainly covers performances by two musicians, Hilary Hahn and Nathan Milstein, totaling about 4 hours and 20 minutes. We use Dynamic Time Warping as the main method for locating notes in the time domain and estimate the fundamental frequency with the CREPE deep learning model as the basis for frequency-domain analysis. For each note, the dataset provides seven features: onset, offset, energy envelope, fundamental frequency, pitch contour, harmonics, and noise. Using these features, except for the noise and fundamental frequency, each whole piece is resynthesized by additive synthesis as the basis for checking and correcting the data.
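As an illustration of the time-domain alignment step, the following is a minimal sketch of textbook Dynamic Time Warping between two one-dimensional sequences. It is not the thesis's implementation: the function name `dtw_path`, the absolute-difference local cost, and the toy pitch sequences are assumptions made for this example, and a real score-to-audio alignment would typically compare feature frames (for instance chroma vectors) rather than scalars.

```python
import numpy as np

def dtw_path(x, y):
    """Textbook DTW between two 1-D sequences (hypothetical helper).

    Returns the total alignment cost and the warping path as a list of
    (index_in_x, index_in_y) pairs from the first to the last frame.
    """
    n, m = len(x), len(y)
    # acc[i, j] = minimal cost of aligning x[:i] with y[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])  # assumed local cost
            acc[i, j] = local + min(acc[i - 1, j - 1],  # match
                                    acc[i - 1, j],      # x advances alone
                                    acc[i, j - 1])      # y advances alone

    # Backtrack from the end to recover the cheapest path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(acc[i - 1, j - 1], i - 1, j - 1),
                 (acc[i - 1, j], i - 1, j),
                 (acc[i, j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path

# Toy example: a "score" pitch sequence and a "performance" that holds
# the second note for one extra frame.
score = [0, 1, 2, 3]
performance = [0, 1, 1, 2, 3]
cost, path = dtw_path(score, performance)
```

In this toy case the path maps score frame 1 to performance frames 1 and 2, which is how a warping path indicates where a note starts and ends in a recording whose tempo differs from the score.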
Because the recordings are affected by the recording venue and by each musician's individual performance, detection in the time and frequency domains contains some errors. Taking Sonata No. 1-4 Presto as an example, of the 1622 notes in the piece, 2.7% require manual verification in the time domain. In addition, about 0.7% of the notes have details that are hard for the human ear to recognize because of the limitations of the synthesis method in the frequency domain.
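The additive resynthesis used to check the annotations can be pictured as a sum of sinusoidal harmonics driven by a pitch contour and per-harmonic amplitudes. The sketch below is a generic illustration under stated assumptions, not the thesis's code: the function name `additive_synth`, the per-sample array layout, and the 16 kHz sample rate are all hypothetical choices for this example.

```python
import numpy as np

def additive_synth(f0_hz, harmonic_amps, sr=16000):
    """Sum-of-sinusoids synthesis (hypothetical helper).

    f0_hz:         shape (T,), pitch contour in Hz, one value per sample
    harmonic_amps: shape (T, K), amplitude of harmonics 1..K per sample
    Returns a mono signal of shape (T,).
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    harmonic_amps = np.asarray(harmonic_amps, dtype=float)
    k = np.arange(1, harmonic_amps.shape[1] + 1)  # harmonic numbers

    # Integrate frequency to get phase so a moving pitch stays continuous.
    phase = 2 * np.pi * np.cumsum(f0_hz) / sr

    # Mute any harmonic at or above the Nyquist frequency to avoid aliasing.
    audible = (f0_hz[:, None] * k[None, :]) < sr / 2

    partials = harmonic_amps * audible * np.sin(phase[:, None] * k[None, :])
    return partials.sum(axis=1)

# Usage: one second of A4 with three harmonics and a decaying envelope.
sr = 16000
t = np.arange(sr) / sr
f0 = np.full(sr, 440.0)
envelope = np.exp(-3.0 * t)
amps = envelope[:, None] * np.array([1.0, 0.5, 0.25])[None, :]
note = additive_synth(f0, amps, sr=sr)
```

A model of this kind reproduces only the harmonic part of a note, which is consistent with the abstract's remark that some sound details are hard to hear in the resynthesized result.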