
Author: Yang, Yi-Lin (楊依林)
Title: Detection of Rhythmic Instruments in Jazz Quartet Recordings (爵士四重奏之節奏樂器偵測)
Advisor: Su, Wen-Yu (蘇文鈺)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: English
Pages: 87
Chinese keywords: rhythmic instrument recognition, jazz quartet, hidden Markov model, onset detection
English keywords: rhythmic instruments transcription, Jazz quartet, Hidden Markov model, onset detection
    The flourishing era of jazz began around the 1940s, and small ensembles such as trios and quartets rose at the same time. Cool-jazz master Miles Davis kept this movement alive for more than twenty years, during which many jazz masters, such as Thelonious Monk, John Coltrane, Sonny Rollins, and Bill Evans, led ensembles of this kind. The recordings made over those two decades are almost beyond counting, and they stand among the great treasures of music history.
    Because the leading figures of such ensembles were usually master wind players or pianists, highly free improvisation became a defining feature of this music. Yet it is often the supporting players, the drummer and the bassist on the rhythmic instruments, who quietly carry the spirit of the band. Unlike the rhythm section of most pop music, these two rhythmic instruments in a jazz quartet retain the improvisational soul that jazz cannot do without: they do not use the same playing pattern from beginning to end, and their tempo is not fixed either. In fact, most of these rhythm-section players, such as Ray Brown and Max Roach, were masters in their own right.
    Many algorithms exist for analyzing rhythmic instruments, but few are designed specifically for jazz recordings of this instrumentation. We found that when these algorithms are applied to classic commercial recordings, their recognition rates are highly unstable and sometimes very low; in general, 60%-70% is already considered good. We suspect the reasons are those described above.
    To address these difficulties, this thesis roughly divides the rhythmic instruments of the jazz quartet into two types. The first type produces broadband signals, further divided into the snare drum and the hi-hat cymbal; the second type, the contrabass, has a harmonic structure. In the training stage we train multiple hidden Markov models (multi-HMM), and in the recognition stage we use these HMMs to locate the onsets of the rhythmic instruments, then analyze the corresponding instruments at those onset locations.
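    The abstract separates the rhythmic instruments by spectral character: the contrabass has a harmonic structure, while the snare and hi-hat are broadband. The record does not name the feature used for this split; purely as an illustration, a standard spectral-flatness measure (an assumption here, not necessarily the feature used in the thesis) can make the same distinction on synthetic signals:

```python
import numpy as np

def spectral_flatness(frame: np.ndarray) -> float:
    """Geometric mean over arithmetic mean of the magnitude spectrum.
    High for broadband (noise-like) sounds such as snare or hi-hat hits;
    near zero for harmonic sounds such as a contrabass note, whose
    energy concentrates in a few partials."""
    mag = np.abs(np.fft.rfft(frame)) + 1e-12  # offset avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(mag)))
    return float(geometric_mean / np.mean(mag))

sr = 8000  # hypothetical sampling rate for the toy signals
t = np.arange(sr) / sr
# A contrabass-like tone: 55 Hz fundamental plus one partial.
harmonic = np.sin(2 * np.pi * 55 * t) + 0.5 * np.sin(2 * np.pi * 110 * t)
# A snare/hi-hat-like signal: white noise.
broadband = np.random.default_rng(0).standard_normal(sr)

print(spectral_flatness(harmonic))   # harmonic: low flatness
print(spectral_flatness(broadband))  # broadband: high flatness
```

    Thresholding such a frame-level feature is one simple way to route a detected onset to the harmonic or the broadband branch of a classifier.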

    Dating back to the 1940s, jazz ensembles such as trios, quartets, and even quintets became popular. Miles Davis kept the flame alive for more than 20 years. In this period, jazz masters like Thelonious Monk, John Coltrane, Sonny Rollins, and Bill Evans built their own ensembles. The number of commercial recordings from those decades is vast, and they are treasures of music history.
    The key figures of jazz ensembles are usually wind players or pianists, and free-style improvisation characterizes the genre; yet drummers and bassists are the ones who sustain the foundation of the performance. Unlike most pop music, they neither repeat a fixed playing pattern nor keep a constant tempo throughout the performance, because improvisation is their soul, while at the same time they must hold all the players together. As a matter of fact, rhythmic instrument players are as important as the soloists. That is why drummers and bassists such as Ray Brown and Max Roach are remembered as masters too.
    Quite a few algorithms focus on the analysis of rhythmic instrument performances, but very few are designed for recordings of jazz ensembles. We found that these algorithms are much less effective on our target recordings, with recognition rates mostly well under 70%.
    In response to the difficulties discussed above, we divide the sounds of the rhythmic instruments of a jazz quartet into two types. The first type has a clear harmonic structure, such as that produced by the contrabass. The second type has broadband characteristics, such as the sounds produced by the hi-hat cymbal and the snare drum. We train multiple hidden Markov models (multi-HMM) in the training stage, extract onsets of the rhythmic instruments using these HMMs after pre-processing, and then examine the onsets to recognize the instruments by their respective characteristics.
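    The recognition stage described above decodes hidden Markov models over frame-level features to locate onsets. The record gives no model details, so the following is only a toy sketch: the two states, two symbols, and all probabilities are hypothetical, not taken from the thesis. It shows Viterbi decoding of a discrete HMM over vector-quantized frame symbols, the kind of decoding step such a pipeline relies on:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete HMM, in the log domain.
    obs: observation symbol indices; pi: initial state probabilities;
    A: state transition matrix; B: emission matrix [state, symbol]."""
    n_states = len(pi)
    T = len(obs)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]        # best log-prob per state
    psi = np.zeros((T, n_states), dtype=int)    # backpointers
    for t in range(1, T):
        scores = delta[:, None] + logA          # scores[i, j]: reach j via i
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):               # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Hypothetical 2-state model: state 0 = "no onset", state 1 = "onset";
# symbols are vector-quantized frame features (0 = quiet, 1 = attack-like).
pi = np.array([0.9, 0.1])
A = np.array([[0.8, 0.2],
              [0.6, 0.4]])
B = np.array([[0.9, 0.1],   # quiet state rarely emits the attack symbol
              [0.2, 0.8]])  # onset state mostly emits the attack symbol
obs = [0, 0, 1, 1, 0, 0, 1, 0]
print(viterbi(obs, pi, A, B))  # → [0, 0, 1, 1, 0, 0, 1, 0]
```

    A real system would train one such model per instrument class on labelled frames and, at each detected onset, pick the model with the highest decoding likelihood.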

    Chinese Abstract
    Abstract
    Acknowledgements
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Introduction and motivation
      1.2 Previous works
      1.3 The approach of this thesis
      1.4 Thesis organization
    Chapter 2 Related works
      2.1 Algorithm I: The method using adapted template matching
      2.2 Algorithm II: The method using PFNMF
      2.3 Algorithm III: Musical onset detection using constrained linear reconstruction
    Chapter 3 Methodology
      3.1 Training stage
        3.1.1 Pre-processing for input audio signal
        3.1.2 PA selection
        3.1.3 Pattern feature extraction and vector quantization
        3.1.4 DHMM estimation
        3.1.5 Onset candidate selection
      3.2 Recognition stage
        3.2.1 Onset detection in recognition stage
        3.2.2 Rhythmic instruments identification
    Chapter 4 Evaluations and discussions
      4.1 Performance of our algorithm
        4.1.1 Performance of inside tests
        4.1.2 Performance of outside tests
      4.2 Performance of the algorithm using template matching
      4.3 Performance of the algorithm using PFNMF
      4.4 Performance of onset detection algorithm using constrained linear reconstruction
      4.5 Discussion
    Chapter 5 Conclusion and future works
    References
    Appendix


    Full text release: on campus 2021-09-01; off campus 2021-09-01