
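Graduate student: Sun, Yung-Fu (孫詠富)
Thesis title: Pitch Shifting by Time-Domain Pitch Synchronous Overlap-and-Add Method (基於時域上基週同步疊加法以實現音高調整)
Advisor: Liao, Teh-Lu (廖德祿)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 43
Keywords (Chinese): 音高調整 (pitch shifting), 時域上基週同步疊加法 (time-domain pitch synchronous overlap-and-add)
Keywords (English): Pitch shifting, TD-PSOLA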
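  • Abstract (translated from the Chinese): Singing at KTV has grown more popular in Taiwan year by year, but even highly skilled singers sometimes go off key. This prompted the idea of a device that can bring the pitch back when a singer goes off key. Pitch-adjustment software is already available, but it usually requires a professional mixing engineer to drag the pitch to the desired value by hand. To adjust pitch automatically, and eventually in real time, the target pitches are written into the program in advance. Any such processing must first obtain the pitch of the signal, which is why pitch tracking has long been a popular research topic in speech processing. This thesis uses the AMDF pitch-tracking algorithm to save computation time, uses the resulting pitch to compute pitch marks, and adjusts the pitch once the pitch marks are available. It uses the TD-PSOLA pitch-shifting algorithm, which preserves the waveform of the sound so that the timbre is kept intact. However, because TD-PSOLA changes pitch by Hanning windowing and overlap-add, noise arises during the overlap-add process, and voiced sounds from the oral cavity are also overlapped, which increases the noise. For post-processing, this thesis therefore uses an adaptive Kalman filter to remove the noise. The filter changes its coefficients according to its estimate of the ratio of noise to singing voice, which determines how strongly it filters. The software was developed in Microsoft Visual Studio 2012 C++, using the Windows API and multithreading to record in real time, with an interface built on Microsoft Foundation Classes (MFC) so that users can easily operate the program and observe the pitch before and after tuning.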

    In recent years, singing at KTV has become very popular, but even the best singers sometimes sing off key. The topic of this research is therefore the design of a program that can modify the pitch of vocals. Some pitch-modification applications are already on the market, but they require the user to shift the pitch to the target value manually, a process complicated enough to need a professional audio mixer. This research aims to remove that disadvantage by modifying the pitch automatically in software. Before any such processing, the pitch of the audio signal must be obtained, which is why pitch tracking has long been an active problem in audio processing. This thesis uses the average magnitude difference function (AMDF) for pitch tracking because it saves computation time. The pitch estimated by AMDF is then used to compute pitch marks, the key step before pitch shifting, since the pitch marks determine the quality of the processed signal. Pitch shifting is performed with TD-PSOLA, which preserves the timbre while the pitch is shifted and takes less time than pitch shifting in the frequency domain. Its disadvantage is that it may produce unpredictable noise, because it shifts pitch by applying Hanning windows and then overlapping and adding the windowed segments. An adaptive Kalman filter is therefore designed to eliminate this noise. The proposed filter changes its coefficients to determine how much noise is filtered out of the audio signal. The pitch-shifting software is implemented in Microsoft Visual Studio 2012 C++ and uses the Windows API and multithreading to record the audio signal in real time. Finally, a user interface built with Microsoft Foundation Classes (MFC) lets the user shift the pitch and observe it before and after processing.
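    The AMDF pitch-tracking step named in the abstract can be illustrated with a short C++ sketch. This is not the thesis's implementation: the function names (`amdfAtLag`, `estimatePitchAmdf`, `makeSine`), the frame length, and the search range are all assumptions chosen for illustration. The idea is that a periodic signal nearly cancels itself when it is subtracted from a copy delayed by one period, so the lag with the smallest average magnitude difference gives the pitch period.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thesis): a pure sine tone used as test input.
std::vector<double> makeSine(double freqHz, double sampleRate, std::size_t length) {
    const double kPi = 3.14159265358979323846;
    std::vector<double> samples(length);
    for (std::size_t n = 0; n < length; ++n) {
        samples[n] = std::sin(2.0 * kPi * freqHz * static_cast<double>(n) / sampleRate);
    }
    return samples;
}

// Average magnitude difference at one lag: mean of |x[n] - x[n + lag]|
// over the samples where both terms exist.
double amdfAtLag(const std::vector<double>& frame, std::size_t lag) {
    assert(lag < frame.size());
    const std::size_t count = frame.size() - lag;
    double sum = 0.0;
    for (std::size_t n = 0; n < count; ++n) {
        sum += std::fabs(frame[n] - frame[n + lag]);
    }
    return sum / static_cast<double>(count);
}

// Estimate the pitch in Hz by choosing the lag with the smallest AMDF value
// inside the lag range implied by [minHz, maxHz].
// Returns 0.0 when the frame is too short for the requested range.
double estimatePitchAmdf(const std::vector<double>& frame, double sampleRate,
                         double minHz, double maxHz) {
    const std::size_t minLag = static_cast<std::size_t>(sampleRate / maxHz);
    const std::size_t maxLag = static_cast<std::size_t>(sampleRate / minHz);
    if (minLag == 0 || minLag > maxLag || maxLag >= frame.size()) {
        return 0.0;
    }
    std::size_t bestLag = minLag;
    double bestValue = amdfAtLag(frame, minLag);
    for (std::size_t lag = minLag + 1; lag <= maxLag; ++lag) {
        const double value = amdfAtLag(frame, lag);
        if (value < bestValue) {
            bestValue = value;
            bestLag = lag;
        }
    }
    return sampleRate / static_cast<double>(bestLag);
}
```

    For example, `estimatePitchAmdf(makeSine(200.0, 8000.0, 1024), 8000.0, 150.0, 400.0)` searches lags 20 to 53 and should find the 40-sample period of the 200 Hz tone. The search range matters: AMDF also dips at integer multiples of the period, so a range wide enough to include two periods can produce an octave error.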

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Figures
    CHAPTER 1 INTRODUCTION
      1.1 Background
      1.2 Motivation
      1.3 Thesis Organization
    CHAPTER 2 PRE-PROCESSING OF TD-PSOLA
      2.1 Basic acoustic features of the audio signal
      2.2 Pitch tracking
        2.2.1 Autocorrelation function (ACF)
        2.2.2 Average magnitude difference function (AMDF)
      2.3 Pitch marks
        2.3.1 The definition of a pitch mark
        2.3.2 Static method to find pitch marks
        2.3.3 Dynamic method to find pitch marks
    CHAPTER 3 PITCH SHIFTING
      3.1 Hanning window
      3.2 Select the key
      3.3 Time-domain pitch synchronous overlap and add (TD-PSOLA)
    CHAPTER 4 POST-PROCESSING (eliminate noise)
      4.1 White noise
      4.2 Kalman filter
        4.2.1 Linear predict coefficient (LPC)
        4.2.2 Combine Linear predict coefficient with Kalman filter
      4.3 Adaptive Kalman filter
    CHAPTER 5 EXPERIMENTAL RESULTS
      5.1 Introduction to Windows API
      5.2 System Design and Architecture
        5.2.1 Recording in real time
        5.2.2 Operation Display User Interface
    CHAPTER 6 CONCLUSION AND FUTURE WORK
      6.1 Conclusion
      6.2 Future work
    REFERENCE


    Full-text availability:
    On campus: public from 2023-07-26
    Off campus: public from 2023-07-26