
Graduate Student: 李仲朗 (Li, Zhong-Lang)
Thesis Title: Predicting HRTF Spatial Principal Components Using Artificial Neural Networks and Anthropometric Parameters (使用人工神經網絡和人體測量參數預測HRTF空間主成分)
Advisor: 陳進興 (Chen, Chin-Hsing)
Degree: Master
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110 (2021-2022)
Language: Chinese
Pages: 49
Keywords: Engineering, Acoustics, HRTF, Neural Networks, PCA
Human binaural hearing allows a listener to discriminate the direction from which a sound originates. The interaction of the listener's head and body with the incident sound field produces systematic differences in the received signals: an impulse emitted at an arbitrary point in space yields an impulse response at a reference point on the eardrum, and these differences are described mathematically by the head-related transfer function (HRTF). The HRTF encodes the physical cues required for spatialized sound perception and, because ear and head geometry vary between individuals, it is highly personal. This thesis evaluates the structure of several publicly available HRTF databases and unifies their measured source positions as well as the processing and alternative representations of the HRTFs. We compare unification methods based on bilinear interpolation and on VBAP (vector base amplitude panning), and objectively validate the resulting models.

The thesis then uses artificial neural networks to estimate spatial principal components and to map a set of anthropometric parameters to the corresponding component weights, predicting a personalized HRTF for an arbitrary spatial direction. We compare this PCA-based approach against a generic (non-individualized) model, compare results across different network architectures and parameter choices, and perform subjective validation. Synthesis with the artificial neural network achieves a mean log-spectral error of 3.53 dB, a reduction of 2.66 dB relative to the generic model, demonstrating the effectiveness of the ANN approach.

Abstract (Chinese) I
Abstract II
Acknowledgements VI
Table of Contents VII
List of Figures IX
Chapter 1 Introduction 1
  1-1 Research Motivation 1
  1-2 Literature Review 2
Chapter 2 Mathematical Models 3
  2-1 Signals and Systems 3
  2-2 Fourier Transform 3
  2-3 Wave Equation 4
  2-4 Auralization 5
    2-4.1 Interaural Differences 5
    2-4.2 Head-Related Transfer Function (HRTF) 7
  2-5 Neural Networks and Machine Learning 8
    2-5.1 Supervised Learning 9
    2-5.2 Unsupervised Learning 12
  2-6 HRTF Databases 12
  2-7 Common HRTF Personalization Methods 13
    2-7.1 Individual Selection 13
    2-7.2 Frequency Scaling 14
    2-7.3 3D Numerical Simulation 15
    2-7.4 Regression 15
Chapter 3 Experimental Methods 17
  3-1 Unification of Source Positions 18
  3-2 Preprocessing of HRIRs 24
    3-2.1 PCA 25
  3-3 Artificial Neural Networks 30
    3-3.1 Selection of Anthropometric Parameters 31
    3-3.2 Model Architecture 32
    3-3.3 Reconstruction of HRIRs 34
Chapter 4 Results and Discussion 37
  4-1 Visualization Examples 37
  4-2 Objective Evaluation 40
Chapter 5 Suggestions and Future Outlook 45
References 47
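One of the interpolation schemes compared for unifying the source-position grids is VBAP, which expresses a target direction as a gain-weighted combination of a triplet of known directions. A minimal sketch of the standard VBAP gain solve (an illustration of Pulkki's published method, not the thesis implementation; the function name is my own):

```python
import numpy as np

def vbap_gains(triplet_dirs, source_dir):
    """Vector base amplitude panning: express the source direction p as
    p = L^T g, where the rows of L are the unit vectors of a loudspeaker
    (or measurement-point) triplet, then normalize the gains to unit energy."""
    L = np.asarray(triplet_dirs, dtype=float)   # shape (3, 3), rows = unit vectors
    p = np.asarray(source_dir, dtype=float)     # shape (3,), unit vector
    g = np.linalg.solve(L.T, p)
    g = np.clip(g, 0.0, None)   # a negative gain means p lies outside the triplet
    return g / np.linalg.norm(g)

# Toy triplet: three orthogonal directions; source halfway between the first two.
triplet = np.eye(3)
source = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
g = vbap_gains(triplet, source)   # equal gains on the first two directions, zero on the third
```

The same gains can weight measured HRTFs at the triplet directions to approximate the HRTF at an unmeasured direction, which is how amplitude panning serves grid unification here.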

Full-text availability: on campus 2027-09-12; off campus 2027-09-12.
The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.