| Graduate student: | 黃俊憬 Huang, Jun-Jin |
|---|---|
| Thesis title: | 以MPEG-7低階聲音特徵值為基準之語句搜尋研究 Spoken Sentence Retrieval Based on MPEG-7 Audio Low-Level Descriptors |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2003 |
| Academic year of graduation: | 91 |
| Language: | English |
| Number of pages: | 77 |
| Keywords (Chinese): | 語句搜尋, MPEG-7 |
| Keywords (English): | Spoken sentence retrieval, MPEG-7 audio low-level descriptors |
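This thesis proposes a spoken sentence retrieval system based on MPEG-7 audio low-level descriptors. By not using a general large-vocabulary recognizer, we reduce the computational requirements, which makes the system better suited to hand-held devices. Our method has two main parts. First, we find the segments in the spoken data that are similar to the user's query. We then use a rank-based method to select the top N from the candidate segments. As for the MPEG-7 audio low-level descriptors, we describe their low-computation advantage, and the retrieval experiments show that their retrieval performance is comparable to that of MFCCs.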
In this thesis, we propose a speech retrieval system based on MPEG-7 audio low-level descriptors (LLDs). By not using a large-vocabulary recognizer, we greatly reduce the computational requirements and make the system more suitable for hand-held devices. In place of recognition, we propose a sentence-matching method with two main steps. First, we locate several candidate segments in the spoken documents that are similar to the user's query. Second, we rank these candidate segments with a rank-based method and retrieve the top N. In addition, we investigate MPEG-7 audio LLDs as the features for spoken sentence retrieval. We show their low-complexity advantage, and the experimental results show that MPEG-7 based features are comparable to MFCCs (Mel-Frequency Cepstral Coefficients).
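The two-step retrieval described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the matching or ranking algorithms, so everything here is an assumption made for illustration: each frame is a plain list of feature values (standing in for MPEG-7 LLDs or MFCCs), candidate segments are found with a fixed-length sliding window and Euclidean distance, and a simple sort by distance stands in for the rank-based method. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_distance(query, document, start):
    """Mean frame distance between the query and the document window at `start`."""
    total = sum(frame_distance(q, document[start + i]) for i, q in enumerate(query))
    return total / len(query)


def locate_candidates(query, document, threshold):
    """Step 1 (assumed form): slide the query over the document and keep
    every window whose mean distance is at most `threshold`."""
    candidates = []
    for start in range(len(document) - len(query) + 1):
        d = segment_distance(query, document, start)
        if d <= threshold:
            candidates.append((start, d))
    return candidates


def top_n(candidates, n):
    """Step 2 (assumed form): order candidates by ascending distance and
    keep the best n. This stands in for the thesis's rank-based method."""
    return sorted(candidates, key=lambda c: c[1])[:n]


# Toy data: 2-dimensional frames, not real audio descriptors.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
    [5.0, 5.0], [1.0, 0.1], [0.1, 1.0],
]

candidates = locate_candidates(query, document, threshold=0.5)
best = top_n(candidates, n=1)
```

With this toy data, the windows starting at frames 1 and 4 pass the threshold, and the exact match at frame 1 is ranked first. A fixed-length window ignores differences in speaking rate; a real system would likely need a time-warping match, which this sketch does not attempt.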