National Cheng Kung University Doctoral and Master's Thesis System

Graduate student:	黃俊憬 Huang, Jun-Jin
Thesis title:	Spoken Sentence Retrieval Based on MPEG-7 Audio Low-Level Descriptors
Advisor:	王駿發 Wang, Jhing-Fa
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2003
Graduation academic year:	91 (ROC calendar)
Language:	English
Number of pages:	77
Chinese keywords:	語句搜尋 (spoken sentence retrieval), MPEG-7
English keywords:	Spoken sentence retrieval, MPEG-7 audio low-level descriptors

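This thesis proposes a spoken sentence retrieval system based on MPEG-7 audio low-level descriptors. By not using a conventional large-vocabulary recognizer, we reduce the computational requirements, which makes the system better suited to hand-held devices. Our method has two main parts. First, we locate the segments of the spoken data that are close to the user's query. We then use a rank-based method to select the top N from these candidate segments. For the MPEG-7 audio low-level descriptors, we describe their low computational cost, and the retrieval experiments show that their retrieval performance is comparable to that of MFCC.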

In this thesis, we propose a speech retrieval system based on MPEG-7 audio low-level descriptors (LLDs). By not using a large-vocabulary recognizer, we greatly reduce the computational cost and make the system more suitable for hand-held devices. To this end, we propose a sentence-matching method with two main steps. First, we locate several possible segments in the spoken documents that are similar to the user's query. Second, we rank these candidate segments with a rank-based method and retrieve the top N. In addition, we investigate MPEG-7 audio LLDs as features for spoken sentence retrieval. We show their low-complexity advantage, and the experimental results show that MPEG-7 based features are comparable to MFCCs (Mel-Frequency Cepstral Coefficients).
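
The two steps described in the abstract can be sketched roughly as follows. This is only an illustrative sketch and not the thesis's actual algorithm: the Euclidean frame distance, the `threshold`, `window`, and `min_hits` parameters, the function names, and the ranking by hit count are all assumptions introduced here. The abstract itself states only that similar segments are located first and then ranked to return the top N.

```python
from math import dist  # available in Python 3.8+


def tag_similar_frames(query, document, threshold):
    """Tag a document frame with 1 if it is within `threshold` of any query frame.

    The Euclidean distance and the threshold are assumptions for illustration.
    """
    return [
        1 if any(dist(d, q) <= threshold for q in query) else 0
        for d in document
    ]


def extract_segments(tags, window, min_hits):
    """Slide a rectangular (unit) window over the tags and keep dense positions.

    Returns (start, hits) pairs for each window whose count of tagged
    frames reaches `min_hits`.
    """
    segments = []
    hits = sum(tags[:window])
    for start in range(len(tags) - window + 1):
        if start > 0:
            # Update the running count: one frame enters, one frame leaves.
            hits += tags[start + window - 1] - tags[start - 1]
        if hits >= min_hits:
            segments.append((start, hits))
    return segments


def top_n(segments, n):
    """Rank candidates by hit count, breaking ties by earlier position."""
    return sorted(segments, key=lambda s: (-s[1], s[0]))[:n]


# Toy one-dimensional "features"; real frames would be LLD or MFCC vectors.
query = [(0.0,), (1.0,)]
document = [(5.0,), (0.1,), (0.9,), (5.0,), (1.1,), (5.0,)]

tags = tag_similar_frames(query, document, threshold=0.2)
candidates = extract_segments(tags, window=len(query), min_hits=1)
best = top_n(candidates, n=2)
```

In this toy run the window starting at frame 1 covers two tagged frames and therefore ranks first. No speech recognizer is involved at any point, which is the property the abstract credits for the low computational cost.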

ABSTRACT
ACKNOWLEDGEMENT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1.	INTRODUCTION
1.1.	BACKGROUND
1.2.	MOTIVATION
1.3.	OUTLINES OF THIS THESIS
CHAPTER 2.	MPEG-7 AUDIO LOW-LEVEL DESCRIPTORS
2.1.	AUDIO SPECTRUM DESCRIPTORS
2.1.1.	Audio Spectrum Envelope Descriptors
2.1.2.	Audio Spectrum Centroid Descriptors
2.1.3.	Audio Spectrum Spread Descriptors
2.1.4.	Audio Spectrum Flatness Descriptors
2.2.	TIMBRE DESCRIPTORS
2.2.1.	Harmonic Peaks Detection
2.2.2.	Harmonic Spectral Centroid Descriptors
2.2.3.	Harmonic Spectral Spread Descriptors
2.2.4.	Harmonic Spectral Deviation Descriptors
2.2.5.	Harmonic Spectral Variation Descriptors
2.3.	THE FEASIBILITY FOR ADOPTING MPEG-7 AUDIO LLDS TO DESCRIBING SPEECH SOUND
2.3.1.	Computation Complexity of MPEG-7 Audio Spectrum and Instantaneous Harmonic Descriptors
2.3.2.	Keywords/Sentences Matching Results
CHAPTER 3.	SYSTEM ARCHITECTURE OVERVIEW
3.1.	APPLICABLE AUDIO/SPEECH FEATURES FOR SPEECH EXTRACTION
3.2.	SIMILAR FRAMES TAGGING
3.3.	POSSIBLE SEGMENTS EXTRACTION
3.3.1.	Method 1 : Using a Unit Window
3.3.2.	Method 2 : Using a Hamming Window
3.4.	POSSIBLE SEGMENTS RANKING
3.5.	OUTPUT OF CORRESPONDING SENTENCES
3.6.	COMPUTATIONAL ANALYSIS
CHAPTER 4.	EXPERIMENTAL RESULTS
4.1.	DEMONSTRATION SYSTEM INTERFACE
4.2.	RETRIEVAL RESULTS OF SINGLE FEATURE USING KEYWORD QUERIES
4.3.	RETRIEVAL RESULTS OF COMBINATION OF THE FEATURES USING KEYWORD QUERIES
4.4.	RETRIEVAL RESULTS OF COMBINATION OF THE FEATURES USING A SENTENCE
CHAPTER 5.	CONCLUSIONS AND FUTURE WORKS
REFERENCES
APPENDIX
 
List of Figures
FIGURE 2.1 ILLUSTRATION OF AUDIO SPECTRUM ENVELOPE BANDS [18]
FIGURE 2.2 TIMBRE HARMONIC DESCRIPTORS ESTIMATION
FIGURE 2.3 HARMONIC PEAKS DETECTION
FIGURE 2.4 COMPUTATIONAL COMPLEXITY OF FRAME-BASED FEATURES
FIGURE 3.1 RECORD PROCESS
FIGURE 3.2 RETRIEVAL PROCESS
FIGURE 3.3 EXTRACTING FRAMES BY OVERLAPPED HAMMING WINDOWS
FIGURE 3.4 SIMILAR FRAMES TAGGING
FIGURE 3.5 PSEUDO-CODE FOR SIMILAR FRAMES TAGGING
FIGURE 3.6 WINDOW SCANNING
FIGURE 3.7 UTILIZING UNIT WINDOW SCANNING TO EXTRACT POSSIBLE SEGMENTS
FIGURE 3.8 PSEUDO-CODE FOR POSSIBLE SEGMENT EXTRACTION BY METHOD 1
FIGURE 3.9 THE TAGGED DATA AFTER CONVOLUTION WITH A HAMMING WINDOW
FIGURE 3.10 PSEUDO-CODE FOR POSSIBLE SEGMENT EXTRACTION BY METHOD 2
FIGURE 3.11 AN EXAMPLE FOR OVERALL RETRIEVAL PROCESS
FIGURE 3.12 THE DIRECTLY MATCHING METHOD
FIGURE 4.1 THE DEMO INTERFACE OF THE SENTENCE RETRIEVAL SYSTEM
FIGURE 4.2 OPEN THE TARGET DATABASE
FIGURE 4.3 LOAD THE QUERY KEYWORD
FIGURE 4.4 THE RETRIEVAL RESULTS
FIGURE 4.5 PRECISION-RECALL RELATION OF METHOD 1 BY USING A SINGLE FEATURE
FIGURE 4.6 PRECISION-RECALL RELATION OF METHOD 2 BY USING A SINGLE FEATURE
FIGURE 4.7 PRECISION-RECALL RELATION OF DIRECT MATCHING METHOD BY USING A SINGLE FEATURE
FIGURE 4.8 PRECISION-RECALL RELATION OF METHOD 1 BY USING MULTI-FEATURE
FIGURE 4.9 PRECISION-RECALL RELATION OF METHOD 2 BY USING MULTI-FEATURE
FIGURE 4.10 PRECISION-RECALL RELATION OF THE DIRECT MATCHING METHOD BY USING MULTI-FEATURE

List of Tables
TABLE 2.1 BAND OVERLAPS
TABLE 2.2 COMPUTATIONAL COMPLEXITY OF FRAME-BASED FEATURES
TABLE 2.3 MAP FOR SENTENCE MATCHING ON (A) NAMES, ABOUT 1 S, (B) ORAL SENTENCES, ABOUT 3 S, (C) NEWS TITLES, ABOUT 5 S
TABLE 3.1 CHARACTERISTIC OF OUR PROPOSED RETRIEVAL SYSTEM
TABLE 3.2 SPECIFICATIONS OF FEATURE EXTRACTION
TABLE 3.3 COMPUTATIONAL COMPLEXITY OF THE DIRECTLY MATCHING METHOD
TABLE 3.4 THE COMPUTATIONAL COMPLEXITY OF OUR PROPOSED METHOD
TABLE 3.5 THE AVERAGE NUMBERS OF QUERY FRAMES, SENTENCE FRAMES AND POSSIBLE SEGMENTS
TABLE 3.6 AN EXAMPLE FOR COMPUTATIONAL COMPLEXITY
TABLE 4.1 SPECIFICATION OF THE QUERIES AND TESTING SPEECH DATA USED IN THIS EXPERIMENT
TABLE 4.2 THE RETRIEVAL RESULTS OF METHOD 1 AND METHOD 2
TABLE 4.3 THE RETRIEVAL RESULTS OF COMBINATION OF THE FEATURES AND MFCC
TABLE 4.4 THE RESULTS OF USING SENTENCES AS QUERIES
                                    

[1] Berlin Chen, Hsin-min Wang, Lin-shan Lee, “Discriminating capabilities of syllable-based features and approaches of utilizing them for voice retrieval of speech information in Mandarin Chinese”, IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, Jul. 2002, pp. 303-314.
[2] H. M. Meng, Pui Yu Hui, “Spoken document retrieval for the languages of Hong Kong”, Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing, 2001, pp. 201-204.
[3] S. E. Johnson, K. S. Jones, P. Jourlin, G. L. Moore, P. C. Woodland, “The Cambridge University spoken document retrieval system”, Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), vol. 1, 15-19 Mar. 1999, pp. 49-52.
[4] Matthew A. Siegler, “Integration of Continuous Speech Recognition and Information Retrieval for Mutually Optimal Performance”, Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, December 15, 1999.
[5] K. Ng, V. W. Zue, “Phonetic recognition for spoken document retrieval”, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), vol. 1, 12-15 May 1998, pp. 325-328.
[6] Wechsler, “Spoken Document retrieval based on phoneme recognition”, dissertation submitted to the Swiss Federal Institute of Technology (ETH) Zurich, 1998.
[7] J. Foote, “An overview of audio information retrieval”, ACM Multimedia Systems, 7:2-10, 1999.
[8] Savitha Srinivasan, Dragutin Petkovic, “Phonetic confusion matrix based spoken document retrieval”, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, 2000.
[9] Amit Singhal, Fernando Pereira, “Document expansion for speech retrieval”, Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, California, United States, 1999.
[10] Fabio Crestani, “Towards the use of prosodic information for spoken document retrieval”, Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, Louisiana, United States, 2001.
[11] H. K. Xie, “A Study on Voice Caption Search for Arbitrarily Defined Keywords”, Master Thesis, National Taiwan University of Science and Technology, Taiwan, R.O.C., July 2000.
[12] Y. Itoh, “A matching algorithm between arbitrary sections of two speech data sets for speech retrieval”, Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 1, 2001, pp. 593-596.
[13] J. M. Martinez, R. Koenen, F. Pereira, “MPEG-7: the generic multimedia content description standard, part 1”, IEEE Multimedia, vol. 9, no. 2, Apr.-Jun. 2002, pp. 78-87.
[14] J. M. Martinez, “Standards - MPEG-7 overview of MPEG-7 description tools, part 2”, IEEE Multimedia, vol. 9, no. 3, Jul.-Sept. 2002, pp. 83-93.
[15] O. Avaro, P. Salembier, “MPEG-7 Systems: overview”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, Jun. 2001, pp. 760-764.
[16] J. Hunter, “An overview of the MPEG-7 description definition language (DDL)”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, Jun. 2001, pp. 765-772.
[17] P. Salembier, J. R. Smith, “MPEG-7 multimedia description schemes”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, Jun. 2001, pp. 748-759.
[18] “ISO/IEC FDIS 15938-4, Multimedia Content Description Interface, Part 4: Audio”.
[19] K. K. Paliwal, “Spectral subband centroid features for speech recognition”, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), vol. 2, 12-15 May 1998, pp. 617-620.
[20] Y. S. Weng, “The chip design of Mel frequency cepstrum coefficient for HMM Speech Recognition”, Master Thesis, National Cheng Kung University, Taiwan, R.O.C., June 1998.
[21] R. Baeza-Yates, B. Ribeiro-Neto, “Modern Information Retrieval”, New York: ACM Press, 1999.
[22] L. Rabiner, B.-H. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993.

Made public: 2003-08-14
