| Author: | Huang, Chien-Lin (黃建霖) |
|---|---|
| Thesis Title: | Spoken Document Summarization Using Corpus-Based Approach and Semantic Dependency Grammar (應用平行語料和語意相依法則於中文語音文件之摘要) |
| Advisor: | Wu, Chung-Hsien (吳宗憲) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2004 |
| Academic Year of Graduation: | 92 (ROC calendar) |
| Language: | Chinese |
| Pages: | 60 |
| Keywords: | speech concatenation, speech recognition, semantic dependency grammar, speech summarization |
Automatic spoken-document summarization can be applied to information retrieval, semantic compression, and record keeping. A speech summarization system involves a sequence of steps: speech recognition, semantic summarization, and concatenation of the summary speech. Several problems remain in automatic speech summarization: improving speech recognition accuracy, extracting the key information from the speech content, and generating summaries that are syntactically and semantically well-formed.

This thesis addresses these existing problems in automatic speech summarization. The approach consists of three steps: speech recognition, speech summarization, and concatenation of summary units. A spoken document is transcribed into text by large-vocabulary continuous speech recognition, which also yields summary-unit boundaries, syllables, and words. In the summarization step, each candidate unit is evaluated with five scores: a speech recognition confidence measure score, a word significance score, a linguistic score, a probabilistic context-free grammar score, and a semantic dependency grammar score. A dynamic programming (DP) search combining these scores produces the summary. So that the final summary can be output in the original voice, the corresponding speech segments are extracted and concatenated. To keep the concatenated units fluent across joins, the thesis focuses on spectral characteristics of the speech: repeated units in the spoken document serve as candidate summary units, and five cost scores are computed, namely spectral centroid, spectral rolloff, spectral flux, zero-crossing rate, and Mel-frequency cepstral coefficients (MFCC). A dynamic programming search then selects the concatenation with the minimum accumulated cost. Experimental comparison with manual summaries shows that the proposed automatic speech summarization framework effectively extracts key information and constructs grammatical, fluent summary sentences.
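The summarization step described above, weighting five unit scores and searching with dynamic programming, can be sketched as a 0/1-knapsack-style selection under a length budget. This is a minimal illustration only: the unit fields, the equal weights, and the word-count budget are assumptions for the sketch, not the thesis's exact formulation.

```python
# Hypothetical sketch: select summary units by combining five knowledge
# scores with a dynamic-programming search under a length budget.
# Field names and weights are illustrative assumptions.

def summarize(units, budget, w=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """units: list of dicts with the five scores and a 'length' in words.
    Returns the sorted indices of units maximizing the weighted score sum
    within the length budget (0/1 knapsack DP)."""
    keys = ("confidence", "significance", "linguistic", "pcfg", "sdg")
    # dp[l] = (best total score, chosen indices) using at most l words
    dp = [(0.0, [])] * (budget + 1)
    for i, u in enumerate(units):
        score = sum(wi * u[k] for wi, k in zip(w, keys))
        cost = u["length"]
        # iterate budget downward so each unit is used at most once
        for l in range(budget, cost - 1, -1):
            cand = dp[l - cost][0] + score
            if cand > dp[l][0]:
                dp[l] = (cand, dp[l - cost][1] + [i])
    return sorted(dp[budget][1])
```

A real system would derive the five scores from the recognizer, the language model, and the parsers, and tune the weights on held-out data; the DP skeleton stays the same.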
Automatic summarization of spoken documents is useful in many applications, such as information extraction and semantic compression. A good summarization system imitates human hearing and understanding; its procedure includes speech collection, speech recognition, semantic analysis and understanding, and speech summarization. Several problems remain, including speech recognition accuracy, key information extraction, and sentence grammaticality.

This thesis proposes a new approach to these problems. Automatic summarization proceeds in three steps: speech recognition, speech summarization, and speech concatenation. Speech recognition transcribes the spoken documents into text with segment information. We combine five knowledge scores with a dynamic programming (DP) search to perform summarization: a confidence score, a word significance score, a linguistic score, a probabilistic context-free grammar score, and a semantic dependency grammar score. To preserve the original voice, we extract the audio of the summarized units and concatenate them. The concatenation method evaluates spectral fluency using the spectral centroid, spectral flux, spectral rolloff, zero-crossing rate, and Mel-frequency cepstral coefficients; DP is again used to find the minimum-cost sequence of concatenated units. Experiments show that the resulting summaries extract key information and yield fluent concatenated speech.
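The concatenation step can likewise be sketched: compute spectral features per frame, then run a Viterbi-style DP over candidate units that minimizes the summed join cost between adjacent choices. This sketch is illustrative and assumes a Euclidean join cost; it covers three of the five features (spectral flux and MFCCs are omitted for brevity), and the sampling rate and rolloff threshold are assumed values.

```python
import numpy as np

def frame_features(frame, sr=16000, rolloff_pct=0.85):
    """Spectral centroid, spectral rolloff, and zero-crossing rate for one
    frame. (Spectral flux and MFCCs, also used as costs, are omitted here.)"""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    total = mag.sum() + 1e-12
    centroid = (freqs * mag).sum() / total
    # rolloff: frequency below which rolloff_pct of the spectral magnitude lies
    rolloff = freqs[np.searchsorted(np.cumsum(mag), rolloff_pct * total)]
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return np.array([centroid, rolloff, zcr])

def min_cost_path(candidates):
    """candidates[t] = list of feature vectors for summary slot t.
    Viterbi-style DP choosing one candidate per slot to minimize the
    accumulated join cost (Euclidean distance between adjacent choices)."""
    prev = np.zeros(len(candidates[0]))
    back = []
    for t in range(1, len(candidates)):
        # costs[b, a] = best cost ending in previous candidate a, joined to b
        costs = np.array([[p + np.linalg.norm(a - b)
                           for p, a in zip(prev, candidates[t - 1])]
                          for b in candidates[t]])
        back.append(costs.argmin(axis=1))
        prev = costs.min(axis=1)
    # trace back the minimum-cost sequence of candidate indices
    path = [int(prev.argmin())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1], float(prev.min())
```

In the thesis's setting the candidates per slot would be the repeated units found in the spoken document, and the join cost would combine all five feature distances; the DP search itself is unchanged.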