
Graduate student: 謝嘉欣
Hsieh, Chia-Hsin
Thesis title: 廣播新聞音訊串流之音訊內容分析技術之研究
A Study on Audio Content Analysis for Broadcast News Audio Stream
Advisor: 吳宗憲
Wu, Chung-Hsien
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: English
Number of pages: 112
Keywords (Chinese): 報導切割, 語音摘要, 主題分類, 音訊內容分析, 音訊切割, 音訊分類, 噪音環境語音辨識
Keywords (English): audio content analysis, story segmentation, topic classification, speech summarization, audio segmentation, noisy speech recognition, audio classification
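  • In recent years, information retrieval techniques for large collections of web documents have matured rapidly. Web search engines such as Google can accurately find web documents by keyword, and can even apply optical character recognition (OCR) to search image documents (PDF). As multimedia documents multiply, retrieval of multimedia documents, spoken documents and news documents is becoming increasingly important. Multimedia information retrieval and automatic indexing, however, still face many open problems and challenges.
    The goal of this dissertation is to develop an audio content analysis system that automatically processes broadcast news audio streams, providing audio change-points, audio types, speech transcriptions, story topics, and text and speech summaries for use by a spoken document retrieval system. The methods described cover audio segmentation and classification, speech recognition in noisy environments, story segmentation and classification, and speech summarization. The final system integrates all four modules into a single content analysis system for broadcast news audio streams.
    First, most broadcast news audio streams carry no additional annotation, such as the change-points between segments of speech, music, speech with music background, speech with noise background, noise, or other audio. An effective automatic audio segmentation and classification system therefore helps later spoken document retrieval and management. This dissertation proposes a Gaussian model based on minimum description length (MDL) that operates over a sliding window with multiple change-points, in order to overcome over-segmentation. The audio segmenter uses the MDL model and a binary segmentation algorithm to split the audio stream into adjacent homogeneous segments. Heuristic rules are then applied for smoothing, and a commonly used audio classifier performs segment-based audio classification.
    Given the change-points and audio types, the dissertation develops a noisy speech recognition method based on feature enhancement. The method first estimates noise parameters with a sequential expectation-maximization (EM) algorithm together with prior models. It then uses stochastic vector mapping with noise normalization to build a speech enhancement method that converts noisy speech feature vectors into clean, enhanced speech vectors, addressing the low recognition rates caused by changing noise conditions. In addition, environment model adaptation is used to reduce the loss in enhancement performance caused by mismatch between training and test data.
    Next, the dissertation proposes a story segmentation and topic classification module that extracts the topics of broadcast news stories and separates adjacent speech segments spoken by the same speaker on different topics. Coarse story segmentation is first performed with a decision tree and a maximum entropy method. Each coarse boundary is then extended into a boundary region, which defines the search space for story boundaries. Finally, a topic-based segmental model is combined with a genetic algorithm, recasting story segmentation as the selection of an optimal chromosome, to find the best story boundaries.
    Finally, the dissertation proposes a speech summarization method that extracts the key content of a story according to speech recognition confidence scores, prosodic information, significant words, a word trigram language model and semantic dependency relations, discarding unimportant speech segments to produce a more concise text summary. The best sequence of speech segments to concatenate is then selected from the source spoken document according to acoustic features, giving the spoken document retrieval system a friendlier and more time-saving form for browsing.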

    In recent years, information retrieval systems have improved dramatically at retrieving large collections of web documents precisely. As the number of multimedia documents grows, multimedia information retrieval, including spoken document and broadcast news retrieval, is becoming increasingly important, yet many challenges remain.
    The aim of this dissertation is to develop an audio content analysis system that processes the broadcast news audio stream in advance, providing the audio change-points, audio types, content transcription, story topics of speech segments and speech summaries for further applications such as spoken document retrieval. The approaches described in this dissertation include audio segmentation and classification, noisy speech recognition, story segmentation and classification, and speech summarization.
    First, an audio segmentation and classification approach is proposed for segmenting and classifying an audio stream by audio type: speech, music, speech with music background, speech with noise background, and noise. A minimum description length (MDL)-based Gaussian model with a multiple change-point window is constructed to statistically characterize the audio features. The audio stream is segmented into a sequence of homogeneous sub-segments using the MDL-based Gaussian model via a binary segmentation algorithm. Finally, a heuristic method smooths the sub-segment sequence, and a segment-based audio classifier provides the final segmentation and classification results.
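The binary segmentation idea in the paragraph above can be illustrated with a minimal sketch. It assumes a one-dimensional feature sequence and a generic penalized Gaussian likelihood (a BIC-style penalty of three parameters per extra segment); the dissertation's actual MDL criterion, feature set and multiple change-point window are not reproduced here, and all function names are hypothetical.

```python
import math

def gaussian_cost(x):
    """Half of n*log(variance): the Gaussian code length of x up to constants."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return 0.5 * n * math.log(max(var, 1e-6))  # floor avoids log(0)

def best_split(x, min_len, penalty):
    """Return (gain, index) of the split that most reduces the penalized cost."""
    whole = gaussian_cost(x)
    best = (0.0, None)
    for t in range(min_len, len(x) - min_len + 1):
        gain = whole - gaussian_cost(x[:t]) - gaussian_cost(x[t:]) - penalty
        if gain > best[0]:
            best = (gain, t)
    return best

def binary_segmentation(x, min_len=5, offset=0, total=None):
    """Recursively split x wherever a split pays for its extra parameters."""
    total = len(x) if total is None else total
    # Assumed penalty: mean, variance and change-point position, 0.5*log(N) each.
    penalty = 1.5 * math.log(total)
    if len(x) < 2 * min_len:
        return []
    gain, t = best_split(x, min_len, penalty)
    if t is None:
        return []  # no split is worth its description-length cost
    left = binary_segmentation(x[:t], min_len, offset, total)
    right = binary_segmentation(x[t:], min_len, offset + t, total)
    return left + [offset + t] + right
```

Calling `binary_segmentation(features)` returns the sorted change-point indices. A split is accepted only when the likelihood gain exceeds the penalty, which is the mechanism that limits over-segmentation in this simplified setting.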
    After locating the change-point positions and identifying all the audio types, robust noisy speech recognition via feature enhancement is developed to transcribe the speech content, since many speech segments are uttered in noisy environments. Three prior models are introduced to characterize clean speech, noise and noisy speech, respectively. Sequential noise estimation is then employed for prior model construction based on noise-normalized stochastic vector mapping. Furthermore, environment model adaptation is adopted to reduce the mismatch between training data and test data.
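As a rough illustration of feature enhancement by stochastic vector mapping, the sketch below applies a posterior-weighted bias correction to a noisy feature vector. It assumes a diagonal-covariance Gaussian mixture over noisy features and one bias vector per mixture component. The noise normalization, prior models and adaptation steps of the dissertation are not modeled, and all names and parameters are hypothetical.

```python
import math

def log_gauss(y, mean, var):
    """Log density of a diagonal-covariance Gaussian at y."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
               for a, m, v in zip(y, mean, var))

def posteriors(y, weights, means, variances):
    """Posterior probability of each mixture component given y."""
    logs = [math.log(w) + log_gauss(y, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def enhance(y, weights, means, variances, biases):
    """Enhanced feature: y plus the posterior-weighted sum of bias vectors."""
    post = posteriors(y, weights, means, variances)
    return [a + sum(p * b[d] for p, b in zip(post, biases))
            for d, a in enumerate(y)]
```

In this simplified form, a frame that clearly belongs to one component receives that component's correction, while an ambiguous frame receives a blend of corrections.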
    After transcribing all the speech segments, a story segmentation and topic classification approach is proposed to locate the topic change-points between contiguous stories and to identify the topic of every speech segment. A two-stage paradigm is used. First, a decision tree and a maximum entropy model identify the potential story boundaries in the broadcast news within a sliding window. The story segmentation problem is thus transformed into determining a boundary position sequence from the potential boundary regions. A genetic algorithm is then applied to determine the chromosome that corresponds to the final boundary position sequence. A topic-based segmental model is proposed to define the fitness function used in the genetic algorithm.
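The second stage described above, choosing one boundary position inside each candidate region with a genetic algorithm, might look roughly like the following sketch. The fitness used here is a toy cohesion score over per-sentence topic labels, a stand-in assumption for the dissertation's topic-based segmental model. The GA settings (population size, mutation rate, elitist selection, uniform crossover) are also illustrative choices.

```python
import random

def segment_score(labels):
    """Toy cohesion: number of sentences sharing the segment's majority topic."""
    return max(labels.count(t) for t in set(labels)) if labels else 0

def fitness(boundaries, topics):
    """Sum of cohesion scores over the segments induced by the boundaries."""
    cuts = [0] + sorted(boundaries) + [len(topics)]
    return sum(segment_score(topics[a:b]) for a, b in zip(cuts, cuts[1:]))

def ga_refine(topics, regions, pop_size=20, generations=40, p_mut=0.2, seed=0):
    """Pick one boundary per (lo, hi) region; a chromosome is a position list."""
    rng = random.Random(seed)
    # Seed the population with the region midpoints (the coarse boundaries).
    coarse = [(lo + hi) // 2 for lo, hi in regions]
    pop = [coarse] + [[rng.randint(lo, hi) for lo, hi in regions]
                      for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, topics), reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half always survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [rng.randint(lo, hi) if rng.random() < p_mut else g
                     for g, (lo, hi) in zip(child, regions)]  # bounded mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda c: fitness(c, topics))
```

Because mutation draws only from within each region and the elite is carried over unchanged, every returned boundary stays inside its region and the result never scores below the coarse boundaries it started from.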
    Finally, speech summarization is performed. First, a word sequence that maximizes a summarization score, combining speech recognition confidence, prosody information, word significance, word trigram and semantic dependency relation, is extracted from the automatically transcribed speech. Second, all the speech segments in the spoken documents that correspond to words in the summarized word sequence are extracted as candidates for concatenation. Finally, the speech segments with the highest concatenation score among the candidates are selected and concatenated to generate a smooth summarized speech output. This speech summarization function not only keeps the most important information of every story but also provides a concise speech summary, a friendlier and more time-saving representation for further spoken document retrieval and browsing.
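Extracting a word sequence that maximizes a summarization score can be sketched as a dynamic program over ordered word subsets. The example assumes each word already has a single combined score (standing in for confidence, prosody and significance) and uses a pairwise link score between consecutive kept words in place of the trigram and semantic dependency terms. The scores and function names are hypothetical.

```python
def summarize(words, word_score, link_score, m):
    """Keep m of the words, in order, maximizing word plus link scores.

    Requires 1 <= m <= len(words).
    """
    n = len(words)
    neg = float("-inf")
    # best[k][j]: best score of a (k+1)-word summary whose last word is j.
    best = [[neg] * n for _ in range(m)]
    back = [[None] * n for _ in range(m)]
    for j in range(n):
        best[0][j] = word_score[j]
    for k in range(1, m):
        for j in range(k, n):
            for i in range(k - 1, j):
                s = best[k - 1][i] + link_score(words[i], words[j]) + word_score[j]
                if s > best[k][j]:
                    best[k][j] = s
                    back[k][j] = i
    # Trace back from the best final word.
    j = max(range(m - 1, n), key=lambda idx: best[m - 1][idx])
    picked = []
    for k in range(m - 1, -1, -1):
        picked.append(j)
        j = back[k][j]
    return [words[i] for i in reversed(picked)]
```

With a zero link score the procedure reduces to keeping the m highest-scoring words in their original order. A non-trivial link score lets fluency between adjacent kept words influence the choice.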

    ABSTRACT
    ACKNOWLEDGEMENT
    TABLE OF CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1. INTRODUCTION
      1.1. Motivation
      1.2. The Approach of this Dissertation
      1.3. The Organization of this Dissertation
    CHAPTER 2. AUDIO SEGMENTATION AND CLASSIFICATION via an MDL-based Gaussian Model
      2.1. Introduction
      2.2. MDL-based Gaussian Model for Audio Segmentation
        2.2.1. Multiple Change-Point Gaussian Model
        2.2.2. MDL-based Gaussian Model
        2.2.3. Hierarchical Binary Segmentation Procedure
        2.2.4. Acceleration of Parameter Estimation
      2.3. Silence Deletion and Segment-based Classification
        2.3.1. Silence Deletion
        2.3.2. Feature Analysis
        2.3.3. Segment-based Audio Classification and Smoothing
      2.4. Experimental Results and Discussion
        2.4.1. Training, Tuning and Testing Sets
        2.4.2. Performance Evaluation
        2.4.3. Parameter Tuning of the MDL-Gaussian Model, Delta-BIC and Agglomerative Clustering
          2.4.3.1. Threshold Tuning for Silence Deletion
          2.4.3.2. MFCC Order
          2.4.3.3. MDL-based Gaussian Model with Sliding Window Strategy
          2.4.3.4. Parameter Tuning for Delta-BIC
          2.4.3.5. Parameter Tuning for Agglomerative Clustering
        2.4.4. Experiments for Audio Segmentation on the TDT-3 Testing Set
        2.4.5. Experiments for Audio Classification and Smoothing
          2.4.5.1. Segment-based Audio Classification vs. Clip-based Audio Classification
          2.4.5.2. Decreasing False Alarm Rate using Segment-based Classifier and Smoothing
      2.5. Summary of This Chapter
    CHAPTER 3. NOISY SPEECH RECOGNITION based on stochastic vector mapping-based feature enhancement via prior models and model adaptation
      3.1. Introduction
      3.2. Noise-Normalized Stochastic Vector Mapping for Cepstral Feature Enhancement
        3.2.1. Stochastic Vector Mapping
        3.2.2. Noise-Normalized Stochastic Vector Mapping
      3.3. Prior Model for Sequential Noise Estimation
        3.3.1. The Acoustic Environment Model
        3.3.2. The Prior Models
        3.3.3. Sequential Noise Estimation
      3.4. Environment Model Adaptation
        3.4.1. Model Adaptation on Noise and Noisy Speech Prior Models
        3.4.2. Model Adaptation of Noise Normalized Stochastic Vector Mapping
      3.5. Experimental Results and Discussion
        3.5.1. Training, Adaptation and Test Sets
        3.5.2. Experiments on AURORA2
        3.5.3. Experiments on MATBN
      3.6. Summary of This Chapter
    CHAPTER 4. STORY SEGMENTATION AND CLASSIFICATION via a topic-based segmental model
      4.1. Introduction
      4.2. Subsystem Overview
      4.3. LSA-based Naïve Bayes Topic Classifier
      4.4. Story Segmentation using Genetic Algorithm
        4.4.1. Stage 1: Coarse Boundary Detection by Decision Tree and Maximum Entropy
        4.4.2. Stage 2: Fine Boundary Detection
          4.4.2.1. Topic-based segmental model
          4.4.2.2. Search precise boundary positions using genetic algorithm
      4.5. Experimental Results and Discussion
        4.5.1. Training, Development and Testing Set
        4.5.2. Performance Evaluation of Story Segmentation
        4.5.3. Audio News Transcription using Speech Recognizer
        4.5.4. Experiments on Topic Classification
        4.5.5. Parameter Tuning on Story Segmentation
          4.5.5.1. Parameter Tuning of the DT_ME-based Pre-Segmenter
          4.5.5.2. Parameter tuning of window size
          4.5.5.3. Parameter Tuning of the GA-based Story Segmenter
        4.5.6. Experiments on Story Segmentation
          4.5.6.1. Experiments on Text News Corpus
          4.5.6.2. Experiments on BCC Mandarin Broadcast News Corpus
          4.5.6.3. Experiments on TDT3 Mandarin Audio Corpora
      4.6. Summary of This Chapter
    CHAPTER 5. SPOKEN DOCUMENT SUMMARIZATION via speech sentence compression based on speech segment extraction and concatenation
      5.1. Introduction
      5.2. Speech Segment Extraction
        5.2.1. Prosody Score
        5.2.2. Speech Recognition Confidence
        5.2.3. Word Significance
        5.2.4. Word Trigram Score
        5.2.5. Semantic Dependency Score
      5.3. Speech Segment Concatenation
      5.4. Experimental Results and Discussion
        5.4.1. Evaluation of Comparison to Manual Summarization Results
        5.4.2. Evaluation using ROUGE-N
        5.4.3. Evaluation of Key Information Extraction
        5.4.4. Subjective evaluation for the speech segment concatenation
      5.5. Summary of This Chapter
    CHAPTER 6. CONCLUSION AND FUTURE WORK
    BIBLIOGRAPHY
    AUTOBIOGRAPHY


    Full text availability: on campus, open to the public from 2009-07-18
    Off campus, open to the public from 2009-07-18