| Student: | 方國安 Fung, Kuao-Ann |
|---|---|
| Thesis title: | Story segmentation and classification of Chinese broadcast news using genetic algorithm |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2002 |
| Academic year of graduation: | 90 (ROC calendar, 2001-2002) |
| Language: | Chinese |
| Pages: | 58 |
| Keywords (Chinese): | story segmentation, classification, genetic algorithm, news broadcasting |
| Keywords (English): | genetic algorithm, classification, broadcast news, story segmentation |
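Story segmentation divides a continuous document that contains multiple stories into several blocks, each of which is a homogeneous segment. The technique is mostly applied in the back-end agents of retrieval systems, in document classification, and in document summarization, where it provides intelligent, automated preprocessing of the data.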
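In this thesis, we study story segmentation of broadcast news. The specific goals are: 1) to analyze the relevant speech parameters according to the speaking rate characteristic of news broadcasts; 2) for segmenting the speech content, to apply the Hamming energy mean to pre-segment the audio at silences; 3) for speech-to-text conversion, to integrate an HMM-based recognizer and use an N-gram-based language model for word composition and sentence generation; 4) for topic detection, to propose a vectorization method based on the strength of semantic word classes, which computes a document's class tendency from its content and uses it to predict the document's topic; and 5) for story segmentation, to combine a genetic algorithm framework with a fast topic detection method, design a fitness function scored by fuzzy membership, and iterate the genetic search to find the most appropriate story boundaries.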
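Item 2 above does not define the Hamming energy mean precisely. The following is a minimal sketch, assuming it means the short-time energy of Hamming-windowed frames normalized by the mean energy over the whole recording, with a silence cut placed at the centre of every sufficiently long low-energy run. The function names, the 16 kHz frame settings (400-sample frames, 160-sample hop), the 0.1 energy ratio and the 30-frame minimum run are hypothetical choices, not values from the thesis.

```python
import numpy as np

def hamming_energy(signal, frame_len=400, hop=160):
    """Short-time energy of each Hamming-windowed frame."""
    if len(signal) < frame_len:
        return np.empty(0)
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = np.sum((frame * window) ** 2)
    return energies

def silence_cut_points(signal, frame_len=400, hop=160, ratio=0.1, min_frames=30):
    """Sample indices at the centre of long low-energy runs.

    Hypothetical rule: a frame is silent when its energy is below
    `ratio` times the mean frame energy of the whole signal.
    """
    energies = hamming_energy(signal, frame_len, hop)
    if energies.size == 0 or energies.mean() == 0:
        return []
    silent = energies / energies.mean() < ratio
    cuts, start = [], None
    # A trailing False sentinel closes a silent run that reaches the end.
    for i, is_silent in enumerate(np.append(silent, False)):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if i - start >= min_frames:
                mid = (start + i) // 2
                cuts.append(mid * hop + frame_len // 2)  # centre sample of the middle frame
            start = None
    return cuts
```

Each returned index can serve as a pre-segmentation point, so that a long recording is split into the "larger sections" mentioned in the English abstract before recognition.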
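To evaluate the proposed methods, we collected a total of 46,965 news text files and 2 hours of broadcast news audio to train and test the models, and used the standard TDT evaluation measures, Miss Probability and False Alarm Probability, to verify story segmentation performance. The results show that the system achieves 75% correct detection on text news and also performs well on broadcast speech.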
With the globalization of information exchange and wired/wireless communication, intelligent multimedia information retrieval is becoming increasingly crucial. Work on spoken document retrieval meets the demand for convenient access to vast and heterogeneous data. Recent research on content-based indexing, segmentation, and classification has sought to keep pace with growing application needs, especially the management and summarization of broadcast news. Story segmentation plays a key role in enabling retrieval by making multimedia resources available to users at their terminals.
In this thesis, a front-end preprocessing framework is proposed to analyze the content of spoken documents. Specifically, this study focuses on: 1) extracting significant acoustic and linguistic features that characterize the diverse properties of broadcast environments; 2) using Hamming energy mean normalization to pre-segment each input spoken document at silence boundaries, forming several larger sections; 3) integrating a Mandarin dictation system to extract content information, with the resulting syllable graph used to build an indexing structure that contains keyword and syllable information; 4) proposing a topic strength quantization approach to measure the association between topics and content; and 5) proposing a fuzzy fitness measure for a GA-based segmenter that estimates precise topic boundaries.
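Items 4 and 5 can be illustrated together. The abstract gives neither the chromosome encoding nor the fitness formula, so what follows is a minimal sketch under stated assumptions: a chromosome is a bit string with one gene per gap between sentences (1 = story boundary); a segment's topic strength is the count of hits from a hypothetical keyword-to-topic lexicon, normalized by its largest count into fuzzy memberships; and the fitness rewards single-topic segments while penalizing adjacent segments that share the same dominant topic. The GA uses tournament selection, one-point crossover, bit-flip mutation, and elitism. All function names, the penalty weight, and the GA parameters are illustrative, not the thesis's own.

```python
import random

def topic_strength(words, lexicon, n_topics):
    """Fuzzy topic memberships of a span (hypothetical keyword lexicon)."""
    counts = [0] * n_topics
    for w in words:
        if w in lexicon:
            counts[lexicon[w]] += 1
    peak = max(counts)
    return [c / peak for c in counts] if peak else counts

def segments_from_bits(bits):
    """Bit i == 1 means a story boundary after sentence i."""
    bounds, start = [], 0
    for i, b in enumerate(bits):
        if b:
            bounds.append((start, i + 1))
            start = i + 1
    bounds.append((start, len(bits) + 1))
    return bounds

def fitness(bits, sentences, lexicon, n_topics, penalty=0.2):
    """Length-weighted topic purity minus a penalty for spurious boundaries."""
    purity, repeats, prev_top = 0.0, 0, None
    for lo, hi in segments_from_bits(bits):
        words = [w for s in sentences[lo:hi] for w in s]
        mu = topic_strength(words, lexicon, n_topics)
        if sum(mu) == 0:  # no topic keywords: contributes nothing
            prev_top = None
            continue
        top = mu.index(max(mu))
        purity += (hi - lo) * max(mu) / sum(mu)  # 1.0 per sentence if single-topic
        if top == prev_top:  # same topic on both sides of a boundary
            repeats += 1
        prev_top = top
    return purity / len(sentences) - penalty * repeats

def ga_segment(sentences, lexicon, n_topics, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    n = len(sentences) - 1  # candidate boundary slots
    p_mut = 1.0 / max(n, 1)
    fit = lambda bits: fitness(bits, sentences, lexicon, n_topics)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fit)[:]]  # elitism: keep the best chromosome
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fit)  # tournament selection
            b = max(rng.sample(pop, 3), key=fit)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return best, fit(best)
```

On a toy input of four sports sentences followed by four finance sentences, the only chromosome with fitness 1.0 places a single boundary after the fourth sentence, so the search should converge to it.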
To evaluate the proposed approach, 2 hours of broadcast news and 46,965 corresponding text files were collected as the training and testing corpus. Miss probability and false alarm probability are adopted as the evaluation criteria for topic boundary segmentation. Experimental results show that the proposed approach achieves 75% accuracy on text news and also performs well on broadcast news segmentation.
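The two error rates can be sketched as follows, assuming the window-probe formulation used in TDT-style segmentation scoring. Every pair of positions k units apart is checked. A miss is counted when the reference puts the two positions in different stories but the hypothesis puts them in the same story, and a false alarm is counted in the reverse case. The window size k, the unit (words or sentences), and the function names are assumptions; the thesis's exact evaluation settings are not stated here.

```python
def segment_ids(boundaries, n):
    """Map each of n units to a story index; boundaries are story start positions."""
    starts = set(boundaries)
    ids, seg = [], 0
    for i in range(n):
        if i in starts and i > 0:
            seg += 1
        ids.append(seg)
    return ids

def miss_false_alarm(ref_bounds, hyp_bounds, n, k):
    """Window-probe miss and false-alarm probabilities (k is an assumed probe width)."""
    ref, hyp = segment_ids(ref_bounds, n), segment_ids(hyp_bounds, n)
    misses = false_alarms = ref_diff = ref_same = 0
    for i in range(n - k):
        ref_same_story = ref[i] == ref[i + k]
        hyp_same_story = hyp[i] == hyp[i + k]
        if ref_same_story:
            ref_same += 1
            if not hyp_same_story:  # hypothesis splits a single story
                false_alarms += 1
        else:
            ref_diff += 1
            if hyp_same_story:  # hypothesis misses a true boundary
                misses += 1
    p_miss = misses / ref_diff if ref_diff else 0.0
    p_fa = false_alarms / ref_same if ref_same else 0.0
    return p_miss, p_fa
```

For example, with 10 units, a true boundary at position 5, and k = 2, a hypothesis with no boundaries gives P_miss = 1.0 and P_FA = 0.0.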