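| Graduate student: | 李芳明 Li, Fan-Min |
|---|---|
| Thesis title: | 語音轉語音翻譯系統晶片設計 SOC Design for Speech to Speech Translation |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2002 |
| Academic year of graduation: | 90 (ROC calendar) |
| Language: | Chinese |
| Pages: | 85 |
| Keywords (Chinese): | 動態可程式規劃 (programmable dynamic programming), 內嵌型例句式翻譯 (embedded template-based translation) |
| Keywords (English): | Track Back, Template Search |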
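In recent years, mobile information services have become popular, and research on speech-to-speech (spoken language) translation has flourished accordingly. Existing results include the server-based Verbmobil system, developed by a German research center for advanced industrial technology (德國工業前瞻技術中心), which handles speech-to-speech translation serially through speech recognition, language translation, and speech synthesis. Users connect to the system by mobile phone, telephone, or the Internet to hold cross-language conversations. However, the system must run on a computer server, so it cannot be used anywhere and at any time. This research therefore develops a portable single-chip speech-to-speech translation system, which overcomes this limitation and can be ported to a variety of multimedia human-machine interface applications.
In developing the speech translation algorithm, speech recognition is embedded in language translation so that the two are processed together, because speech recognition cannot reach a 100% recognition rate. Template-based (example-sentence) language translation is a mainstream translation technique today: it searches for the example sentence most similar to the input sentence, then adjusts the differences found by the comparison in a post-translation step (insertion, deletion, and replacement). This research therefore applies a dynamic programming (one-stage) search directly to the speech signal to perform template matching and translation, and verifies the feasibility of the algorithm on a personal computer. With the modified dynamic programming algorithm, the experiments reach a translation accuracy of 80% to 90%.
For the chip design, the hardware architecture follows the SOC design concept. Existing IP is reused, and because the reference modules in current search-and-matching research are of fixed size, this research also develops, according to the system requirements, a template-search IP and a trace-back IP suited to variable-size dynamic programming. For ASIC verification, the template-search IP and the trace-back IP are integrated in a single-chip architecture, and a dual-memory-module circuit design lets template search and trace-back proceed simultaneously, which both reduces idle components and speeds up computation.
Finally, this research completes the architecture design of a system IC for Mandarin, Taiwanese, and English spoken language translation. With appropriate design reuse and modification, the design can also be applied to international telephone communication, communication assistance for tourism, personal digital assistants (PDAs), multi-language selection for film and television, and language learning.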
In the past few years, mobile information services have been applied to many communication systems, and more and more researchers have turned to speech-to-speech translation (also called spoken language translation). One recent result is Verbmobil, a server-based system from a project of the German Federal Ministry for Education and Research, which processes speech serially through recognition, translation, and synthesis. Users first connect to the system by cell phone, telephone, or the Internet, and can then talk to each other in their own languages. However, the system is neither real-time nor portable, because it runs on a computer server. The main goal of this research is therefore to develop a real-time speech-to-speech translation system as an SOC (system on a chip), so that people who speak different languages can communicate spontaneously through the chip.
In developing the speech-to-speech translation algorithm, we integrate recognition into language translation, because speech recognition alone cannot reach 100% accuracy. Template-based language translation is a main approach to spoken language translation. It searches for the example sentence most similar to the input sentence and then post-edits the retrieved example (insertion, deletion, and replacement). We therefore introduce a dynamic programming search algorithm for template-based spoken language translation and verify the proposed algorithm on a PC. In our experiments, the accuracy of template-based translation ranges from 80% to 90%.
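The template search and post-editing idea can be illustrated with a small sketch. This is not the thesis's algorithm: the thesis runs a one-stage dynamic programming search directly on the speech signal, whereas the sketch below assumes the input is already a word sequence and uses plain edit distance. The function names `align` and `best_template` and the sample sentences are hypothetical.

```python
from typing import List, Sequence, Tuple

Op = Tuple[str, str, str]  # (operation, template word, input word)


def align(template: Sequence[str], query: Sequence[str]) -> Tuple[int, List[Op]]:
    """Edit-distance DP between a template and an input, followed by trace-back."""
    n, m = len(template), len(query)
    # cost[i][j] = minimum number of edits turning template[:i] into query[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    # back[i][j] = the operation chosen to reach cell (i, j)
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = i, "delete"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = j, "insert"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = template[i - 1] == query[j - 1]
            candidates = [
                (cost[i - 1][j - 1] + (0 if same else 1), "match" if same else "replace"),
                (cost[i - 1][j] + 1, "delete"),
                (cost[i][j - 1] + 1, "insert"),
            ]
            cost[i][j], back[i][j] = min(candidates, key=lambda c: c[0])

    # Trace-back: walk the stored decisions from the last cell to the origin.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        op = back[i][j]
        if op in ("match", "replace"):
            ops.append((op, template[i - 1], query[j - 1]))
            i, j = i - 1, j - 1
        elif op == "delete":
            ops.append((op, template[i - 1], ""))
            i -= 1
        else:  # insert
            ops.append((op, "", query[j - 1]))
            j -= 1
    ops.reverse()
    return cost[n][m], ops


def best_template(templates: Sequence[Sequence[str]], query: Sequence[str]):
    """Return the closest template, its distance, and the post-editing operations."""
    scored = [(align(t, query), t) for t in templates]
    (distance, ops), template = min(scored, key=lambda s: s[0][0])
    return template, distance, ops


if __name__ == "__main__":
    templates = [["where", "is", "the", "station"], ["how", "much", "is", "this"]]
    print(best_template(templates, ["where", "is", "the", "hotel"]))
```

The trace-back step recovers which words of the chosen template must be inserted, deleted, or replaced, which is the information a post-editing stage would need.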
For the chip design, we adopt an SOC design approach. In addition to reusing existing IP, we develop new variable-size dynamic programming designs for the template-search IP and the trace-back IP. For ASIC verification, we combine the template-search IP and the trace-back IP in a single chip. We also use a dual-RAM circuit design, which lets the template search process and the trace-back process run concurrently, reducing idle components and improving computation speed.
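The benefit of a dual-RAM arrangement can be sketched as a simple scheduling model. The abstract does not give the circuit details, so everything here is an assumption: the unit of work ("item"), equal durations for search and trace-back, and the alternating bank assignment are illustrative only, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# One time slot: (item being searched, item being traced back); None means idle.
Slot = Tuple[Optional[int], Optional[int]]


def single_bank_schedule(n: int) -> List[Slot]:
    """With one RAM, search and trace-back must take turns on the same memory."""
    slots: List[Slot] = []
    for k in range(n):
        slots.append((k, None))  # search writes path decisions for item k
        slots.append((None, k))  # trace-back reads them; search has to wait
    return slots


def double_bank_schedule(n: int) -> List[Slot]:
    """With two RAMs, search fills one bank while trace-back drains the other."""
    slots: List[Slot] = []
    for t in range(n + 1):
        search = t if t < n else None      # assumed to write bank t % 2
        trace = t - 1 if t >= 1 else None  # assumed to read bank (t - 1) % 2
        slots.append((search, trace))
    return slots


if __name__ == "__main__":
    print(len(single_bank_schedule(4)), len(double_bank_schedule(4)))
```

Under these assumptions, n items need 2n slots with one bank and n + 1 slots with two, and neither unit sits idle except in the first and last slot.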
In conclusion, this research completes the architecture design of a system IC for Mandarin, Taiwanese, and English spoken language translation. With proper design reuse and modification, the design can also be applied to international telephone communication, tourist assistance, PDAs, multi-language selection for TV and film, language learning, and similar areas.