
Graduate Student: 林順傑 (Lin, Shun-Chieh)
Thesis Title: 語音轉語音機器翻譯系統之演算法與VLSI硬體架構設計 (Algorithms and VLSI Architecture Design for Speech-to-Speech Machine Translation System)
Advisor: 王駿發 (Wang, Jhing-Fa)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Publication Year: 2007
Graduation Academic Year: 95 (ROC calendar)
Language: English
Pages: 133
Chinese Keywords: mixed-language processing (混語處理), multiple translation spotting (多重翻譯擷取), single-chip system (單晶片系統), speech-to-speech translation (語音轉語音翻譯), translation divergence amendment (翻譯歧異校正), VLSI design (超大型積體電路設計)
Foreign Keywords: VLSI, translation divergence amendment, single chip system, multiple translation spotting, speech-to-speech translation, mixed-language processing
Abstract (Chinese):

In this study we propose two new methods to address the current challenges of speech-to-speech machine translation. First, when training the proposed system's translation corpus, we analyze and process translation divergence in the collected parallel sentence pairs in order to build reliable translation templates. For this analysis we propose an effective measure of the degree of divergence, and we apply a translation divergence amendment algorithm to highly divergent parallel sentences to obtain a parallel-sentence database better suited to template learning. For translation template learning, we further propose an algorithm that expands the learned templates to obtain a larger and more diverse template set.
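The record does not reproduce the divergence measure itself; as a minimal sketch of the idea (the dictionary-coverage heuristic, the names, and the threshold below are illustrative assumptions, not the thesis's actual formulation), one could rank parallel sentence pairs by how poorly their words map onto each other under a bilingual dictionary and send the worst-aligned pairs to the amendment step:

```python
# Hypothetical sketch of divergence scoring for parallel sentence pairs.
# `bilingual_dict` maps a source word to a set of candidate target words;
# both the heuristic and the threshold are assumptions for illustration.

def divergence_score(src_words, tgt_words, bilingual_dict):
    """Fraction of source words with no dictionary translation in the target.

    A high score hints at irregular lexical mappings (translation
    divergence) that would confuse translation template learning.
    """
    if not src_words:
        return 0.0
    tgt_set = set(tgt_words)
    unmatched = sum(1 for w in src_words
                    if not bilingual_dict.get(w, set()) & tgt_set)
    return unmatched / len(src_words)

def label_for_amendment(pairs, bilingual_dict, threshold=0.5):
    """Return the pairs whose divergence exceeds `threshold`, worst first."""
    scored = sorted(((divergence_score(s, t, bilingual_dict), s, t)
                     for s, t in pairs),
                    key=lambda x: x[0], reverse=True)
    return [(s, t) for score, s, t in scored if score > threshold]
```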
For the software system, combining the translation templates constructed above, we build a mixed-language speech-to-speech machine translation system. Using the proposed multiple-translation spotting (MTS) method, the system searches for the translation speech template most similar to the speaker's input utterance. By embedding speech recognition within machine translation, the speaker's utterance is not only properly recognized against the speech segments of similar example sentences, but the required translation information is also extracted from those examples, achieving both accuracy and speed. Under a portable hardware architecture with limited memory, this embedded scheme reduces the memory requirements of both speech recognition and machine translation. In addition, following system-on-chip design principles, this study constructs the entire hardware system around silicon intellectual property (IP): besides reusing existing IPs (e.g., speech feature extraction and speech synthesis), and with consumer IC design in mind, we develop suitable new IPs, namely a template retrieval IP and a pattern extraction IP, giving the hardware high extensibility and portability. The IP design finds the correct optimal dynamic path over varying models (words) and varying states (frames) and supplies the information needed for translation; with the two IPs integrated, computation over the dynamic-programming matching plane is completed correctly and quickly. The two hardware architectures therefore target system integration while being internally grounded in IP design, so the work yields not only the complete chip-system hardware but also silicon IPs (SIPs) reusable in other chip systems. Finally, the entire hardware design was realized on the ALTERA EPXA10F1020C2 development platform: at a 40 MHz operating clock, matching 3,000 translation templates completes within 1 second; the system's maximum frequency is 46.22 MHz, and it uses 19,318 logic elements (50% of the EPXA10's logic elements).
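As a software-level illustration of the matching that the template retrieval and pattern extraction IPs accelerate in hardware (the feature representation, Euclidean local distance, and length normalization here are assumptions, not the thesis's exact algorithm), a dynamic-programming comparison of an input utterance against stored translation speech templates might look like this sketch:

```python
# Minimal DTW-style sketch of the core comparison in multiple-translation
# spotting. Feature matrices, local distance, and normalization are
# illustrative assumptions; the thesis realizes this search in the
# template retrieval core.
import numpy as np

def dtw_distance(query, template):
    """Dynamic-programming alignment cost between two feature sequences
    (one row per frame); this is the matching plane the hardware traverses."""
    n, m = len(query), len(template)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(query[i - 1] - template[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)          # normalize by a path-length bound

def spot_best_template(query, templates):
    """Index and score of the stored speech template closest to the query."""
    scores = [dtw_distance(query, t) for t in templates]
    best = int(np.argmin(scores))
    return best, scores[best]

# Example with random 13-dimensional feature frames:
rng = np.random.default_rng(0)
templates = [rng.normal(size=(40 + k, 13)) for k in range(3)]
query = templates[1] + rng.normal(scale=0.1, size=templates[1].shape)
print(spot_best_template(query, templates))   # expect index 1
```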

Abstract (English):

This study presents two new techniques for speech-to-speech translation systems. First, for the training phase, to improve template learning accuracy from parallel sentences, this work proposes translation divergence amendment to reduce the translation divergence that causes learning ambiguity and perplexity. Second, for the translation phase, to spot the exact target-language speech sequence, we also propose multiple translation spotting based on the learned translation templates. By developing and matching speech templates, the proposed spotting method can be applied without language-dependent speech recognition, which improves portability and extensibility across languages.
This work also presents the first chip design for a portable speech-to-speech translation application. First, we construct a speech-to-speech translation system based on multiple-translation spotting (MTS). Using the proposed MTS method, the optimal multiple-translation spotting template is retrieved and the appropriate target patterns are extracted. To overcome the computational bottleneck caused by the MTS algorithm, this work introduces a template retrieval core and a target pattern extraction core. Combined with a low-cost programmable core, the design obtains the advantages of both programmable and application-specific architectures in performance, design complexity, and flexibility. The design was experimentally verified via semi-custom chips using 0.35 µm CMOS single-poly-four-metal technology on a die of approximately 3.85 × 3.85 mm². The entire design was also implemented on an ALTERA EPXA10 device: the English-to-Mandarin translation process completes within 1 second at a 46.22 MHz clock frequency with 3,000 translation patterns, and total logic usage of the EPXA10 device is 50% (19,318 logic cells).
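As a quick sanity check derived from the figures above (an inference, not a calculation given in the thesis), completing 3,000 template matches within 1 second at the 40 MHz operating clock implies an average budget of roughly 13,000 cycles per template:

```python
# Cycle budget implied by the reported figures (an inference from the
# abstract's numbers, not a value stated in the thesis).
clock_hz = 40_000_000      # operating clock on the EPXA10 platform
num_templates = 3_000      # translation templates matched per utterance
time_budget_s = 1.0        # reported worst-case matching time

cycles_per_template = clock_hz * time_budget_s / num_templates
print(f"~{cycles_per_template:,.0f} cycles per template")   # ~13,333
```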

Table of Contents:

ABSTRACT (Chinese)
ABSTRACT (English)
ACKNOWLEDGEMENT
CONTENTS
FIGURE CAPTIONS
TABLE CAPTIONS
CHAPTER 1 INTRODUCTION
  1.1 Motivation
  1.2 Background and Literature Review
    1.2.1 Voice Activated Phrase Lookup
    1.2.2 Loosely Coupled Speech Translation Systems
    1.2.3 Tight Coupling between ASR and MT using Finite State Transducer
    1.2.4 Interlingua Based Speech Translation Systems
  1.3 Objective of the Dissertation
  1.4 Organization of the Dissertation
CHAPTER 2 CORPUS DESIGN AND ANALYSIS
  2.1 Introduction
  2.2 Translation Divergence in Parallel Sentence Collection
  2.3 Translation Divergence Analysis of Parallel Sentences
    2.3.1 Overview of Statistical-based Translation Knowledge (STK) Training
    2.3.2 Analysis of Divergence Effects on STK Training
    2.3.3 Divergence Evaluation for Collected Parallel Sentences
    2.3.4 Ranking and Labeling the Evaluated Parallel Sentences (EPS)
  2.4 Algorithm of Translation Divergence Amendment (TDA) for Labeled EPS
    2.4.1 Irregular Lexical Mapping Amendment based on HowNet
    2.4.2 Lexical Item Sequence Refinement after Amending Sentence
  2.5 Evaluation and Discussion on Translation Divergence Processing
  2.6 Summary
CHAPTER 3 TRANSLATION TEMPLATE LEARNING AND EXPANSION
  3.1 Introduction
  3.2 Translation Template Learning (TTL) from MLPSC
  3.3 Extraction of Discriminative Sentence Vectors (DSVs)
    3.3.1 Semantic Units Acquisition
    3.3.2 Acting Role Table Exploration
    3.3.3 Intention Classification of Sentence Patterns
  3.4 Translation Template Expansion (TTE) based on DSVs
  3.5 Discriminative Translation Pattern Spotting (DTPS) using Templates
  3.6 Summary
CHAPTER 4 A NEW MIXED-LANGUAGE SPEECH-TO-SPEECH TRANSLATION SYSTEM
  4.1 Introduction
  4.2 Framework of the Proposed System
  4.3 Development of Speech Templates from the Constructed Templates
    4.3.1 Multiple-Translation Spotting Templates (MTST)
    4.3.2 Speech Generation Templates (SGT)
  4.4 Algorithm of Multiple-Translation Spotting (MTS) using MTST
    4.4.1 Initial Spotting Process
    4.4.2 Score Normalization and Ranking
    4.4.3 Spotting Sequence Smoothing
  4.5 Target Speech Generation using SGT
  4.6 Functional Evaluation and Discussion on the Proposed System
    4.6.1 Complexity Analysis of MTS
    4.6.2 Spotting Accuracy Evaluation
    4.6.3 Target Generation Evaluation
    4.6.4 Implementation on Personal Digital Assistants
  4.7 Summary
CHAPTER 5 HARDWARE DESIGN FOR SINGLE-CHIP SPEECH-TO-SPEECH TRANSLATION
  5.1 Introduction
  5.2 VLSI Architecture for MTS-based Speech-to-Speech Translation
    5.2.1 The Design of the Template Retrieval Core
    5.2.2 The Design of the Target Pattern Extraction Core
  5.3 Chip Implementation
  5.4 Summary
CHAPTER 6 AN ARM-BASED SYSTEM-ON-A-PROGRAMMABLE-CHIP ARCHITECTURE FOR SPEECH-TO-SPEECH TRANSLATION
  6.1 Introduction
  6.2 The Proposed SoPC Architecture
  6.3 Hardware/Software Codesign of the SoPC
    6.3.1 AHB Slave IP Design of Template Retrieval
    6.3.2 Software Procedure
  6.4 Implementation and Verification of the Proposed SoPC
    6.4.1 Implementation/Verification of ADC/DAC Circuit Board
    6.4.2 Chip Feature
  6.5 Summary
CHAPTER 7 CONCLUSIONS
  7.1 Summary
  7.2 Summary of Contributions
  7.3 Future Works
REFERENCES
AUTHOR'S BIOGRAPHICAL NOTES
PUBLICATIONS


Full text released on campus: 2008-08-03; off campus: 2009-08-03