
Graduate Student: 黃典煌 (Huang, Tien-Huang)
Thesis Title: 隨讀隨聽電子書手持裝置於ARM920T嵌入式開發平台之設計與實現
Design and Implementation of the LR-Book Handheld Device Based on ARM920T Embedded Platform
Advisor: 王駿發 (Wang, Jhing-Fa)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2009
Graduation Academic Year: 97 (ROC calendar)
Language: English
Number of Pages: 59
Keywords (Chinese, translated): mean opinion score, optical character recognition, semantically unpredictable sentences, speech synthesis system
Keywords (English): mean opinion score (MOS), optical character recognition (OCR), text-to-speech (TTS), semantically unpredictable sentence (SUS)
    In recent years, hand-held devices have become increasingly widespread; they trend toward small size, low price, high computing power, and rich software functionality. Thanks to advances in technology, many applications that could not be realized on traditional hand-held devices are now feasible.

    The purpose of this thesis is to implement an "LR-Book" listen-while-reading e-book device on a Samsung ARM920T-based processor (S3C2440A) running a Linux operating system. A user interface suited to elderly users is designed. Book content is first synthesized by a natural-sounding text-to-speech (TTS) system; the user can then access the synthesized multimedia speech data through a USB interface and download it to the LR-Book's storage. Finally, the LR-Book's optical character recognition (OCR) identifies the portion of the physical book currently being read and, combined with the digitized multimedia content in memory, outputs the passage as speech, so that reading is accomplished by listening.

    The speech synthesis system developed in this thesis adopts an HMM-based speech synthesizer (HTS, HMM-based Speech Synthesis System). In a dictation test with semantically unpredictable sentences (SUS), the subjects' average accuracy reached 96.4%, and in tests on short passages covering various topics, the subjective naturalness mean opinion score (MOS) reached 3.6. The system can therefore synthesize fluent and intelligible speech. Moreover, the voice models used for synthesis occupy very little memory, which is a particular advantage for portability and adaptability.

    In recent years, hand-held devices have become more and more popular in our daily life. In addition to the trend toward low price and small size, these devices usually possess powerful software functions and high computational capability. Owing to these technological advances, many applications that were infeasible on older hand-held devices can now be realized.

    The purpose of this thesis is to propose a “Listenable and Readable Book (LR-Book) Device”. The device is based on the S3C2440A, whose ARM920T core serves as the main processor, and Linux is adopted as the operating system. First, the text content of a physical book is converted into digital speech by a user-friendly text-to-speech (TTS) system. The speech content can then be easily downloaded into the memory of the LR-Book through the USB interface. With the optical character recognition (OCR) process, the LR-Book system identifies the page number of the physical book currently being read and retrieves the corresponding digital speech content from memory. Finally, the retrieved speech is played back, so the book is read aloud by the device.
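    The page-to-speech lookup described above can be sketched as follows. This is an illustrative sketch only, not the thesis's actual code; the file layout, function name, and mount point are all assumptions made for the example.

```python
# Illustrative sketch (hypothetical layout, not the thesis's implementation):
# map an OCR-recognized page number to the pre-synthesized speech clip
# previously downloaded to the device over USB.

def speech_file_for_page(page_number: int, book_id: str,
                         root: str = "/mnt/lrbook") -> str:
    """Return the path of the TTS audio clip for a given physical page.

    Assumes one clip per page, stored as
    <root>/<book_id>/page_NNNN.wav (an assumed naming scheme).
    """
    return f"{root}/{book_id}/page_{page_number:04d}.wav"

# Example: OCR reports page 12 of book "b001".
print(speech_file_for_page(12, "b001"))
# -> /mnt/lrbook/b001/page_0012.wav
```

    In this sketch the OCR stage only needs to produce a page number; the playback stage then opens the returned file and sends it to the audio codec.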

    The proposed speech synthesis system is based on hidden Markov models and synthesizes smooth, intelligible speech. In the semantically unpredictable sentence (SUS) dictation test, the mean correct rate is 96.4%. In the naturalness test, the mean opinion score (MOS) is 3.6. The synthesis voice model occupies very little memory and, owing to its flexibility and portability, can be used in many applications.
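    As a brief illustration of how these two evaluation figures are typically computed, the snippet below averages per-listener scores. The listener data here are made-up sample values chosen for the example, not the thesis's experimental results.

```python
# Sample (fabricated) listener data, for illustration only.
# SUS dictation: per-listener fraction of a test set transcribed correctly.
sus_correct = [0.98, 0.95, 0.97, 0.96]
# Naturalness ratings on the usual 1-5 opinion scale, one per rating event.
opinion_scores = [4, 3, 4, 3, 4]

# Both metrics are simple arithmetic means over all subjects/ratings.
mean_correct_rate = sum(sus_correct) / len(sus_correct)
mos = sum(opinion_scores) / len(opinion_scores)

print(f"mean correct rate: {mean_correct_rate:.1%}")  # -> 96.5%
print(f"MOS: {mos:.1f}")                              # -> 3.6
```

    SUS material is deliberately meaningless but grammatical, so listeners cannot guess words from context and the correct rate reflects intelligibility alone.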

    中文摘要
    Abstract
    誌謝 (Acknowledgments)
    Contents
    Figure Captions
    Table Captions
    Chapter 1 Introduction
      1.1 Background and Motivation
      1.2 Related Work
        1.2.1 Overview of Speech Synthesis
        1.2.2 General TTS Architecture
        1.2.3 Speech Synthesis Methods
          1.2.3.1 Concatenative Synthesis
          1.2.3.2 LPC-Based Synthesis
          1.2.3.3 HMM-Based Synthesis
      1.3 Thesis Organization
    Chapter 2 HMM-Based Mandarin Speech Synthesizer
      2.1 HMM-Based Speech Synthesis System
      2.2 Training Part of the System
        2.2.1 Context-Dependent Modeling Techniques
        2.2.2 Model Reduction
        2.2.3 Pre-processor of Text Analysis
      2.3 Synthesis Part of the System
        2.3.1 Speech Parameter Generation
        2.3.2 Mel Log Spectrum Approximation Filter
    Chapter 3 Embedded System Design for the Text-to-Speech Synthesizer Based on ARM920T-S3C2440A
      3.1 System Overview of S3C2440A
      3.2 Hardware Architecture of the Proposed System
        3.2.1 NAND Flash and SDRAM Controller
        3.2.2 Audio Circuit
        3.2.3 Camera Circuit
        3.2.4 Regulation Circuit
        3.2.5 UART Circuit
      3.3 Software Architecture of the Proposed System
        3.3.1 LR-Book System Processes
        3.3.2 Related Work of Optical Character Recognition
        3.3.3 Proposed OCR System
    Chapter 4 Experiments and Implementation
      4.1 Training Phase
      4.2 Synthesis Phase
      4.3 Establishment of the Corpus
      4.4 Experimental Design
        4.4.1 Testing Sentences
        4.4.2 Testing Criterion
        4.4.3 Testing Results
      4.5 System Board Implementation
        4.5.1 Design Process
        4.5.2 Layout Demonstration
    Chapter 5 Conclusions and Future Work
    References
    Appendix 1
    Appendix 2
    Appendix 3


    Full-text availability: on campus from 2012-08-26; off campus from 2014-08-26.