| Graduate Student: | 陳柏勳 Chen, Po-Hsun |
|---|---|
| Thesis Title: | 基於ARM920T架構之隨讀隨聽電子書手持裝置嵌入式軟韌體設計與實現 (Design and Implementation of Embedded Software and Firmware on LR-BOOK Handheld Device Based on ARM920T Architecture) |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2010 |
| Academic Year: | 98 (ROC calendar; 2009-2010) |
| Language: | English |
| Pages: | 58 |
| Chinese Keywords: | fixed-point conversion, speech synthesis, porting the Linux 2.6.14 operating system, optical character recognition |
| Keywords: | fixed-point, speech synthesis, porting OS, optical character recognition |
This thesis uses a Samsung processor based on the ARM920T architecture to develop a listening and reading e-book system (LR-BOOK). The LR-BOOK is designed to make reading books more convenient for the elderly, and the design consists of two major parts: firmware design and software design. On the firmware side, to let complicated application programs communicate with the hardware devices, we ported the Linux 2.6.14 operating system to a self-developed prototype development board. On the software side, we propose a system architecture that integrates speech synthesis and optical character recognition. For speech synthesis, this thesis converts the floating-point HTS_engine developed by the HTS group into a fixed-point version, implementing fixed-point division and the exponential function with Newton's iteration method and least-squares estimation. For optical character recognition, to overcome slight rotation and scaling, we extract features using the rotation-invariant property of a digit's center position. Finally, this thesis integrates a button module to control the functions of the LR-BOOK.
According to the experimental results, on an embedded device without a floating-point unit, the proposed fixed-point speech synthesizer runs 45 times faster than the floating-point HTS_engine and meets the real-time requirement. The recognition rate of the optical character recognition reaches 89%.
This thesis develops a listening and reading book (LR-Book) system on the Samsung S3C2440, a fixed-point processor based on the ARM920T core. The LR-Book system is designed to make reading more convenient for senior citizens, and it consists of two major parts: firmware design and software design. In the firmware part, to handle the interaction between complicated application programs and the hardware, we port the boot loader, the file system, and the Linux 2.6.14 kernel to a prototype development board. In the software part, we integrate speech synthesis and optical character recognition (OCR). We convert the floating-point HTS 1.0.2 speech synthesis engine into a fixed-point version. Moreover, two fixed-point mathematical operations, division and exponentiation, are specially designed using the Newton-Raphson method and least-squares approximation, respectively. To overcome the problems of rotated and scaled snapshots in OCR, we propose a feature extraction method that utilizes the rotation-invariant property of a digit's center.
According to the experimental results, the fixed-point speech synthesizer is 45 times faster than the floating-point HTS_engine 1.0.2 and achieves real-time performance on an embedded device without a floating-point unit (FPU). The recognition rate of the OCR reaches 89%.
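The Newton-Raphson fixed-point division mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's actual code: it assumes a Q16.16 format (the thesis does not state its word layout) and computes a reciprocal with the iteration x ← x(2 − dx), which needs only integer multiplies, subtracts, and shifts.

```python
Q = 16            # fractional bits; the Q16.16 layout is an assumption
ONE = 1 << Q      # fixed-point representation of 1.0

def to_fix(x):
    """Convert a float to Q16.16."""
    return int(round(x * ONE))

def fix_mul(a, b):
    """Multiply two Q16.16 numbers, rescaling the 2Q-bit product."""
    return (a * b) >> Q

def fix_recip(d, iters=5):
    """Approximate 1/d via Newton-Raphson: x <- x * (2 - d*x).

    The initial guess x0 = 1.0 converges for d in (0, 2); a real
    implementation would first normalize d into that range by shifting.
    """
    x = ONE
    for _ in range(iters):
        x = fix_mul(x, 2 * ONE - fix_mul(d, x))
    return x

def fix_div(a, b):
    """a / b as multiplication by the Newton-Raphson reciprocal."""
    return fix_mul(a, fix_recip(b))
```

For example, `fix_div(to_fix(1.0), to_fix(1.5)) / ONE` comes out near 2/3 after five iterations, since the iteration roughly doubles the number of correct bits each step.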
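The least-squares exponential can likewise be sketched. The following is an illustrative reconstruction, not the thesis's implementation: a quadratic is fitted to exp on [0, 1] by solving the 3x3 normal equations, and its coefficients are then frozen as Q16.16 integers (an assumed format) so that evaluation at run time uses only integer multiplies, adds, and shifts via Horner's rule.

```python
import math

Q = 16            # fractional bits; the Q16.16 layout is an assumption
ONE = 1 << Q

def fix_mul(a, b):
    """Multiply two Q16.16 numbers, rescaling the 2Q-bit product."""
    return (a * b) >> Q

def lstsq_quadratic(xs, ys):
    """Fit c0 + c1*x + c2*x^2 by solving the normal equations A^T A c = A^T y."""
    n = 3
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                     # Gaussian elimination with pivoting
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    sol = [0.0] * n                          # back substitution
    for r in range(n - 1, -1, -1):
        sol[r] = (aty[r] - sum(ata[r][k] * sol[k] for k in range(r + 1, n))) / ata[r][r]
    return sol

# Fit exp on [0, 1] offline, then freeze the coefficients as Q16.16 integers.
xs = [i / 100 for i in range(101)]
c0, c1, c2 = (int(round(c * ONE)) for c in lstsq_quadratic(xs, [math.exp(x) for x in xs]))

def fix_exp(x):
    """Fixed-point exp(x) for Q16.16 x in [0, 1], via Horner evaluation."""
    return fix_mul(fix_mul(c2, x) + c1, x) + c0
```

A degree-2 fit on [0, 1] is accurate to roughly 1.5% at the interval endpoints; in practice one would combine a low-degree fit like this with range reduction (e.g. e^x = 2^k · e^r) to cover arbitrary arguments.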
On campus: publicly available from 2015-08-30