
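Graduate student: 劉日新
Liu, Jhin-Hsin
Thesis title: 樹莓派語音輔助閱讀器之製作
Implementation of Raspberry Pi-Based Text Reader
Advisor: 李炳鈞
Li, Bing-Jing
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar, 2021-2022)
Language: Chinese
Number of pages: 74
Keywords (translated from Chinese): Raspberry Pi, camera lens module, character recognition, image processing, OpenCV, Tesseract, Google Vision, Google TTS API, Python
Keywords (English): Raspberry Pi, picamera, Python, image processing, optical character recognition, Text to Speech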
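  • This thesis builds a simple voice-assisted reader from a Raspberry Pi, the official Raspberry Pi camera, and an acrylic box, and uses Python with its related technical documentation to convert paper documents into speech. The study focuses on computer-printed documents and handwritten manuscripts, converting the text on both kinds of document from image to sound. First, Picamera drives the Raspberry Pi camera module to capture the paper document and save the image on the Raspberry Pi. OpenCV then preprocesses the captured image to improve character-recognition accuracy. Next, the character-recognition tool Tesseract and Google Vision on Google Cloud Platform convert the processed image into text, and finally the text-to-speech tool Google TTS API converts the text into an audio file. The experiments show that using Google Vision for character recognition together with Google TTS API for speech conversion currently gives the best results on both printed documents and handwritten manuscripts. The conclusion provides a demonstration video of the reader, showing the process of converting the text on a paper document into an audio file. With further development, the voice-assisted reader could serve as a visual aid for visually impaired users.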
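The four stages described in this abstract (capture, preprocessing, character recognition, speech synthesis) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's own code: the function names, file names, the grayscale-plus-Otsu preprocessing, and the language settings are all hypothetical, and the sketch uses Tesseract (via `pytesseract`) for recognition and the Google Cloud Text-to-Speech client for synthesis. The third-party imports are deferred into each function because `picamera` is only installable on a Raspberry Pi and the Google client needs cloud credentials.

```python
def capture_image(image_path):
    """Capture one still frame from the Raspberry Pi camera to image_path."""
    from picamera import PiCamera  # lazy import: only available on a Pi

    with PiCamera() as camera:
        camera.capture(image_path)
    return image_path


def preprocess_image(image_path):
    """Write a binarized copy of the image and return the new path."""
    import cv2

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Assumption: Otsu thresholding stands in for the thesis's preprocessing.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    out_path = image_path + ".bin.png"
    cv2.imwrite(out_path, binary)
    return out_path


def recognize_text(image_path, lang="eng"):
    """Run Tesseract on the image file and return the recognized text."""
    import pytesseract

    # Assumption: "eng"; a Chinese document would need e.g. "chi_tra".
    return pytesseract.image_to_string(image_path, lang=lang)


def synthesize_speech(text, audio_path, language_code="en-US"):
    """Convert text to an MP3 file with Google Cloud Text-to-Speech."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(audio_path, "wb") as audio_file:
        audio_file.write(response.audio_content)
    return audio_path


def read_aloud(image_path, audio_path,
               capture=capture_image, preprocess=preprocess_image,
               recognize=recognize_text, synthesize=synthesize_speech):
    """Run the four stages in order and return the recognized text."""
    captured = capture(image_path)
    cleaned = preprocess(captured)
    text = recognize(cleaned)
    synthesize(text, audio_path)
    return text
```

Passing the stages in as parameters keeps the order of operations separate from the libraries behind each stage, so a different recognizer (for example one backed by Google Vision) could be substituted for `recognize_text` without touching `read_aloud`.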

    This research demonstrates a text reader that consists of a Raspberry Pi, a Pi camera, a folder, and an acrylic box, and that uses Python modules to transform text on paper into an audio file. The test materials include printed and handwritten text. Images captured by the Pi camera are processed on the Raspberry Pi and converted into the corresponding text using optical character recognition tools. The text is then converted into an audio file with a text-to-speech tool. A video, linked on YouTube, demonstrates the operation and results of the text reader. Comparison and discussion indicate that Google Vision recognizes text better than Tesseract, especially handwritten text. The text reader presented here still needs considerable development to become practical.
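One way to put a number on a comparison between recognition engines is to score each engine's output against a manually typed reference transcript. The sketch below uses Python's standard `difflib` for this. It is an assumption for illustration only: the record does not say which text-comparison tool or metric the thesis used, and the function names and any sample strings are hypothetical.

```python
import difflib


def normalize(text):
    """Collapse runs of whitespace so line breaks in OCR output are ignored."""
    return " ".join(text.split())


def similarity(reference, recognized):
    """Return a 0.0-1.0 similarity ratio between reference and OCR output."""
    matcher = difflib.SequenceMatcher(
        None, normalize(reference), normalize(recognized)
    )
    return matcher.ratio()


def best_engine(reference, outputs):
    """Return the name of the engine whose output best matches the reference.

    outputs maps a hypothetical engine name to the text it recognized.
    """
    return max(outputs, key=lambda name: similarity(reference, outputs[name]))
```

The ratio is twice the number of matching characters divided by the total length of both strings, so identical texts score 1.0 and every substituted, dropped, or inserted character lowers the score.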

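    Abstract
    Acknowledgements
    List of Tables
    List of Figures
    Code Listings
    Chapter 1 Introduction
        1-1 Background
        1-2 Motivation and Objectives
    Chapter 2 Reading Aid Operating Procedure and Experimental Equipment
        2-1 Raspberry Pi
            2-1-1 Introduction to the Raspberry Pi
            2-1-2 Raspberry Pi Operating System
            2-1-3 Raspberry Pi Camera Module
            2-1-4 Raspberry Pi Software (Programming Language)
        2-2 Experimental Setup
    Chapter 3 Document Imaging and Character Recognition
        3-1 Overview
        3-2 Related Technical Documentation and Installation
            3-2-1 Image Capture
            3-2-2 Image Processing
            3-2-3 Character Recognition
            3-2-4 Text Difference Comparison
        3-3 Execution Plan
        3-4 Results and Discussion
            3-4-1 Image Storage and Access
            3-4-2 Image Preprocessing
            3-4-3 Character Recognition of Printed Documents
            3-4-4 Character Recognition of Handwritten Manuscripts
            3-4-5 Conclusion
    Chapter 4 Converting Text Files to Audio Files (Text to Speech)
        4-1 Overview
        4-2 Related Technical Documentation and Installation
            4-2-1 Text to Speech
            4-2-2 Audio Playback
        4-3 Experiment Plan
        4-4 Experimental Results and Discussion
            4-4-1 Image-to-Text-File Conversion
            4-4-2 Comparison of Manually Recognized and Google STT Text Files
            4-4-3 Audio Conversion of Printed Documents and Handwritten Manuscripts
            4-4-4 Conclusion
    Chapter 5 Demonstration of Results and Conclusion
        5-1 Conclusion
        5-2 Future Work
    References
    Appendix 1
    Appendix 2
    Appendix 3
    Appendix 4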


    Made public: 2024-09-26