| Graduate student: | 劉日新 Liu, Jhin-Hsin |
|---|---|
| Thesis title: | 樹莓派語音輔助閱讀器之製作 Implementation of Raspberry Pi-Based Text Reader |
| Advisor: | 李炳鈞 Li, Bing-Jing |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| Publication year: | 2022 |
| Academic year of graduation: | 110 (ROC calendar) |
| Language: | Chinese |
| Pages: | 74 |
| Keywords (translated from Chinese): | Raspberry Pi, camera module, character recognition, image processing, OpenCV, Tesseract, Google Vision, Google TTS API, Python |
| Keywords (English): | Raspberry Pi, picamera, Python, image processing, optical character recognition, Text to Speech |
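This thesis builds a simple voice-assisted text reader from a Raspberry Pi, the official Raspberry Pi camera module, and an acrylic box, and uses Python together with its related technical documentation to convert paper documents into speech. The study focuses on computer-printed documents and handwritten manuscripts, converting the text in both kinds of document from image to sound. First, Picamera drives the Raspberry Pi camera module to capture the paper document and store the image on the Raspberry Pi. OpenCV then preprocesses the captured image to raise character recognition accuracy. Next, the character recognition tools Tesseract and Google Vision (on Google Cloud Platform) convert the processed image into text, and finally the text-to-speech tool Google TTS API converts the text into an audio file. Experiments show that Google Vision for character recognition, combined with Google TTS API for speech output, gives the best results on both printed documents and handwritten manuscripts. The conclusion provides a demonstration video of the reader that shows the text on a paper document being converted into an audio file. With further development, the reader could serve as a visual aid for visually impaired users.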
This research demonstrates a text reader that consists of a Raspberry Pi, a Pi camera, a folder, and an acrylic box, and that uses Python modules to transform text on paper into an audio file. The test materials for the text reader include printed and handwritten text. Images captured by the Pi camera are processed on the Raspberry Pi and converted into the corresponding text with optical character recognition tools. The text is then converted into an audio file with a text-to-speech tool. A video linked on YouTube demonstrates the operation and results of the text reader. Comparison and discussion indicate that Google Vision recognizes text better than Tesseract, especially handwritten text. The text reader presented here still needs considerable development to become practical.
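The four stages named in the abstract (capture, preprocessing, character recognition, speech synthesis) can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the thesis code: the names `ReaderStages` and `read_aloud`, the file paths, the grayscale plus Otsu thresholding recipe, the `eng` and `en-US` language settings, and the choice of the Google Cloud Text-to-Speech client for "Google TTS API" are all illustrative. The record does not say which preprocessing steps or client libraries the author actually used.

```python
import os
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ReaderStages:
    """The four stages from the abstract, as swappable callables (hypothetical names)."""
    capture: Callable[[str], str]          # image path -> image path
    preprocess: Callable[[str], str]       # image path -> cleaned image path
    recognize: Callable[[str], str]        # image path -> recognized text
    synthesize: Callable[[str, str], str]  # (text, audio path) -> audio path


def capture_with_picamera(image_path: str) -> str:
    """Take one still with the Raspberry Pi camera module."""
    from picamera import PiCamera  # only installable on a Raspberry Pi

    camera = PiCamera()
    try:
        camera.capture(image_path)
    finally:
        camera.close()
    return image_path


def preprocess_with_opencv(image_path: str) -> str:
    """Grayscale conversion and Otsu binarization (an assumed recipe)."""
    import cv2

    root, ext = os.path.splitext(image_path)
    out_path = f"{root}_clean{ext}"
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite(out_path, binary)
    return out_path


def recognize_with_tesseract(image_path: str, lang: str = "eng") -> str:
    """Local OCR through pytesseract; the language code is an assumption."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), lang=lang)


def recognize_with_google_vision(image_path: str) -> str:
    """Cloud OCR; requires Google Cloud credentials and network access."""
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.document_text_detection(image=image)
    return response.full_text_annotation.text


def synthesize_with_google_tts(text: str, audio_path: str) -> str:
    """Cloud text-to-speech written out as an MP3 file."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(audio_path, "wb") as f:
        f.write(response.audio_content)
    return audio_path


def read_aloud(stages: ReaderStages, image_path: str = "page.jpg",
               audio_path: str = "page.mp3") -> Tuple[str, str]:
    """Run capture -> preprocess -> OCR -> TTS; return (text, audio path)."""
    captured = stages.capture(image_path)
    cleaned = stages.preprocess(captured)
    text = stages.recognize(cleaned).strip()
    if not text:
        raise ValueError("no text recognized in " + cleaned)
    return text, stages.synthesize(text, audio_path)
```

On a Raspberry Pi with the libraries installed and credentials configured, the combination the abstract reports as working best would be assembled as `ReaderStages(capture_with_picamera, preprocess_with_opencv, recognize_with_google_vision, synthesize_with_google_tts)`. Swapping in `recognize_with_tesseract` gives the local alternative the thesis compares it against. Because each stage is a plain callable, the pipeline can also be exercised with stub functions on a machine without a camera.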