| Graduate student: | 葉品忻 Ye, Pin-Xin |
|---|---|
| Thesis title: | 運用語音切割技術與動態時軸校正自動辨認口吃重複音 Using speech segmentation technology and dynamic time warping for the automatic recognition of repetitions in stuttered speech |
| Advisor: | 謝孟達 Shieh, Meng-Dar |
| Degree: | Master |
| Department: | College of Planning and Design - Department of Industrial Design |
| Year of publication: | 2013 |
| Graduation academic year: | 101 |
| Language: | English |
| Number of pages: | 63 |
| Keywords (Chinese): | 口吃, 語音辨識, 語音切割, 動態時軸校正 |
| Keywords (English): | stuttering, speech recognition technology, speech segmentation technology, dynamic time warping |
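Stuttering is a speech disorder, and earlier studies indicate that 1% of people worldwide stutter. In clinical practice, speech-language pathologists use the number of stuttering events per one hundred words as the assessment standard. Deciding whether an event counts as stuttering involves the listener's subjective perception, so the process is time-consuming, error-prone, and inconsistent. An automated and accurate stuttering assessment would therefore be very helpful clinically. This study applies speech segmentation to stuttering research, using Mandarin speech material, to explore the design of a tool that makes diagnosis more convenient for speech-language pathologists. Basic speech parameters serve as the basis for segmenting repetitions: the speech features of typical Mandarin speakers imitating stuttered repetitions are extracted and their degree of variation is analyzed, so that stuttered-speech features can be extracted and identified objectively and accurately. Repetitions are recognized with dynamic time warping (DTW), which needs no large set of speech samples for training and can determine whether adjacent speech segments match. The study reaches three main conclusions: (1) the segmentation system can already segment most Mandarin speech, although a small part of the end-point detection still needs revision; (2) the DTW threshold settings for phonemes and for monosyllabic words do not differ significantly; (3) repetition recognition reaches 83% or higher, showing that DTW is feasible for recognizing repetitions.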
Stuttering is a speech disorder. According to previous studies, 1% of people worldwide stutter. In practice, a speech-language pathologist counts the number of stuttering occurrences per one hundred words as the standard of assessment. Judging whether an utterance is stuttered depends partly on the listener's subjective perception, which makes the assessment time-consuming, error-prone, and inconsistent. An automatic and accurate method of assessing stuttering would therefore be very helpful in clinical practice. This thesis applies speech segmentation technology to stuttering research, using Mandarin speech as the material, to explore a more convenient and accurate tool that assists speech-language pathologists in diagnosis.
The repetition segmentation method of this thesis is based on basic speech features. It extracts the features of stuttering repetitions imitated by Mandarin speakers and analyzes their degree of variation, so that stuttered-speech features can be extracted and identified accurately and objectively. Repetitions are recognized with dynamic time warping (DTW). This method does not require a large number of voice samples for training, and it can identify whether adjacent voice clips are the same, which is the basis of repetition recognition.
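The comparison of adjacent voice clips described above can be sketched roughly as follows. This is a minimal illustration of textbook DTW, not the thesis's implementation: the function names `dtw_distance` and `find_repetitions`, the Euclidean frame distance, the length normalization, and the threshold value passed by the caller are all assumptions made for the example, and each clip is assumed to be already converted to a sequence of feature vectors.

```python
import math


def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature vectors.

    Assumption: frames are equal-length tuples of floats and the local cost
    is the Euclidean distance between frames.
    """
    if not a or not b:
        raise ValueError("both sequences must contain at least one frame")
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same frame
                cost[i][j - 1],      # b advances, a stays on the same frame
                cost[i - 1][j - 1],  # both advance
            )
    # Normalize so that long and short clips are comparable.
    return cost[n][m] / (n + m)


def find_repetitions(segments, threshold):
    """Return each index i where segment i and segment i + 1 fall below threshold.

    The threshold is a hypothetical tuning parameter chosen by the caller.
    """
    return [
        i
        for i in range(len(segments) - 1)
        if dtw_distance(segments[i], segments[i + 1]) < threshold
    ]
```

Because the warping path may hold one clip on a frame while the other advances, a clip and a slower rendition of the same clip receive a small distance even though their lengths differ, which is why no trained acoustic model is needed for this comparison.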
The thesis reaches three conclusions: 1. The voice segmentation system can segment most Mandarin speech, but there is room for improvement in the end-point detection method. 2. There is no significant difference between phonemes and single-syllable words in the setting of the DTW threshold value. 3. Repetition recognition reaches more than 83%, showing that DTW is feasible and fast for recognizing repetitions.
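To make the end-point detection step in the first conclusion concrete, here is a minimal sketch of one common approach, thresholding short-time energy. The abstract does not say which detector the thesis uses, so the energy criterion, the names `short_time_energy` and `detect_endpoints`, the frame length, and the threshold are all assumptions chosen for illustration.

```python
def short_time_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame of frame_len samples."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples, frame_len=4, energy_threshold=0.01):
    """Return (start, end) sample indices of runs of frames above the threshold.

    The end index is exclusive. frame_len and energy_threshold are
    hypothetical defaults; real recordings need much longer frames.
    """
    energies = short_time_energy(samples, frame_len)
    segments = []
    start = None
    for k, energy in enumerate(energies):
        if energy > energy_threshold and start is None:
            start = k * frame_len  # a voiced run begins at this frame
        elif energy <= energy_threshold and start is not None:
            segments.append((start, k * frame_len))  # the run ended before this frame
            start = None
    if start is not None:
        # The signal ended while still inside a voiced run.
        segments.append((start, len(energies) * frame_len))
    return segments
```

A fixed energy threshold tends to miss low-energy sounds and to react to background noise, which is one plausible reason an end-point detector of this kind would leave room for improvement.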
On campus: publicly available from 2023-12-31