| Graduate student: | 佘永吉 Sher, Yung-Ji |
|---|---|
| Thesis title: | 台灣學齡前兒童口語詞彙資料庫之發展 Development of Lexicon Database from Speech Corpus for Taiwanese Pre-school Children |
| Advisor: | 鍾高基 Chung, Kao-Chi |
| Degree: | 博士 Doctor |
| Department: | 工學院 - 醫學工程研究所 College of Engineering - Institute of Biomedical Engineering |
| Year of publication: | 2006 |
| Academic year of graduation: | 94 |
| Language: | English |
| Number of pages: | 153 |
| Chinese keywords: | 詞彙資料庫 、口語語料庫 、台灣學齡前兒童 |
| Foreign-language keywords: | Taiwanese Preschool Children, Speech Corpus, Lexicon Database |
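Language learning and remediation for people with hearing and speech impairments emphasize the necessity of early intervention. Learning, assessing, and remediating the language abilities of preschool children depend on reliable and valid language scales and educational training databases. Western countries have developed systematic lexicon databases for Latin-based languages to support clinical assessment, therapy, and educational training scales, but because the languages differ, these cannot be transferred directly to local languages. The lack of research on children's language samples in Taiwan has left the country without a reliable and valid language sample database, which seriously hampers language education, clinical speech-language therapy, and speech technology. The purpose of this study is to develop and establish a spoken lexicon database for Taiwanese preschool children aged two to six. The specific aims are: (1) to collect and record spoken language data and transcribe them into text files; (2) to analyze conventional word frequency; (3) to analyze conventional word frequency with normalization; (4) to quantitatively analyze the candidate frequency of adjacent bigrams. A lexicon database is developed and established from these results, and words from the database are finally applied to the development of monosyllabic words for analyzing the speech intelligibility of children with cochlear implants.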
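The methods were as follows: (1) spontaneous speech was recorded from 80 preschool children aged 2 to 6, divided into four age groups with equal numbers of boys and girls; the recording situations were eating, dressing, bathing, and bedtime activities, and the recordings were transcribed into text, segmented into words, and converted to Zhuyin phonetic symbols both manually and by computer programs; (2) the conventional frequencies of phonemes, single characters, and two-, three-, and four-character words were computed, along with the distributions of the total number of utterances, the total number of words, and the mean length of utterance; (3) the distribution of the situation-category intention classification was analyzed with Bayesian conditional-probability normalization; (4) bigram candidate frequencies were analyzed with a second-order hidden Markov model; (5) the database was developed and established; (6) phonetically balanced words were selected from the database and applied to evaluating the speech intelligibility of children with cochlear implants, with 26 normal-hearing subjects and 26 cochlear implant subjects. Speech intelligibility scores comprised four parts: words, phonemes (initials and finals), and tones. Five adult listeners transcribed the recordings into text, from which the rate of correct auditory perception was computed.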
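Spontaneous speech was collected from 80 preschool children in four age groups, comprising 45,197 utterances and 183,781 words. The speech data were processed both manually and by computer programs: automatic transcription to text was about 75.4% accurate, word segmentation about 94.37% accurate, and automatic Zhuyin conversion about 96.43% accurate. The most frequent single-character word (我, "I") occurred 6,133 times, the most frequent two-character word (媽媽, "mother") 1,784 times, and the most frequent three-character word (為什麼, "why") 429 times. Per child, the mean total number of utterances was 563, the mean total number of words was 2,289, and the mean length of utterance was 3.902. The total number of utterances, the total number of words, and the mean length of utterance all increased with age.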
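The outcome of this study is a spoken lexicon database for Taiwanese preschool children, which includes: (1) transcribed and annotated text with word segmentation and Zhuyin conversion; (2) programs and rules for analyzing the frequencies of phonemes, single characters, and two- and three-character words, and the distributions of the total number of utterances, the total number of words, and the mean length of utterance; (3) programs and rules for analyzing the normalized Bayesian situation-category intention classification and the bigram candidate frequencies; (4) the distribution of the conventional frequencies of phonemes, single characters, and two-, three-, and four-character words; (5) the distribution of the total number of utterances, the total number of words, and the mean length of utterance; (6) the distribution of the normalized Bayesian situation-category intention classification; (7) the bigram candidate frequencies. Finally, phonetically balanced monosyllabic words were selected from the database and applied to evaluating the speech intelligibility of children with cochlear implants. For normal-hearing subjects, the rate of correct auditory perception was 42.8% for words, 62.2% for consonants, 73.0% for vowels, and 75.8% for tones. For cochlear implant subjects, it was 18.2% for words, 40.4% for consonants, 53.6% for vowels, and 54.8% for tones. Dividing the scores of the cochlear implant subjects by those of the normal-hearing subjects gives 0.425 for words, 0.649 for consonants, 0.735 for vowels, and 0.723 for tones. The results show that the speech of normal-hearing subjects is more intelligible than that of cochlear implant subjects.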
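The conventional word frequency analysis shows that the number of vowels produced increases as children's language develops. The distribution of conventional word frequency can be compared with the distribution of the normalized Bayesian situation-category intention classification, providing corpus analysis parameters for different uses in educational training, research, and clinical practice. Bigram candidate frequencies can support the screening of words for tests of adjacent vocabulary. The phonemically balanced monosyllabic word database that was developed supports speech intelligibility analysis of words, phonemes, and tones for normal-hearing children and children with cochlear implants.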
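The significance of this study lies in the corpus and the associated programs and rules that it establishes, which can support the construction of preschool children's lexicon databases for quantitative analysis and training in related fields. It provides a foundational database for the systematic, scientific development of assessment tools for language teaching in special education, educational training materials, computational linguistics research, and clinical therapy, research, and training for hearing and speech impairments.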
Language learning and speech therapy are important issues in early intervention for language disorders. Assessment of language development in preschool children relies on verbal language scales. Developed countries have established lexicon databases for language norms and testing standards. However, these cannot be directly transferred and applied to our native languages such as Mandarin and Taiwanese. Only a few studies have examined domestic language samples, which has led to the lack of a reliable and valid lexicon database. The lack of a domestic database is detrimental to clinical evaluation, education, clinical training, and speech technology.
The purpose of this research is to develop and establish a Mandarin lexicon database for Taiwanese preschool children. More specifically, this research aims (1) to record and collect spontaneous speech and transcribe the recorded speech into text; (2) to analyze conventional word frequency; (3) to analytically normalize the word frequency; (4) to quantitatively analyze bigram perplexities. The outcomes are used to develop and establish a Mandarin lexicon database. Finally, the lexicon database is used to develop monosyllabic words for assessing the Mandarin speech intelligibility of children with cochlear implants (CI).
A total of eighty preschool children aged 2 to 6 years were recruited from the children of medical doctors and therapists at National Cheng Kung University Hospital. The subjects were divided into four age groups of 20 each: Group A, 2 to < 3 years (11 male, 9 female); Group B, 3 to < 4 years (11 male, 9 female); Group C, 4 to < 5 years (10 male, 10 female); and Group D, 5 to < 6 years (12 male, 8 female). Spontaneous speech was recorded from these subjects during four activities (eating meals, dressing, bathing, and bedtime routines) for eighty minutes. The recorded speech was then transcribed to text through both automatic and manual processes. For the conventional analysis of word frequency, the occurrences of phonemes, characters, and word tokens were counted, together with the number of utterances, the total number of words (TNW), the number of different words (NDW), and the mean length of utterance (MLU). The mean and standard deviation of each group were calculated for statistical analysis. Word frequency based on semantic-based intension categorization (SIC) was analyzed by Bayesian conditional probability. Bigram perplexities were analyzed with a second-order hidden Markov model (HMM). Monosyllabic words, selected from the lexicon database as phonetically balanced words, were used for clinical speech intelligibility (SI) evaluation. Twenty-six CI subjects and 26 normal-hearing subjects were evaluated at zero azimuth: each subject listened to the monosyllabic words presented at 80 dB SPL through headset speakers and repeated them aloud, and the subject's spoken words were recorded. The SI scores, consisting of sub-scores for words, phonemes, and tones, are the average correct perception rates of the tested words as transcribed by five adult listeners.
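The conventional counts named in the methods can be illustrated with a minimal sketch. It assumes the transcripts are already segmented into word tokens, that MLU is measured in words per utterance (the abstract does not state the unit), and that TNW and NDW carry their usual meanings of total and different words. The function name `corpus_stats` and the sample utterances are hypothetical and are not taken from the thesis.

```python
from collections import Counter

def corpus_stats(utterances):
    """Summarize a list of utterances, each given as a list of word tokens.

    Assumption: tokens are already segmented; MLU is counted in words.
    """
    freq = Counter(word for utt in utterances for word in utt)
    tnw = sum(freq.values())   # total number of words (tokens)
    ndw = len(freq)            # number of different words (types)
    n_utt = len(utterances)
    mlu = tnw / n_utt if n_utt else 0.0  # mean length of utterance
    return {"utterances": n_utt, "TNW": tnw, "NDW": ndw, "MLU": mlu, "freq": freq}

# Hypothetical sample, not data from the corpus.
sample = [["我", "要", "吃", "飯"], ["媽媽", "我", "要", "喝", "水"], ["為什麼"]]
stats = corpus_stats(sample)
print(stats["utterances"], stats["TNW"], stats["NDW"], round(stats["MLU"], 3))
```

Per-group means and standard deviations could then be taken over the per-child values of these statistics.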
The recorded speech corpus contains 183,781 word tokens and 45,197 utterances. Evaluated against the manual transcription, the automatic speech recognizer is approximately 75.4% accurate, the automatic segmentation programs approximately 94.37% accurate, and the automatic text-to-phoneme programs approximately 96.43% accurate. The results show that 'I' is the most frequent single-character word, appearing 6,133 times; 'mother' is the most frequent two-character word, appearing 1,784 times; and 'why' is the most frequent three-character word, appearing 429 times. Across the 80 subjects, the mean number of utterances is 563, the mean TNW is 2,289, and the mean MLU is 3.902. The results indicate that the number of utterances, TNW, and MLU increase with age.
The lexicon database for preschool children aged 2 to 6 years in Taiwan developed from these outcomes has seven integrated components: (1) a large set of formatted text transcripts from the speech corpus; (2) programs for conventional word frequency analysis, including phonological, character-based, and word-based analysis; (3) programs for lexicon analysis using the Bayesian Normalization Index (BNI) based on semantic-based intension categorization (SIC) of the eating, dressing, bathing, and bedtime activities, and for bigram perplexity using a second-order hidden Markov model (HMM); (4) the distribution of conventional word frequency, including phonological, character-based, and word-based analysis; (5) the distribution of TNW, the number of different sub-syllables, characters, and NDW, as well as MLU, within the whole sampled transcript corpus; (6) the distribution of normalized word frequency with the BNI of characters based on SIC; (7) the distribution of bigram perplexities.
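To make the notion of bigram perplexity concrete, here is a minimal sketch. It deliberately uses a plain bigram model with add-one smoothing rather than the second-order HMM described in the thesis, whose details the abstract does not give. The function names, the `<s>` and `</s>` boundary markers, and the toy utterances are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram(utterances):
    """Count bigram contexts and bigrams over utterances padded with boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for utt in utterances:
        tokens = ["<s>"] + utt + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])                   # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, len(vocab)

def bigram_perplexity(utterances, contexts, bigrams, vocab_size):
    """Perplexity of utterances under an add-one smoothed bigram model."""
    log_prob, n = 0.0, 0
    for utt in utterances:
        tokens = ["<s>"] + utt + ["</s>"]
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
            log_prob += math.log2(p)
            n += 1
    if n == 0:
        raise ValueError("no utterances to score")
    return 2 ** (-log_prob / n)

# Hypothetical toy data, not taken from the corpus.
train = [["我", "要", "吃"], ["我", "要", "喝"]]
contexts, bigrams, vocab_size = train_bigram(train)
seen = bigram_perplexity(train, contexts, bigrams, vocab_size)
unseen = bigram_perplexity([["吃", "我", "喝", "要"]], contexts, bigrams, vocab_size)
print(seen < unseen)  # word orders seen in training score a lower perplexity
```

Lower perplexity means the model finds a word sequence more predictable, which is the sense in which such scores can help rank candidate word pairs.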
The perception rates for the normal-hearing subjects are 42.8% for words, 62.2% for consonants, 73.0% for vowels, and 75.8% for tones. The perception rates for the CI subjects are 18.2% for words, 40.4% for consonants, 53.6% for vowels, and 54.8% for tones. The ratios of CI scores to normal-hearing scores are 0.425 for words, 0.649 for consonants, 0.735 for vowels, and 0.723 for tones. The results indicate that the SI scores of the normal-hearing group are significantly higher than those of the CI group.
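The ratios can be checked from the reported percentages with a few lines of arithmetic. This sketch uses only the rounded figures quoted above; recomputing from them agrees with the reported ratios to within about 0.001, and the small differences presumably come from the thesis using unrounded scores. The variable names are illustrative.

```python
# Correct perception rates (%) as reported in the abstract.
normal = {"words": 42.8, "consonants": 62.2, "vowels": 73.0, "tones": 75.8}
ci = {"words": 18.2, "consonants": 40.4, "vowels": 53.6, "tones": 54.8}
reported = {"words": 0.425, "consonants": 0.649, "vowels": 0.735, "tones": 0.723}

# Ratio of the CI score to the normal-hearing score for each sub-score.
ratios = {part: ci[part] / normal[part] for part in normal}

for part, ratio in ratios.items():
    print(f"{part}: recomputed {ratio:.4f}, reported {reported[part]:.3f}")
```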
The results of conventional word frequency analysis indicate that vowel production increases with age for children in the language development stage. The results demonstrate that the BNI and conventional frequency methods are comparable for lexicon analysis of the utterances of children in the language development stage. However, the BNI method seems to provide more scientific merit than conventional word frequency for adults and is likely to be a more robust index for lexicon analysis, given the bias introduced by language sampling and by differing semantic intension categories. The HMM could provide more efficient and scientific selection criteria for language cues in language models. The established monosyllabic corpus consists of 37 phoneme-containing words and was used to derive SI scores for words, phonemes, and tones, which were found to improve with age for both CI and normal-hearing subjects.
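The abstract does not define the BNI, so the following is not the thesis's index. It is only a sketch of how Bayes' rule can offset sampling bias across the four activity categories: with a uniform prior over categories, a word's association with an activity depends on its relative frequency within that activity, not on how much speech happened to be recorded there. The function name, category labels, and counts are hypothetical.

```python
from collections import Counter

def category_posteriors(counts_by_category, word, prior=None):
    """Return P(category | word) by Bayes' rule from per-category word counts.

    Assumption: a uniform prior over categories unless one is supplied.
    """
    cats = list(counts_by_category)
    if prior is None:
        prior = {c: 1.0 / len(cats) for c in cats}
    joint = {}
    for c in cats:
        n_c = sum(counts_by_category[c].values())
        likelihood = counts_by_category[c][word] / n_c if n_c else 0.0  # P(word | category)
        joint[c] = likelihood * prior[c]
    evidence = sum(joint.values())  # P(word)
    if evidence == 0:
        raise ValueError(f"word {word!r} does not occur in any category")
    return {c: joint[c] / evidence for c in cats}

# Hypothetical counts: 100 tokens recorded at meals, 40 at bath time.
counts = {
    "meal": Counter({"吃": 30, "我": 70}),
    "bath": Counter({"吃": 2, "我": 18, "水": 20}),
}
post = category_posteriors(counts, "我")
raw_share = counts["meal"]["我"] / (counts["meal"]["我"] + counts["bath"]["我"])
print(round(post["meal"], 3), round(raw_share, 3))
```

In this toy case the raw count share overstates how strongly "我" is tied to meals simply because more meal-time speech was sampled, while the normalized posterior is closer to even.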
This database is also used extensively by students of child language disorders, aphasia, second language learning, computational linguistics, literacy development, narrative structures, and adult sociolinguistics. The contributions may be significant for language education, special education, and clinical speech therapy, as well as for commercial applications in domestic computer-aided instruction.