
Author: Chiu, Yu-Hsien (邱毓賢)
Title: A Study on Taiwanese Sign Language Processing for Augmentative Communication and Language Learning for the Hearing Impaired (台灣手語語言處理於聽障者擴大性溝通與語言學習之研究)
Advisor: Wu, Chung-Hsien (吳宗憲)
Degree: Ph.D.
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2004
Graduation Academic Year: 92
Language: English
Pages: 118
Keywords: hearing/speech impairment, Taiwanese Sign Language, assistive technology, alternative and augmentative communication, computer-aided instruction
    The provision and application of assistive technology enables people with disabilities to participate in and join the mainstream of society. For hearing/speech-impaired people, congenital or acquired functional impairments severely affect language learning and communicative expression. Advanced countries in Europe and America have developed alternative and augmentative communication (AAC) technology to improve the communication abilities and daily-life independence of people with disabilities. The design principle of such technology is to exploit the user's residual functional abilities so that, through technological processing, the most efficient operating mode produces the most complete and comprehensible communication messages; related assistive systems have been successfully applied in special education and clinical practice. However, because of linguistic and cultural differences, such advanced assistive technology cannot be directly applied or transferred to local users. Assistive technology for the hearing/speech impaired in Taiwan is still in its infancy, and research and development of the related techniques is an important topic in biomedical engineering and computer science.
    The purpose of this study is to design and develop a Taiwanese Sign Language (TSL) alternative and augmentative communication system to improve the communication and language-learning effectiveness of local hearing-impaired people. The theoretical foundations of this research include information retrieval, computational linguistics, machine translation, language modeling, and image processing; the human-machine interface design takes symbol recognition, visual attention, and functional impairment into account, so as to provide computer-aided instruction and communication-aid systems that meet the individual needs of hearing-impaired users. The specific aims are to: 1) develop an efficient and robust sign gesture-code retrieval input system that lets users enter text in the most intuitive manner; 2) develop an efficient sign-prediction virtual keyboard to enhance input and Chinese sentence generation; 3) develop a robust sign language translation system for Chinese-to-TSL translation; 4) develop sign video concatenation synthesis techniques to improve the effectiveness of sign language education and training; and 5) implement the systems and examine the feasibility of the proposed methods.
    The evaluation focuses on sign retrieval accuracy, sign prediction accuracy, the correctness of Chinese sentences generated from sign input, Chinese-to-TSL translation accuracy, and Chinese/TSL reading comprehension. Subjective and objective experiments show that the proposed framework and implemented systems yield significant performance improvements in functional evaluation; in the case studies, Chinese literacy and TSL reading comprehension also improved significantly, and in practical use the retrieval and prediction mechanisms proved tolerant of input errors. For alternative text entry, the proposed maximum a posteriori (MAP) sign gesture-code retrieval model effectively models user input patterns; the gesture codes, defined according to TSL phonology, support the development of alternative communication aids. For sentence generation, the proposed key-sign predictive sentence-template language model supports sentence formation and grammatical correction for hearing/speech-impaired users. For augmentative communication and sign language learning, the proposed MAP-based two-pass alignment model and dynamic-programming video concatenation model generate the best natural-signer video sequence for Chinese-to-TSL translation, providing TSL education and training for both deaf and hearing users.
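For illustration, the dynamic-programming clip selection underlying video concatenation can be sketched as follows. The clip IDs, hand coordinates, and squared-distance transition cost below are hypothetical stand-ins, not the dissertation's actual implementation: the idea is simply to pick one candidate clip per sign so that the end position of each clip lies close to the start position of the next, which viewers perceive as smooth motion.

```python
# Each sign in the translated TSL sequence has several candidate video
# clips; a clip is summarized by its initial and final hand positions
# (x, y). Clip IDs and coordinates below are illustrative assumptions.
CANDIDATES = [
    [("a1", (0, 0), (5, 5)), ("a2", (0, 0), (1, 1))],  # sign 1
    [("b1", (1, 1), (3, 3)), ("b2", (6, 6), (4, 4))],  # sign 2
    [("c1", (3, 3), (0, 0)), ("c2", (9, 9), (0, 0))],  # sign 3
]

def transition_cost(end_pos, start_pos):
    # Squared Euclidean distance between consecutive clips' hand positions;
    # a smaller cost means a smoother perceived motion at the joint.
    return (end_pos[0] - start_pos[0]) ** 2 + (end_pos[1] - start_pos[1]) ** 2

def best_concatenation(candidates):
    """Viterbi-style dynamic programming over clip choices: minimize the
    total transition cost across the whole sign sequence."""
    cost = [0.0] * len(candidates[0])  # best cost ending at each clip
    back = []                          # backpointers for traceback
    for t in range(1, len(candidates)):
        new_cost, choices = [], []
        for _, start, _ in candidates[t]:
            trans = [cost[j] + transition_cost(candidates[t - 1][j][2], start)
                     for j in range(len(candidates[t - 1]))]
            j_best = min(range(len(trans)), key=trans.__getitem__)
            new_cost.append(trans[j_best])
            choices.append(j_best)
        back.append(choices)
        cost = new_cost
    # Trace back the optimal clip-index path.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for choices in reversed(back):
        j = choices[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]

print(best_concatenation(CANDIDATES))  # ['a2', 'b1', 'c1']
```

Here the search selects a2 → b1 → c1, whose end/start positions coincide exactly, rather than greedily picking clips sign by sign.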
    The Chinese/TSL databases, experimental results, and analyses of this research provide linguists with fundamental data for sign language linguistics, and offer computer scientists a systematic design framework for integrating and developing local assistive technology. In addition, the developed computer-aided instruction and communication-aid systems can support the training practice of special education and rehabilitation professionals. Future work will focus on building a large parallel TSL corpus and on applying advances in TSL phonology and morphology, natural language understanding, and image processing to refine and improve the system design.

    Enabling people with disabilities to participate in everyday life is important, and so is developing the technology that supports this participation. Speech and hearing dysfunction often severely limits language learning and expression. Alternative and Augmentative Communication (AAC) technology aims to let disabled users produce comprehensive output from simple, minimal-effort input, greatly improving their expressive communication and independence in daily activities. AAC devices have been developed in Western countries over the past twenty years, and their effectiveness has been demonstrated in special education and clinical practice. Unfortunately, this technology cannot be directly applied to users of the Chinese language and Taiwanese Sign Language (TSL). The lack of domestic AAC technology poses a significant challenge for local biomedical engineers and computer scientists.
    The purpose of this study is to improve the input rate and accuracy of communication and language-learning aids through the development of a TSL AAC system. Theories in information retrieval, computational linguistics, machine translation, language modeling, and image processing provide the underlying principles of this research. The human-machine interface design considers human factors, including symbol recognition, visual concentration, and physical disability, to deliver computer-aided instruction (CAI) and AAC communication systems tailored to individual needs. Specifically, the study aims to: 1) develop an effective and robust sign retrieval system that uses TSL sign features to make retrieving a sign word for text entry more intuitive; 2) develop an effective TSL virtual keyboard and sign prediction strategy for input-rate enhancement and text generation; 3) develop a robust language translator for generating TSL from Chinese; 4) develop a concatenative sign synthesizer for TSL teaching and learning; and 5) implement and evaluate the proposed systems.
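The effect of a prediction strategy on input rate is commonly quantified by the keystroke saving rate (KSR): the fraction of keystrokes avoided relative to typing every character by hand. The sketch below uses a hypothetical prefix-matching predictor, not the proposed TSL sign predictor, and assumes selecting a displayed prediction costs one keystroke.

```python
def keystroke_saving_rate(target, predictor):
    """KSR = 1 - (keystrokes used with prediction / keystrokes without).
    `predictor` maps a typed prefix to a list of completion candidates;
    selecting a candidate counts as one keystroke."""
    words = target.split()
    baseline = len(target)        # every character typed by hand
    used = len(words) - 1         # spaces are still typed by hand
    for w in words:
        typed = ""
        for ch in w:
            if w in predictor(typed):  # prediction shown: one key selects it
                used += 1
                break
            typed += ch                # otherwise type the next character
            used += 1
    return 1 - used / baseline

# Hypothetical predictor: offers up to two frequent words matching the prefix.
VOCAB = ["hello", "help", "world"]
predict = lambda prefix: [w for w in VOCAB if w.startswith(prefix)][:2]

print(round(keystroke_saving_rate("hello world", predict), 2))  # 0.64
```

"hello" is selectable immediately (1 keystroke) while "world" needs one typed letter before it surfaces, so 4 of 11 keystrokes suffice.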
    To assess the practical aids, several subjective and objective experiments were performed in the developed educational environments, investigating sign retrieval accuracy, sign prediction rate, text entry rate, correct translation rate, and reading comprehension. Experimental results show that the proposed approaches yield encouraging improvements in their respective tasks and tolerate user input errors. A case study also shows that literacy aptitude test scores and reading comprehension improved significantly. For text entry, the proposed sign retrieval model, based on the maximum a posteriori (MAP) framework, models retrieval behavior by estimating the probabilities of matching and occurrence of entered feature patterns; the gesture feature set, based on TSL phonological features, serves as an indexing strategy, and the benefits of developing an alternative symbol system for alternative communication were explored. For text generation and rate enhancement, the proposed predictive sentence-template language model can assist people with language impairments in sentence formation and grammatical correction. For augmentative communication and language learning, the proposed MAP-based two-pass alignment model and dynamic-programming sign video concatenation approach generate a sign language rendering of written text with motion that viewers perceive as smooth, supporting TSL teaching and learning for both hearing and deaf people.
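The MAP retrieval idea described above can be illustrated with a toy sketch: rank lexicon signs by P(sign | entered features) ∝ P(features | sign) · P(sign). The feature codes, three-sign lexicon, prior counts, and the simple position-wise match/error likelihood below are hypothetical stand-ins for the dissertation's gesture-code model, but they show why the approach tolerates partially wrong input.

```python
# Toy lexicon: each TSL sign is indexed by a sequence of gesture feature
# codes (handshape, location, movement). All codes here are hypothetical.
LEXICON = {
    "EAT":   ["HS_fist", "LOC_mouth", "MOV_tap"],
    "DRINK": ["HS_cup",  "LOC_mouth", "MOV_tilt"],
    "HOME":  ["HS_flat", "LOC_chest", "MOV_join"],
}

# Unigram prior P(sign) estimated from usage counts (made-up numbers).
PRIOR = {"EAT": 0.5, "DRINK": 0.3, "HOME": 0.2}

def likelihood(entered, indexed, p_match=0.9, p_err=0.1):
    """P(entered | sign): position-wise match/mismatch model, so retrieval
    degrades gracefully when a feature code is wrong or missing."""
    score = 1.0
    for i, feat in enumerate(entered):
        score *= p_match if i < len(indexed) and indexed[i] == feat else p_err
    return score

def map_retrieve(entered):
    """Rank signs by posterior P(sign | entered) ∝ P(entered | sign) P(sign)."""
    scores = {s: likelihood(entered, f) * PRIOR[s] for s, f in LEXICON.items()}
    return sorted(scores, key=scores.get, reverse=True)

# An input with one wrong feature code still ranks the intended sign first.
print(map_retrieve(["HS_fist", "LOC_mouth", "MOV_tilt"]))  # ['EAT', 'DRINK', 'HOME']
```

With one mismatched movement code, "EAT" still wins because two of three features match and its prior is high, which is the error-tolerant behavior the abstract reports.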
    Future work will refine and improve each model stepwise. The outcomes are expected to provide useful information for language researchers and computer scientists developing related assistive technology, and to contribute to education, training, and communication aids in special education and rehabilitation engineering.

    Chinese Abstract
    Abstract
    Acknowledgment
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
        1.1.1 Purpose and Specific Aims
        1.1.2 Significance
      1.2 Background and Literature Review
        1.2.1 Alternative and Augmentative Communication
        1.2.2 Alternative Access and Symbol Systems
        1.2.3 Prediction Strategies for Rate Enhancement
        1.2.4 Natural Language Processing for Augmentative Communication
        1.2.5 Current Research and Applications on Sign Language
      1.3 Organization of the Dissertation
    Chapter 2 Corpus Development
      2.1 Text Corpus Collection and Linguistic Processing
      2.2 Collection and Indexing of TSL Sign Symbols
        2.2.1 Definition of Sign Features
        2.2.2 Data Structure of Indexed Feature Sequences
      2.3 Parallel Text Corpus
      2.4 Development of Sign Video Databases
        2.4.1 Annotation of Initial and Final Positions of Hand
        2.4.2 Development of a Motion Transition Balanced Corpus
        2.4.3 Segmentation and Acquisition of Sign Videos
      2.5 Summary
    Chapter 3 Sign Retrieval Using Sign Features
      3.1 Investigation of Retrieval Behavior
      3.2 Maximum A Posteriori Based Retrieval Framework
        3.2.1 Alignment Probability Estimation
        3.2.2 A Priori Probability Estimation
      3.3 The Error-Tolerant Framework
        3.3.1 Hand Shape Recovery for Partial Matching
        3.3.2 Error Recovery for No Matching
      3.4 Functional Evaluation of Sign Retrieval Framework
    Chapter 4 Text Generation from Taiwanese Sign Language
      4.1 Acquisition of Phrase Formation Rules
      4.2 Construction of Predictive Sentence Template Tree
      4.3 Sentence Generation
        4.3.1 Key-Phrase Identification
        4.3.2 Template Matching
        4.3.3 Function Word Filling
        4.3.4 Automatic Sentence Pattern Learning
      4.4 Functional Evaluation of Keystroke Saving Rate
        4.4.1 Perplexity Evaluation
        4.4.2 Function Word Deletion
    Chapter 5 Taiwanese Sign Language Translation from Chinese
      5.1 Acquisition of Syntactic Clusters and Grammar Fragments
      5.2 Two-Pass Alignment Model
      5.3 Evaluation on the Performance of Translation
    Chapter 6 Concatenating Sign Synthesis Using Sign Videos
      6.1 Acquisition of Hand Positions and Motion Trajectory
      6.2 Dynamic Programming Based Video Clips Joining
    Chapter 7 Experimental Results and Discussion
      7.1 Interface Design Considerations
      7.2 Experiments on the Sign Retrieval Interface
        7.2.1 Experiments on Similarity and Clustering of Hand Shapes
        7.2.2 Experiments on Extraction of Hand Shape Recovery Patterns
        7.2.3 Case Study
        7.2.4 Summary
      7.3 Experiments on the TSL Virtual Keyboard
        7.3.1 Development of Sign Prediction Interface
        7.3.2 Evaluation of Sign-Based Scanning
        7.3.3 Evaluation of Reading Comprehension
        7.3.4 Summary
      7.4 Experiments on the Sign Synthesis
        7.4.1 Evaluation of Sign Video Concatenation
        7.4.2 Case Study
    Chapter 8 Conclusions and Future Study
    References
    Author's Biographical Notes
    Publications


    Full-text access: on campus, immediately available; off campus, available from 2004-01-05.