| Author: | 梁育誠 Liang, Yu-Cheng |
|---|---|
| Thesis title: | 應用增音檢測於連續手語斷字之研究 An Approach to Detecting Epentheses for Segmentation of Continuous Sign Language |
| Advisor: | 謝璧妃 Hsieh, Pi-Fuei |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2010 |
| Academic year: | 98 (ROC calendar; 2009-2010) |
| Language: | English |
| Pages: | 42 |
| Keywords (Chinese): | 台灣手語、增音、斷字、隱藏馬可夫模型 |
| Keywords (English): | Taiwanese Sign Language, Epenthesis, Segmentation, HMMs |
A sign language recognition system is an essential component of human-computer interaction, and much recent research has extended to recognizing continuous sign language sentences. The first problem encountered in continuous sign language recognition is word segmentation: the transitions between consecutive sign words carry no sign meaning, are hard to perceive, and are very short in duration. Taiwanese Sign Language (TSL) can be decomposed into three components: hand shape, trajectory, and facial expression. To segment sentences, we study the characteristics of TSL sentences and exploit hand shape and trajectory information.
From the collected corpus, we find that TSL exhibits three types of epenthesis. An epenthesis is a movement (or hand-shape change) without sign meaning that occurs between two sign words, beginning at the ending location of the previous word and ending at the starting location of the next word. Epenthesis detection is therefore the key problem in segmenting continuous sign language. In TSL, about 87% of epentheses can be detected from the time points at which the hand shape changes; the remaining epentheses must be identified from trajectory information. Since each TSL word has at most one trajectory and at most two hand shapes, we combine hand shape and trajectory information to segment continuous sentences.
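As a concrete illustration of the first detection step, the minimal Python sketch below (not taken from the thesis; the per-frame hand-shape features, the Euclidean distance measure, and the threshold are all assumptions) flags hand-shape change points by thresholding the distance between consecutive feature vectors, then cuts the sentence into candidate word segments at those points.

```python
import numpy as np

def shape_change_points(features, threshold=0.5):
    """Return frame indices where the hand-shape feature vector jumps
    between consecutive frames.

    features  : (T, d) array, one hand-shape feature vector per frame
                (e.g., a low-dimensional embedding; how the features are
                computed is assumed, not specified by the abstract).
    threshold : assumed distance separating within-word variation from
                epenthetic hand-shape changes.
    """
    diffs = np.linalg.norm(np.diff(features, axis=0), axis=1)
    # diffs[i] compares frames i and i+1, so +1 marks the first frame
    # of the new hand shape.
    return np.where(diffs > threshold)[0] + 1

def candidate_segments(num_frames, change_points):
    """Cut frames [0, num_frames) into candidate word segments."""
    bounds = [0] + list(change_points) + [num_frames]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

# Toy usage: 100 frames of 8-dimensional features with a synthetic
# hand-shape change at frame 40.
feats = np.random.default_rng(0).normal(size=(100, 8)) * 0.05
feats[40:] += 1.0
cps = shape_change_points(feats, threshold=0.5)
print(cps, candidate_segments(len(feats), cps))
```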
To obtain hand-shape information, we use Class-Conditional Locally Linear Embedding (CLLE) to handle multi-view hand shapes. After finding the hand-shape change points, we check whether the sentence contains any sign word with a paired hand shape. Within each candidate word segment, we then recognize the trajectory and select the optimal combination of trajectories to determine the start and end frames of each sign word. For robustness, we check the segmentation result against TSL grammar constraints.
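The abstract does not spell out how the optimal combination of trajectories is chosen. One standard realization, sketched below under that assumption, is a dynamic program over the candidate cut points that keeps the partition maximizing the summed per-segment scores; in the thesis these scores would come from trajectory HMMs, whereas here `score` is a hypothetical stand-in.

```python
from functools import lru_cache

def best_partition(cuts, score):
    """Choose which candidate cut points to keep so that the summed
    per-segment score is maximal.

    cuts  : sorted frame indices [0, c1, ..., T] including both ends
    score : score(start, end) -> float, e.g., the best trajectory-model
            log-likelihood for frames [start, end) (assumed interface)
    """
    n = len(cuts)

    @lru_cache(maxsize=None)
    def best(i):
        """Best (total score, segment list) for a segment starting at cuts[i]."""
        if i == n - 1:
            return 0.0, []
        out = None
        for j in range(i + 1, n):  # try ending the current segment at cuts[j]
            s, rest = best(j)
            s += score(cuts[i], cuts[j])
            if out is None or s > out[0]:
                out = (s, [(cuts[i], cuts[j])] + rest)
        return out

    return best(0)[1]

# Toy usage: a score that favors segments of about 30 frames.
print(best_partition([0, 40, 55, 100], lambda a, b: -abs((b - a) - 30)))
```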
In experiments on videos of 20 continuous TSL sentences, the correct word segmentation rate is about 90%, with good performance on both deletion and insertion errors.
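The abstract does not define its segmentation metric. A conventional definition from continuous recognition, given here only as an assumption about what "correct spotting rate" and deletion/insertion errors could mean, is

$$ \mathrm{Correct} = \frac{N - D - S}{N}, \qquad \mathrm{Accuracy} = \frac{N - D - S - I}{N}, $$

where $N$ is the number of ground-truth words and $D$, $S$, $I$ are the numbers of deleted, substituted, and inserted words in the spotted output.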