| Author: | 黃書堯 Hwang, Shu-Yao |
|---|---|
| Thesis title: | 無詞彙標註的台灣手語翻譯之研究 (Toward Gloss-Free Sign Language Translation for Taiwanese Sign Language) |
| Advisor: | 賀保羅 Horton, Paul |
| Degree: | Master |
| Department: | Institute of Medical Informatics, College of Electrical Engineering and Computer Science |
| Year of publication: | 2025 |
| Academic year: | 113 (ROC calendar) |
| Language: | English |
| Pages: | 44 |
| Keywords (Chinese): | 台灣手語、手語翻譯、Gloss-Free |
| Keywords (English): | Taiwanese Sign Language, Sign Language Translation, Gloss-Free |
A gloss is a written label marking the semantic unit corresponding to each individual sign in a sign language, and glosses commonly serve as an intermediate representation in sign language translation. This study investigates whether, under low-resource conditions, a neural network architecture that does not rely on gloss annotations can automatically translate Taiwanese Sign Language (TSL) into natural-language sentences. We adopt a gloss-free translation approach that maps sign language videos directly to the corresponding natural-language output, without any manual gloss labels. To bridge the semantic gap caused by the absence of glosses, the adopted architecture includes a pretraining strategy based on lexical items automatically extracted from natural-language sentences, which serve as a weak supervision signal for the model to learn semantic alignment.
The training data comprise approximately 500 sentence-level TSL videos, with a portion held out as a validation set for model evaluation. Owing to the limited data size, the model performs well on the training set but generalizes poorly to the validation set, indicating overfitting. Even so, the experimental results show that, without gloss annotations, this method exhibits some ability to capture grammatical structure and subject identification in certain sentences.
Overall, this study provides a preliminary validation of the feasibility of gloss-free sign language translation for TSL, and identifies directions for future improvement, including expanding the dataset, strengthening the pretraining mechanism, and incorporating application-domain knowledge, to raise practical performance.
This thesis investigates the feasibility of translating Taiwanese Sign Language (TSL) into natural language sentences under low-resource conditions using a neural architecture that does not rely on gloss annotations. We adopt a gloss-free translation approach that directly maps sign language videos to corresponding textual output without any manual gloss labels.
To compensate for the semantic gap caused by the absence of gloss supervision, the adopted architecture includes a pretraining strategy based on automatically extracting lexical cues from spoken language sentences. This serves as a form of weak supervision to guide the model in learning semantic alignment.
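To illustrate the kind of weak supervision described above, one way to obtain pseudo-gloss targets is to keep only the content words of a spoken-language sentence. The sketch below is a minimal, hypothetical stand-in using a tiny stopword list; the thesis's actual pipeline (e.g. its tokenizer or POS tagger) may differ.

```python
# Minimal sketch: deriving pseudo-gloss targets from a spoken-language
# sentence by discarding function words. Illustrative only; a real system
# would more likely keep nouns/verbs/adjectives via a POS tagger.
import re

# Hypothetical stopword list, kept deliberately small for the example.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}

def extract_pseudo_glosses(sentence: str) -> list[str]:
    """Lowercase, tokenize, and drop function words to approximate glosses."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(extract_pseudo_glosses("The dog is chasing a ball in the park"))
# -> ['dog', 'chasing', 'ball', 'park']
```

The remaining content words can then be matched against the model's visual features during pretraining, giving a coarse alignment signal in place of true glosses.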
The training data consists of approximately 500 sentence-level TSL videos, with a subset reserved for validation. Due to the limited dataset size, the model performs well on the training set but shows poor generalization on the validation set, indicating the presence of overfitting. Nevertheless, experimental results demonstrate that even without gloss annotations, the model exhibits some capacity to capture grammatical structure and subject information in certain sentences.
Overall, this thesis provides an initial validation of the feasibility of gloss-free sign language translation for TSL, and highlights potential directions for future improvement, such as expanding the dataset, enhancing the pretraining mechanism, and incorporating domain-specific knowledge, to increase practical applicability.