Graduate Student: 蔡文傑 (Tsai, Wen-Jie)
Thesis Title: 針對句法可控的文本生成之跨句法語言模型預訓練 (Pre-training of Cross-syntax Language Model for Syntactically Controllable Text Generation)
Advisor: 高宏宇 (Kao, Hong-Yu)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Academic Year of Graduation: 108 (2019-2020)
Language: English
Number of Pages: 45
Keywords: Natural Language Processing, Neural Network, Language Model, Syntactically Controlled Text Generation
Natural language generation tasks such as machine translation, summarization, and chatbots usually rely on a given training corpus as the ground-truth answer, yet in English there is no single correct answer: the same meaning can be expressed with different syntactic structures. In recent years, some research has therefore focused on improving the generalization ability of natural language models so that the meaning of a sentence can be expressed in different ways. These approaches are usually based on sequence-to-sequence models: given an English sentence and a control condition, the model outputs another English sentence that expresses the same meaning with a different syntactic structure. This task is called syntactically controllable text generation, and its difficulty lies in how to describe and exploit syntactic-structure information effectively. In this thesis, we propose a new hypothesis about sentences and their syntactic structures, together with two self-supervised pre-training tasks based on it. The core idea of the hypothesis is to treat a syntactic structure as something very close to the language embedding used in cross-lingual translation, so that syntactically controllable text generation can be viewed as translation between different language embeddings. The two pre-training tasks are Mono-syntax pre-training, which lets the model understand a single syntactic structure, and Cross-syntax pre-training, which lets the model learn the differences between syntactic structures from pairs of back-translated sentences. We evaluate our model on a human-written dataset. The results show that, compared with related work, our model obtains higher BLEU, ROUGE, and METEOR scores under automatic evaluation when the syntactic-structure differences are small. In addition, we further analyze the effectiveness of pre-training, the effect of different noise factors on syntax and semantics, ablation results, and the sentences generated by the model.
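The abstract describes treating a syntactic specification the way cross-lingual language model pre-training treats a language embedding, so that generating under a target syntax becomes translation between "syntax codes". The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of that idea, assuming each sentence carries a single syntax label whose embedding is added to every token embedding; all names (`SyntaxAwareEmbedding`, `num_syntax_codes`, etc.) are illustrative and not taken from the thesis.

```python
import torch
import torch.nn as nn

class SyntaxAwareEmbedding(nn.Module):
    """Illustrative sketch: token + position + syntax embeddings are summed,
    analogous to the language embeddings used in cross-lingual LM pre-training."""

    def __init__(self, vocab_size, num_syntax_codes, max_len=512, d_model=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        # The syntax label is treated like a "language" code.
        self.syn = nn.Embedding(num_syntax_codes, d_model)

    def forward(self, token_ids, syntax_id):
        # token_ids: (batch, seq_len); syntax_id: (batch,) one syntax label per sentence
        batch, seq_len = token_ids.shape
        positions = torch.arange(seq_len, device=token_ids.device).unsqueeze(0).expand(batch, -1)
        syntax = syntax_id.unsqueeze(1).expand(batch, seq_len)
        return self.tok(token_ids) + self.pos(positions) + self.syn(syntax)

# Usage: embed the same token sequence under a source and a target syntax code.
emb = SyntaxAwareEmbedding(vocab_size=32000, num_syntax_codes=8)
tokens = torch.randint(0, 32000, (2, 10))
source_syntax = torch.tensor([0, 0])
target_syntax = torch.tensor([3, 5])
enc_in = emb(tokens, source_syntax)   # encoder input conditioned on the source syntax
dec_in = emb(tokens, target_syntax)   # decoder input conditioned on the desired target syntax
```

In an encoder-decoder setup of this kind, the encoder would see the source sentence with its own syntax code while the decoder is conditioned on the target syntax code, mirroring how translation between different language embeddings is framed in the abstract.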