| Graduate Student: | Hsia, Chi-Ming (夏啓銘) |
|---|---|
| Thesis Title: | Filtering Videos by the Text Summarization Technique Based on BERT Model (基於BERT模型實現影音的文字摘要於影片篩選) |
| Advisor: | Huang, Yueh-Min (黃悅民) |
| Degree: | Master |
| Department: | Department of Engineering Science (on-the-job master's program), College of Engineering |
| Year of Publication: | 2020 |
| Graduation Academic Year: | 108 (ROC calendar, 2019–2020) |
| Language: | Chinese |
| Number of Pages: | 47 |
| Keywords (Chinese): | 深度學習、BERT、自動摘要、影片篩選 |
| Keywords (English): | Deep Learning, BERT, Automatic Text Summarization, Video Filtering |
In recent years, multimedia retrieval techniques have attracted the attention of many researchers. With the evolution of the Internet era and the rise of multimedia platforms, the volume of multimedia information has grown rapidly, making information overload increasingly severe; multimedia platforms therefore need faster and more accurate retrieval methods. Most current multimedia retrieval relies on tags: content owners supply tags to speed up searches and sharpen results. However, tag-based retrieval can be biased by the owner's subjective judgment, and when the tags are too sparse or do not match the content, users cannot find the information they need quickly and correctly. Video summarization is one multimedia retrieval method; it lets users grasp a video's key points at a glance, so they can retrieve video information more completely and rapidly.
Because reading text is more efficient than watching video, this study proposes a system that realizes video summarization through automatic text summarization, adopting text summaries as the multimedia retrieval method. Automatic text summarization is one of the many research topics in natural language processing, and its goal is to extract the key points from a text. In recent years, applying deep learning to text summarization has advanced the field substantially: a model learns from a large amount of training data, and the trained model is then applied to documents to extract their main points. This study uses BERT (Bidirectional Encoder Representations from Transformers), a deep learning method that has become popular in recent years; many studies report that BERT performs well on natural language tasks. BERT is pre-trained on a large corpus to produce a pre-trained model with basic language ability, and this pre-trained model is then fine-tuned into a summarization model.
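The thesis does not publish code, but the fine-tuning step described above can be illustrated with a minimal BertSum-style sketch: a pre-trained BERT encoder plus a small scoring head that decides, per sentence, whether it belongs in an extractive summary. The model name `bert-base-chinese`, the layer sizes, and the sample sentences are illustrative assumptions, not the thesis's actual configuration.

```python
# A minimal sketch (not the thesis's published code) of BertSum-style
# extractive summarization: a pre-trained BERT encoder is fine-tuned with a
# small scoring head over each sentence's [CLS] representation.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class ExtractiveSummarizer(nn.Module):
    def __init__(self, pretrained: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)          # pre-trained encoder
        self.scorer = nn.Linear(self.bert.config.hidden_size, 1)  # per-sentence score

    def forward(self, input_ids, attention_mask, cls_positions):
        # Encode the whole document; a [CLS] token precedes every sentence.
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        sent_vecs = hidden[0, cls_positions]        # (num_sentences, hidden_size)
        return torch.sigmoid(self.scorer(sent_vecs)).squeeze(-1)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
sentences = ["深度學習使文字摘要獲得大幅進步。", "本研究使用BERT模型。"]
# BertSum convention: wrap each sentence in [CLS] ... [SEP] so every
# sentence gets its own [CLS] vector for the scorer to read.
text = "".join(f"[CLS]{s}[SEP]" for s in sentences)
enc = tokenizer(text, add_special_tokens=False, return_tensors="pt")
cls_positions = (enc.input_ids[0] == tokenizer.cls_token_id).nonzero().squeeze(-1)

model = ExtractiveSummarizer()
scores = model(enc.input_ids, enc.attention_mask, cls_positions)
# Fine-tuning would train these scores with binary cross-entropy against
# oracle sentence labels; at inference the top-scoring sentences are kept.
```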
The system converts a video into a text file and then applies the trained summarization model to obtain a summary of the text. Finally, the summaries were rated by subjective review and scored well on both acceptance and completeness.
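For the end-to-end pipeline the abstract sketches (video → transcript → summary), a hedged outline might look like the following. The tools shown here (ffmpeg, the SpeechRecognition package with Google's recognizer, and the input file `lecture.mp4`) are assumptions for illustration rather than the thesis's stated toolchain, and the sketch reuses the `ExtractiveSummarizer` model and tokenizer from above.

```python
# A hedged outline of the abstract's pipeline: extract the audio track,
# transcribe it, and summarize the transcript. The specific tools and file
# names are illustrative; `model` and `tokenizer` come from the sketch above.
import subprocess
import speech_recognition as sr  # pip install SpeechRecognition

def video_to_text(video_path: str, wav_path: str = "audio.wav") -> str:
    # Drop the video stream and resample the audio to 16 kHz mono WAV.
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn",
                    "-ar", "16000", "-ac", "1", wav_path], check=True)
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    # Any speech-to-text backend would do; real ASR output often lacks
    # punctuation and would need separate sentence segmentation.
    return recognizer.recognize_google(audio, language="zh-TW")

def summarize(text: str, model, tokenizer, top_k: int = 3) -> str:
    # Naive split on the Chinese full stop; long transcripts would also
    # need chunking to respect BERT's 512-token input limit.
    sentences = [s for s in text.replace("。", "。\n").splitlines() if s.strip()]
    joined = "".join(f"[CLS]{s}[SEP]" for s in sentences)
    enc = tokenizer(joined, add_special_tokens=False, return_tensors="pt")
    cls_pos = (enc.input_ids[0] == tokenizer.cls_token_id).nonzero().squeeze(-1)
    scores = model(enc.input_ids, enc.attention_mask, cls_pos)
    # Keep the top_k highest-scoring sentences, restored to document order.
    keep = scores.topk(min(top_k, len(sentences))).indices.sort().values
    return "".join(sentences[int(i)] for i in keep)

transcript = video_to_text("lecture.mp4")       # hypothetical input video
print(summarize(transcript, model, tokenizer))  # extractive summary
```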
Multimedia search has attracted great attention from researchers in recent years, and multimedia platforms need to provide fast and accurate search methods. This research proposes a system that realizes video summarization using automatic text summarization. People usually read faster than they watch videos, so text summarization is used as the multimedia search method in this research. Automatic text summarization is a natural language processing task that aims to extract the key points from a text. In recent years, text summarization research has made great progress because deep learning has been applied to it: models learn from large amounts of training data, and the learned models are applied to texts to extract the essentials. This study applies the BERT model (Bidirectional Encoder Representations from Transformers), a deep learning method that has recently become popular; many studies have shown that BERT achieves strong results in natural language processing. Through training on a large corpus, BERT produces a pre-trained model with basic language ability, and after fine-tuning, the pre-trained model becomes a summarization model. The proposed system converts a video into text and uses the summarization model to generate a text summary. Finally, the summaries are rated by subjective review; the resulting acceptance and completeness scores are good.