| Author: | Wang, Yue (王越) |
|---|---|
| Thesis Title: | Action Constrained GAN for Negative Sampling in Knowledge Representation Learning (知識表示學習中動作約束生成對抗網路之負抽樣方法) |
| Advisor: | Kao, Hung-Yu (高宏宇) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2019 |
| Graduation Academic Year: | 107 (ROC calendar) |
| Language: | English |
| Pages: | 55 |
| Keywords: | Knowledge Graph, Knowledge Graph Embedding, Knowledge Representation Learning, Triple Classification, Link Prediction |
The knowledge graph originates from Google's idea of "Things, Not Strings": a search engine should not simply treat a user's keywords as strings to be retrieved by word-for-word matching, but should treat them as entities and understand the richer semantic relations behind them. Knowledge graphs are now widely applied in fields such as inconsistency checking, semantic question answering, and CAPTCHAs. For example, when a user searches "Who is Bill Gates's wife?", the answer "Melinda Gates!" can be returned directly.

In knowledge representation learning, we propose a negative sampling method based on generative adversarial networks to improve the performance of knowledge graph embedding models. Negative sampling is indispensable in knowledge representation learning. Negative examples are usually generated by randomly replacing the head or tail entity, but the resulting negatives are often of poor quality: the model can easily separate positives from negatives and thus learns few fine-grained details. Although this works well early in training, over time it leads to vanishing gradients or weak negatives, providing little guidance for the model.

To obtain higher-quality negatives, we use a generative adversarial network for negative sampling and obtain vector representations of knowledge through adversarial training of different translation models. The spirit of a traditional GAN is to keep training the generator until the data it produces can "fool" the discriminator; in other words, the goal is a better generator. In our task, by contrast, the goal is a better discriminator. To produce higher-quality negatives, we cluster the knowledge and draw negative samples in a targeted way to train the base model, and then constrain the actions the generator may take. The knowledge takes the form of real-world people, objects, and so on from various domains. Negative sampling with different clustering algorithms yields better performance.
The knowledge graph originates from a well-known viewpoint proposed by Google, "Things, Not Strings": a search engine should not accomplish its task merely by matching keywords word-for-word; instead, it should treat those strings as entities and uncover the semantic relations they carry. Nowadays, knowledge graphs are applied in many fields, such as inconsistency checking, semantic question answering, and CAPTCHAs. For example, when you search "Who is the wife of Bill Gates?", you get the answer "Melinda Gates!".

We propose a negative sampling approach based on generative adversarial networks in knowledge representation learning to improve knowledge graph embedding models (a knowledge graph is naturally a semantic network). Negative sampling, an essential step in knowledge representation learning, usually produces negative triples by randomly replacing the head or tail entity, and its results are often unsatisfactory: the model can easily separate positive and negative triples and therefore learns few fine-grained details. While this may work well early in training, it leads to vanishing gradients or weak negative samples in the long run.

To produce higher-quality negative triples, we adopt GAN-based negative sampling and obtain knowledge representations through adversarial training of different translation models. In a traditional GAN [16], the generator is trained until its output can fool the discriminator; in our task, by contrast, the aim is a better discriminator. We cluster the knowledge and select negative samples in a targeted manner to train the base model, and then constrain the actions available to the generator. The knowledge takes the form of real-world entities from various domains, and negative sampling with different clustering algorithms yields better performance.
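As a minimal illustration of the uniform random-replacement sampling discussed above (entity names are invented for the example, not taken from the thesis):

```python
import random

# Toy knowledge graph: (head, relation, tail) triples.
# All names here are illustrative, not from the thesis.
triples = [
    ("bill_gates", "spouse", "melinda_gates"),
    ("bill_gates", "founder_of", "microsoft"),
    ("microsoft", "headquartered_in", "redmond"),
]
entities = sorted({e for h, _, t in triples for e in (h, t)})
triple_set = set(triples)

def corrupt(triple, rng):
    """Uniform negative sampling: replace the head or the tail
    with a random entity, rejecting accidental positives."""
    h, r, t = triple
    while True:
        e = rng.choice(entities)
        neg = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if neg not in triple_set:
            return neg

rng = random.Random(0)
neg = corrupt(triples[0], rng)
print(neg)
```

Because the replacement entity is drawn uniformly, most negatives are nonsensical triples that the embedding model separates from positives trivially, which is exactly the "easy negative" problem the abstract describes.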
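A minimal sketch of the action-constrained idea as the abstract describes it, assuming entities have already been clustered and a discriminator score is available; all names, clusters, and scores below are invented for illustration:

```python
import math
import random

# Hypothetical pre-computed entity clusters (e.g. from clustering
# entity embeddings); names and groupings are made up.
cluster_of = {
    "bill_gates": "person", "melinda_gates": "person",
    "satya_nadella": "person", "microsoft": "company",
}
clusters = {}
for ent, c in cluster_of.items():
    clusters.setdefault(c, []).append(ent)

def discriminator_score(h, r, t):
    # Stand-in for a translation-model plausibility score such as
    # -||h + r - t||; here just a fixed toy table (higher = more plausible).
    toy = {("bill_gates", "spouse", "satya_nadella"): 0.9,
           ("bill_gates", "spouse", "bill_gates"): 0.1}
    return toy.get((h, r, t), 0.0)

def generate_negative(triple, rng):
    """Action constraint: the generator may only replace the tail
    with an entity from the same cluster, and it samples one with
    probability proportional to softmax(discriminator score)."""
    h, r, t = triple
    cands = [e for e in clusters[cluster_of[t]] if e != t]
    z = [math.exp(discriminator_score(h, r, e)) for e in cands]
    total = sum(z)
    probs = [x / total for x in z]
    return rng.choices(cands, weights=probs, k=1)[0]

rng = random.Random(0)
neg_tail = generate_negative(("bill_gates", "spouse", "melinda_gates"), rng)
print(neg_tail)  # always another 'person', never a company
```

Restricting the candidate pool to the true tail's cluster keeps the negative semantically plausible (a person rather than a company), so the discriminator is forced to learn finer distinctions than with uniform replacement.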
[1] Bruno Trstenjak, Sasa Mikac, and Dzenana Donko. KNN with TF-IDF Based Framework for Text Categorization. In 24th DAAAM International Symposium on Intelligent Manufacturing and Automation. 2013.
[2] Songbo Tan. Neighbor-weighted K-nearest neighbor for unbalanced text corpus. In Expert Systems with Applications 28 (2005), 667–671. 2005.
[3] Ruobing Xie, Zhiyuan Liu, Fen Lin, and Leyu Lin. Does William Shakespeare Really Write Hamlet? Knowledge Representation Learning with Confidence. In Association for the Advancement of Artificial Intelligence. arXiv:1705.03202v2. 2018.
[4] Zhenghao Liu, Chenyan Xiong, Maosong Sun, and Zhiyuan Liu. Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Pages 2395–2405. 2018.
[5] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning. Pages 809–816. 2011.
[6] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems. Pages 2787–2795. 2013.
[7] W. Zhang and T. Yoshida. A comparative study of TF-IDF, LSI and multi-words for text classification. In Expert Systems with Applications 38. Pages 2758–2765. 2011.
[8] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In the 3rd International Conference on Learning Representations. 2015.
[9] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex Embeddings for Simple Link Prediction. In International Conference on Machine Learning. Pages 2071–2080. 2016.
[10] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge Graph Embedding by Translating on Hyperplanes. In Association for the Advancement of Artificial Intelligence. Pages 1112–1119. 2014.
[11] Maja Rudolph, Francisco Ruiz, Susan Athey, and David Blei. Structured Embedding Models for Grouped Data. In 31st Conference on Neural Information Processing Systems. 2017.
[12] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. Pages 1247–1250. 2008.
[13] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Association for Computational Linguistics. Pages 687–696. 2015.
[14] Liwei Cai and William Yang Wang. KBGAN: Adversarial Learning for Knowledge Graph Embeddings. In the North American Chapter of the Association for Computational Linguistics. arXiv:1711.04071v3 [cs.CL]. 2018.
[15] Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu. Modeling Relation Paths for Representation Learning of Knowledge Bases. In Association for Computational Linguistics. Pages 705–714. 2015.
[16] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets. arXiv:1406.2661v1 [stat.ML]. 2014.
[17] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. YAGO: A Core of Semantic Knowledge. In Proceedings of the 16th International Conference on World Wide Web. Pages 697–706. 2007.
[18] H. Han, G. Karypis, and V. Kumar. Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification. In the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Pages 53–65. 2001.
[19] Jingzhong Wang and Xia Li. An improved KNN algorithm for text classification. In Institute of Electrical and Electronics Engineers. 2010.
[20] Gongde Guo, Hui Wang, David Bell, Yaxin Bi, and Kieran Greer. KNN Model-Based Approach in Classification. In OTM Confederated International Conferences. Pages 986–996. 2003.
[21] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546. 2013.