Graduate student: | 李欣欣 Li, Hsin-Hsin |
---|---|
Thesis title: | 基於同義詞語義擴展查詢建置企業智識問答系統 Enterprise Intellectually Question Answering System Based On Synonyms Semantic Expansion Query |
Advisor: | 王惠嘉 Wang, Hei-Chia |
Degree: | Master |
Department: | College of Management - Department of Industrial and Information Management (on the job class) |
Year of publication: | 2021 |
Academic year of graduation: | 109 |
Language: | Chinese |
Number of pages: | 37 |
Keywords (Chinese): | 知識管理, 問答系統, 語義擴展, 餘弦相似度, Jaccard Similarity |
Keywords (English): | Knowledge Management, Question Answering Systems, Semantic Expansion, Cosine Similarity, Jaccard Similarity |
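More and more enterprises have recognized the importance of an internal knowledge base, but internal knowledge base systems commonly suffer from the cold-start problem: when the system holds too little information, it cannot give users correct answers. When users inside the enterprise run into a problem and search only the internal knowledge base, the cold-start problem is aggravated by the scarcity of data, so many users turn to search engines when they cannot find what they need. With the rise of Web 2.0, people can exchange all kinds of information through online platforms, and users can obtain the knowledge they are looking for through Internet search engines. Users rely on keyword search to find programming-related information, but because they are unfamiliar with the program in question, they often cannot choose the right keywords, and the search results are unsatisfactory. The results must also be reviewed, filtered, and interpreted manually one by one, and the listed results are not all correct answers, which wastes working time. If users could instead ask questions in natural language and use an enterprise intelligent question answering system that incorporates external resources, query expansion based on semantic similarity could help them find suitable answers faster. How to consolidate the internal knowledge base and use external professional websites and forums to improve the enterprise intelligent question answering system, so that knowledge sharing is both fast and accurate, is therefore an important issue.
This study takes programming questions as an example and uses questions and answers from relevant online forums as the data source. It expands queries semantically with synonyms and compares them with the questions already in the database to build an enterprise intelligent question answering system, and it evaluates whether more accurate answers can be obtained in less time.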
Employees sometimes need the company's internal question answering (QA) system to find answers for their work, but the company's internal database holds insufficient information. When a user asks a question, the system extracts keywords from the question and searches the database for the answer. If the amount of data in the QA system is too small, the answer may not be retrieved, and external resources are needed to help solve the problem. The most commonly used external resources are traditional search engines such as Google and Yahoo. Because traditional search engines lack the support of semantic processing technology, the engine retrieves the documents most relevant to the user's keywords from a huge database and then ranks the results by relevance and website popularity according to its own algorithm. In this setting the question is often not expressed precisely enough, and a few keyword combinations frequently fail to find the answer the user wants. The returned results are also not concise enough, so the user has to check them one by one, and manual querying is ineffective.
Therefore, we take a professional forum on the Internet as the data source and the discussions raised during programming as the main object of analysis. Using the title of each post, the text is preprocessed with word segmentation, stop word removal, and part-of-speech tagging, and only the nouns and verbs are kept. WordNet is then used to find synonyms for the words in the question. After the text is converted to vectors with Word2vec, cosine similarity is used to select the synonyms most similar to the original question, and these are added to expand it. The expanded question is then compared with the questions in the database using Jaccard similarity between sentences, and the answer to the closest question is returned to the user. The proposed method was found to outperform the other experiments in terms of Precision, Recall, and F-measure.
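The expansion and matching steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the `SYNONYMS` table is a hand-written stand-in for WordNet, the `VECTORS` table holds made-up numbers in place of Word2vec embeddings, the `0.9` threshold and the `DATABASE` questions are hypothetical, and each synonym is compared with the word it came from rather than with the whole question, which is a simplification. Preprocessing (segmentation, stop word removal, part-of-speech filtering) is assumed to have been done already.

```python
import math

# Hypothetical stand-in for WordNet: a tiny hand-written synonym table.
SYNONYMS = {
    "remove": ["delete", "withdraw"],
    "list": ["array", "roster"],
}

# Hypothetical stand-in for Word2vec: made-up 3-dimensional word vectors.
VECTORS = {
    "remove": (0.9, 0.1, 0.0),
    "delete": (0.8, 0.2, 0.1),
    "withdraw": (0.1, 0.9, 0.2),
    "list": (0.0, 0.2, 0.9),
    "array": (0.1, 0.1, 0.8),
    "roster": (0.7, 0.6, 0.1),
}

# Hypothetical database: stored question -> its preprocessed tokens.
DATABASE = {
    "How to delete an element from an array": ["delete", "element", "array"],
    "How to sort a list": ["sort", "list"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(tokens, threshold=0.9):
    """Add every synonym whose vector is close enough to its source word."""
    expanded = set(tokens)
    for word in tokens:
        for candidate in SYNONYMS.get(word, []):
            if word in VECTORS and candidate in VECTORS:
                if cosine(VECTORS[word], VECTORS[candidate]) >= threshold:
                    expanded.add(candidate)
    return expanded


def jaccard(a, b):
    """Jaccard similarity of two token collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(query_tokens, database):
    """Return the stored question closest to the expanded query."""
    expanded = expand(query_tokens)
    return max(database, key=lambda q: jaccard(expanded, database[q]))


if __name__ == "__main__":
    query = ["remove", "item", "list"]
    print(sorted(expand(query)))
    print(best_match(query, DATABASE))
```

On this toy data, the unexpanded query shares no token with the "delete ... array" question and would match "How to sort a list" instead. After expansion the query gains "delete" and "array", so the deletion question becomes the closest match. The data were chosen to show this effect, so the example says nothing about how often expansion helps in practice.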