
Graduate student: 陳正煌 (Chen, Jeng-Huang)
Thesis title: 使用語意分析提升Help Desk處理問題效能 (Use Semantic Analysis to Improve the Performance of Help Desk Problems)
Advisor: 王惠嘉 (Wang, Hei-Chia)
Degree: Master
Department: College of Management - Department of Industrial and Information Management (on the job class)
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 76
Keywords (Chinese): 問答系統, 主題模型, 語意分析, 文件分群, 文字摘要
Keywords (English): Question and answer system, Topic model, Semantic analysis, Document clustering, Text summarization
  • As information systems become more widespread and more important in enterprises, the Help Desk staff who handle information-related problems have a significant effect on how efficiently an enterprise works. Enterprises nevertheless remain conservative in the attention they give to the Help Desk. To reduce the negative impact of high Help Desk turnover and to improve enterprise efficiency, it is worthwhile to apply text analysis to the historical problem-handling records of an existing User Help Desk system in order to mine valuable, reusable information.

    Traditional keyword search either returns results that are not precise enough or fails on questions that share a meaning but use different wording, which makes query conditions hard to set. To analyze these semantic relationships, this study uses the E-HowNet semantic knowledge base to convert the semantic relations of Chinese words, then uses the LDA (Latent Dirichlet Allocation) topic model to find the topic of each document. Similar questions are gathered by topic, their answer records are clustered and summarized, and the results are presented to the user in order of topic relevance. In the experiments, adding full part-of-speech filtering during semantic conversion improved Precision by about 8.5% compared with no semantic processing. Restricting the comparison to questions that the trained LDA model assigns to the same topic lowered Precision from 99% to 92% but cut the time required to 1/34 of the original. Because the corpus consists of short texts, the sentence relatedness threshold should not be set too high, otherwise summary extraction fails; the recommended value is 0.05. The study also found that AP Cluster produced better summaries than K-means.
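
    The remark about the sentence relatedness threshold can be illustrated with a small sketch. The abstract does not say which summarization algorithm the thesis uses, so the code below assumes a simple degree-centrality extractor over cosine similarity of bag-of-words vectors; the function names `cosine` and `summarize`, the whitespace tokenization, and the three sample answers are all hypothetical. Real Chinese help desk records would need word segmentation first.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, threshold, top_n=1):
    """Pick the sentences related to the most other sentences.

    Two sentences count as related when their cosine similarity is at
    least `threshold`. Returns [] when no pair clears the threshold,
    i.e. when extraction fails.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    degree = [0] * len(sentences)
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            if cosine(bags[i], bags[j]) >= threshold:
                degree[i] += 1
                degree[j] += 1
    if max(degree, default=0) == 0:
        return []
    # Highest degree first; ties keep the original sentence order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-degree[i], i))
    return [sentences[i] for i in ranked[:top_n]]


# Hypothetical, already-preprocessed short answers (assumption, not thesis data).
answers = [
    "reset vpn password account portal",
    "restart vpn client",
    "password expired contact administrator",
]

low = summarize(answers, threshold=0.05)
high = summarize(answers, threshold=0.3)
print(low)
print(high)
```

    Short texts share few words, so pairwise similarities stay small. In this toy data the largest similarity is roughly 0.26, which means a threshold of 0.3 leaves every sentence unconnected and nothing is extracted, while 0.05 still yields a summary sentence.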

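    Table of contents

    Abstract (Chinese)
    Extended Abstract
    Acknowledgements
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Scope and Limitations
      1.4 Research Process
      1.5 Thesis Structure
    Chapter 2 Literature Review
      2.1 Question Answering Systems
      2.2 Vector Space Model
      2.3 Semantic Analysis
        2.3.1 Singular Value Decomposition
        2.3.2 Non-negative Matrix Factorization
        2.3.3 Ontology
        2.3.4 Topic Models
        2.3.5 Overall Comparison
      2.4 Document Clustering
        2.4.1 Hierarchical Clustering Algorithms
        2.4.2 Partitional Clustering Algorithms
        2.4.3 Density-Based Clustering
        2.4.4 Single-Pass Method
        2.4.5 AP (Affinity Propagation) Clustering Algorithm
      2.5 Text Summarization
      2.6 Summary
    Chapter 3 Research Method
      3.1 Research Framework
      3.2 Data Preprocessing Module
      3.3 Semantic Analysis Module
        3.3.1 Semantic Relation Conversion
        3.3.2 Topic Model Analysis
      3.4 Answer Recommendation Module
        3.4.1 Question Clustering and Ranking
        3.4.2 Candidate Answer Clustering
        3.4.3 Candidate Answer Summarization
    Chapter 4 Implementation and Verification
      4.1 System Environment
        4.1.1 Data Collection
        4.1.2 Data Preprocessing
        4.1.3 Semantic Analysis
        4.1.4 Answer Recommendation
      4.2 Experimental Design
        4.2.1 Data Sources
        4.2.2 Comparison Baselines
        4.2.3 Evaluation Metrics
        4.2.4 Experimental Results
      4.3 Example of Answer Recommendation for a Question Query
    Chapter 5 Conclusions and Future Research Directions
      5.1 Research Results
      5.2 Future Research Directions
    References
    Appendix 1 Expert-Defined Stop Word List
    Appendix 2 Part-of-Speech Dictionary of Technical Terms
    Appendix 3 Part-of-Speech Dictionary of Stop Words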

    Full-text download: on campus, available immediately
    Off campus: available from 2022-01-01