| Graduate student: | 陳正煌 Chen, Jeng-Huang |
|---|---|
| Thesis title: | 使用語意分析提升Help Desk處理問題效能 Use Semantic Analysis to Improve the Performance of Help Desk Problems |
| Advisor: | 王惠嘉 Wang, Hei-Chia |
| Degree: | Master |
| Department: | College of Management, Department of Industrial and Information Management (on-the-job master's program) |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 76 |
| Keywords: | Question and answer system, Topic model, Semantic analysis, Document clustering, Text summarization |
As information systems become more widespread and important in enterprises, the Help Desk staff who handle information-related problems have a major impact on enterprise work efficiency. Enterprises nevertheless remain conservative in how much they invest in the Help Desk. To reduce the negative impact of high Help Desk turnover and to improve work efficiency, it is worthwhile for an enterprise to apply text analysis to the historical problem-handling records of its existing User Help Desk system in order to mine valuable, reusable information.
Traditional keyword search either returns imprecise results or misses records that express the same meaning in different words, which makes query conditions hard to set. To analyze such semantic relationships, this study uses the E-HowNet semantic knowledge base to convert the semantic relations of Chinese vocabulary, and then uses the topic model LDA (Latent Dirichlet Allocation) to find the topic represented by each document. Similar questions are gathered by topic, the answer records of those questions are clustered and summarized, and the results are presented to the user in order of topic relevance. Experimental validation gave the following results. Adding full part-of-speech filtering during semantic conversion improved Precision by about 8.5% over no semantic processing. Restricting the computation to questions that the trained LDA topic model assigns to the same topic lowered Precision from 99% to 92%, but shortened the time spent to 1/34 of the original. Because the corpus in this study consists of short texts, the sentence relatedness threshold should not be set too high, to avoid summary extraction failure; the recommended value is 0.05. In addition, summaries based on AP Cluster clustering were found to be better than those based on K-means.
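The abstract's remark about the sentence relatedness threshold can be illustrated with a minimal sketch. It assumes the threshold acts as a cosine-similarity cutoff in a graph-based extractive summarizer: sentences are linked when their similarity reaches the threshold, and the best-connected sentence is taken as the summary. The record does not describe the thesis implementation, so the function names `cosine` and `summarize`, the whitespace tokenization (the thesis works on segmented Chinese text), and the toy answer records are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, threshold=0.05, top_n=1):
    """Pick the top_n sentences with the most neighbours above the threshold.

    Returns an empty list when no pair of sentences reaches the threshold,
    which is the "extraction failure" case for short texts.
    """
    # Assumption: whitespace tokenization stands in for Chinese word segmentation.
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(vectors)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                degree[i] += 1
                degree[j] += 1
    if not any(degree):
        return []  # no sentence pair is similar enough to form a graph
    # Rank by degree (ties broken by original position), then restore order.
    ranked = sorted(range(n), key=lambda i: (-degree[i], i))
    return [sentences[i] for i in sorted(ranked[:top_n])]


# Hypothetical answer records from one cluster of help desk tickets.
answers = [
    "restart print spooler service",
    "clear print queue then restart spooler",
    "reset user password",
]

low = summarize(answers, threshold=0.05)   # one sentence is selected
high = summarize(answers, threshold=0.9)   # nothing passes the cutoff
```

With short sentences, few word overlaps exist, so a high cutoff such as 0.9 leaves every sentence isolated and nothing is extracted, while a low cutoff such as 0.05 still links the two related answers.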