
Graduate student: 林思婷 (Lin, Si-Ting)
Thesis title: 社群問答網站答案品質分析 (Spam Detection and Quality Evaluation in Community Question Answering)
Advisor: 王惠嘉 (Wang, Hei-Chia)
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2017
Academic year of graduation: 105
Language: Chinese
Number of pages: 67
Keywords (Chinese): 社群問答網站, 答案品質, 廣告答案
Keywords (English): Community Question Answering, Answer Quality, Spam Answers
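    The rapid growth of network technology has turned the Internet into a new kind of information-sharing platform, and community question answering websites have emerged as a result. These websites let users ask questions in natural language and receive detailed answers from other users; users can also search past question-and-answer records for similar questions and their answers. Because the answers are supplied by users themselves, and given the limits of users' knowledge and the complexity of natural language expression, answer quality varies widely.
    In addition, marketing practices have changed in recent years: vendors recruit writers to post articles on social networking sites, forums, blogs, and similar websites that promote their own products and services or attack competitors. Such advertising posts are now gradually appearing on community question answering websites as well, and users must spend a great deal of time filtering out spam answers and low-quality answers before they find answers that truly meet their needs. Most previous research on answer quality in community question answering websites divides answers into two classes, high-quality and low-quality. However, spam answers usually promote products related to the question, so the methods of earlier studies may judge a spam answer to be high-quality because it is highly relevant to the question, and the classification results fall short of expectations. Previous research also indicates that the definition of answer quality differs across question types. This study therefore aims to filter out spam answers on community question answering websites and to analyze answer quality under different question types, dividing answers into three classes (high-quality, low-quality, and spam) so that users can read answers more efficiently. Experimental results show that answer quality analysis that takes question type into account reaches an accuracy of 0.842, and that identifying spam answers before answer quality analysis helps reduce the proportion of spam answers misjudged as high-quality answers.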

    The rapid development of the Internet has made it a new platform for sharing information, and community question answering (CQA) websites have emerged in response. Users can post and answer questions in the community. Because the answers are contributed by volunteers, and because users' knowledge is limited and natural language expression is complex, answer quality varies greatly.
    Beyond the quality of answers, some answers are posted by writers who are paid to post advertising content on social media for commercial purposes. CQA websites have recently become targets of such campaigns. Several studies have tried to classify answers on CQA websites as high-quality or low-quality. With those methods, however, spam answers may be misjudged as high-quality answers because spam answers are usually highly related to the question.
    To screen out spam answers and surface genuinely high-quality answers, this study filters spam answers and evaluates the quality of the non-spam answers under different question types. Answers are divided into three classes: high-quality, low-quality, and spam. The results show that the accuracy of the quality analysis method is 0.842, and that filtering spam before answer quality analysis reduces the proportion of spam answers misjudged as high-quality answers.
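    The two-stage idea in the abstract (filter spam first, then judge quality with a classifier chosen by question type) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the labels, the question types "factual" and "opinion", and the toy keyword and length rules merely stand in for the trained classifiers of the thesis, which this record does not describe in detail.

```python
from typing import Callable, Dict, List

# Hypothetical labels for the three answer classes named in the abstract.
SPAM, HIGH, LOW = "spam", "high-quality", "low-quality"


def classify_answer(
    question_type: str,
    answer: str,
    is_spam: Callable[[str], bool],
    quality_by_type: Dict[str, Callable[[str], bool]],
) -> str:
    """Cascade: screen out spam first, then apply the quality
    classifier that belongs to the question's type."""
    if is_spam(answer):
        return SPAM
    return HIGH if quality_by_type[question_type](answer) else LOW


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of answers whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def spam_as_high_rate(predicted: List[str], actual: List[str]) -> float:
    """Share of true spam answers that were labelled high-quality."""
    on_spam = [p for p, a in zip(predicted, actual) if a == SPAM]
    return sum(p == HIGH for p in on_spam) / len(on_spam) if on_spam else 0.0


# Toy stand-ins (assumptions, not the classifiers used in the thesis).
def toy_is_spam(answer: str) -> bool:
    return "buy now" in answer.lower()


toy_quality = {
    "factual": lambda a: len(a.split()) >= 5,   # longer answers count as good
    "opinion": lambda a: "because" in a.lower(),  # answers that give a reason
}

samples = [
    ("factual", "Taipei is the capital city of Taiwan", HIGH),
    ("factual", "Buy now at our shop for the best capital tours", SPAM),
    ("opinion", "dunno", LOW),
]
actual = [label for _, _, label in samples]

# With the spam filter in front of the quality classifiers.
with_filter = [classify_answer(t, a, toy_is_spam, toy_quality) for t, a, _ in samples]

# Without it: every answer goes straight to the quality classifiers.
no_filter = [classify_answer(t, a, lambda _: False, toy_quality) for t, a, _ in samples]
```

    In this toy data the advertising answer is long and on topic, so the quality rule alone labels it high-quality; placing the spam check first prevents that, which is the effect the abstract reports for the real system.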

    Chapter 1 Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives
        1.3 Research Scope and Limitations
        1.4 Research Process
        1.5 Thesis Outline
    Chapter 2 Literature Review
        2.1 Answer Quality Research on Community Question Answering Websites
        2.2 Document Classification
            2.2.1 Document Representation Methods
            2.2.2 Feature Selection Methods
            2.2.3 Classifiers
        2.3 Spam Review Filtering
        2.4 Summary
    Chapter 3 Research Method
        3.1 Research Framework
        3.2 Data Preprocessing Module
        3.3 Question Classification Module
        3.4 Spam Answer Identification Module
            3.4.1 Initial Spam Information Dataset
            3.4.2 Spam Classifier Establishment
            3.4.3 Phase I Filtering
            3.4.4 Phase II Filtering
            3.4.5 Spam Information Dataset Updating
        3.5 Answer Quality Analysis Module
            3.5.1 Type-Specified Quality Classification
            3.5.2 Hybrid Quality Analysis
    Chapter 4 System Implementation and Validation
        4.1 System Environment Setup
        4.2 Experimental Method
            4.2.1 Data Sources
            4.2.2 Evaluation Metrics
        4.3 Parameter Settings
        4.4 Experimental Results
            4.4.1 Experiment 1
            4.4.2 Experiment 2
            4.4.3 Experiment 3
            4.4.4 Experiment 4
            4.4.5 Experiment 5
    Chapter 5 Conclusion
        5.1 Research Results
        5.2 Future Research Directions
    References
    Appendix

    Full text availability: on campus, open from 2022-12-31; off campus, open from 2022-12-31