
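Graduate student: Li, Yi-Zhen (李宜蓁)
Thesis title: Hierarchical Detection Model for Fake Reviews (假評論階層性偵測模型之研究)
Advisor: Li, Sheng-Tun (李昇暾)
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2020
Graduation academic year: 108 (ROC calendar, i.e. academic year 2019-2020)
Language: Chinese
Number of pages: 65
Keywords (Chinese): fake review detection; linguistic style analysis; hierarchical logistic regression analysis
Keywords (English): fake review detection, linguistic analysis, hierarchical logistic model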
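    With the rise of online platforms, people have grown used to reading reviews on these platforms before making a purchase, relying on other consumers' opinions to guide their choices. However, research estimates that roughly 15% to 30% of all online reviews are fake. Some merchants hire writers to produce fake reviews, both to inflate their own positive reviews and to attack competitors with negative ones. Such fake reviews not only damage merchants' reputations but also erode the credibility of third-party platforms and deprive consumers of accurate information. Fake review detection has therefore become an important issue for online platforms. This study builds a classification model for fake reviews, with the aim of contributing to the field of fake review detection.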
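    This study uses the linguistic style of each review and the behavioral characteristics of each reviewer as model features. The linguistic style features comprise part-of-speech (POS) analysis, Linguistic Inquiry and Word Count (LIWC) category counts, readability, and credibility and evidentiality word analysis. The behavioral features comprise each reviewer's rating entropy, rating deviation, time interval between reviews, and account lifetime. These features are fed into a hierarchical logistic regression model. Because reviews are nested within reviewers, the data violate the independence assumption of ordinary regression analysis, and previous studies have not addressed this nested structure. This study therefore adopts a hierarchical analysis that accounts for between-group differences across reviewers, and examines how linguistic style and behavioral features affect fake review detection. The experimental results show that on datasets with a hierarchical structure, hierarchical logistic regression effectively classifies both fake reviews and fake reviewers, reaching an accuracy of up to 94% in predicting fake reviewers and outperforming the other machine learning algorithms tested.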
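    The abstract names four reviewer-level behavioral features but does not give their formulas. The sketch below shows one plausible way to compute them for a single reviewer, under assumed definitions: rating entropy as the Shannon entropy of the reviewer's star ratings, rating deviation as the mean absolute gap from each product's average rating, time interval as the mean number of days between consecutive reviews, and account lifetime as the span between the first and last review. The function names and definitions are hypothetical illustrations, not the thesis's own code.

```python
import math
from collections import Counter
from datetime import date


def rating_entropy(ratings):
    """Shannon entropy (bits) of one reviewer's star ratings (assumed definition).

    0.0 means the reviewer always gives the same score.
    """
    counts = Counter(ratings)
    n = len(ratings)
    # p * log2(1/p) summed over observed rating values
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def rating_deviation(ratings, product_means):
    """Mean absolute gap between each rating and that product's average (assumed)."""
    return sum(abs(r - m) for r, m in zip(ratings, product_means)) / len(ratings)


def mean_interval_days(dates):
    """Average days between consecutive reviews; 0.0 for a single review (assumed)."""
    ordered = sorted(dates)
    if len(ordered) < 2:
        return 0.0
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def account_lifetime_days(dates):
    """Days between the reviewer's first and last review (assumed lifetime proxy)."""
    return (max(dates) - min(dates)).days


def behavioral_features(reviews):
    """reviews: non-empty list of (rating, product_mean_rating, review_date) for ONE reviewer."""
    ratings = [r for r, _, _ in reviews]
    means = [m for _, m, _ in reviews]
    dates = [d for _, _, d in reviews]
    return {
        "rating_entropy": rating_entropy(ratings),
        "rating_deviation": rating_deviation(ratings, means),
        "mean_interval_days": mean_interval_days(dates),
        "account_lifetime_days": account_lifetime_days(dates),
    }


if __name__ == "__main__":
    reviews = [
        (5, 3.5, date(2020, 1, 1)),
        (5, 4.0, date(2020, 1, 3)),
        (1, 4.5, date(2020, 1, 7)),
    ]
    # entropy ~0.918 bits, deviation 2.0, interval 3.0 days, lifetime 6 days
    print(behavioral_features(reviews))
```

    Each reviewer yields one feature vector, which is why these behavioral features naturally sit at the reviewer (upper) level of a hierarchical model, while linguistic features vary review by review.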

    In this study, we propose a model to predict fake reviews and fake reviewers, using linguistic features of each review and behavioral features of each reviewer as model features. Many studies have explored fake review detection, but only a few have considered features of both the review and the reviewer, and most use a logistic model (LM) for prediction. However, reviews are nested within reviewers: one reviewer can write more than one review. Because the data are hierarchical, a hierarchical analysis is more appropriate than an LM, so this study uses a Hierarchical Logistic Model (HLM). The dataset is divided into sub-datasets according to the number of reviews each reviewer has written and the time range of the reviews. The experimental results show that, given the hierarchical characteristic of the data, HLM can effectively classify both fake reviews and fake reviewers. For fake reviews, using each reviewer's reviews from within one month predicts fake reviews with an accuracy of 86%. For fake reviewers, however, prediction improves as more of each reviewer's past reviews are considered: the best result, an accuracy of 94%, is obtained when each reviewer's reviews from the past 6 years are included. All HLM prediction results are better than those of the other machine learning algorithms tested: Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), and K-Nearest Neighbors (KNN).
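    The abstracts motivate HLM by the nesting of reviews within reviewers. The sketch below illustrates what a two-level logistic model means for this data, assuming a random-intercept specification (the abstracts do not say whether slopes also vary by reviewer). In practice the parameters would be estimated with a multilevel package; here they are simply supplied. It also computes the latent-scale intraclass correlation (ICC) of the null model, tau00 / (tau00 + pi^2/3), which measures how much of the variation in fake-review propensity lies between reviewers rather than within them. The function names and parameter values are hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: turns a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def fake_probability(x, gamma00, gammas, u0j):
    """P(review i by reviewer j is fake) under an assumed random-intercept model:

        logit(p_ij) = gamma00 + sum_k gamma_k * x_k,ij + u_0j,   u_0j ~ N(0, tau00)

    x       -- predictor values for this review (review- and reviewer-level)
    gamma00 -- fixed intercept shared by all reviewers
    gammas  -- fixed slopes, one per predictor
    u0j     -- reviewer j's random intercept (deviation from gamma00)
    """
    eta = gamma00 + sum(g * v for g, v in zip(gammas, x)) + u0j
    return inv_logit(eta)


def latent_icc(tau00: float) -> float:
    """Null-model intraclass correlation on the latent scale.

    The level-1 residual of a standard logistic distribution has variance
    pi^2 / 3, so ICC = tau00 / (tau00 + pi^2 / 3).
    """
    return tau00 / (tau00 + math.pi ** 2 / 3)


if __name__ == "__main__":
    # Same review features, two reviewers with different random intercepts.
    print(fake_probability([2.0], -1.0, [0.5], u0j=0.0))  # 0.5
    print(fake_probability([2.0], -1.0, [0.5], u0j=1.0))  # ~0.731
    print(latent_icc(1.0))  # ~0.233
```

    Two reviews with identical features receive different probabilities when they come from reviewers with different random intercepts; this between-reviewer difference is exactly what a single-level LM, which treats every review as independent, cannot represent.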

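    Abstract
    Extended English Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Scope
      1.4 Research Process
    Chapter 2 Literature Review
      2.1 Fake Review Issues and Research
        2.1.1 Issues of Fake Reviews
        2.1.2 Related Research on Fake Reviews
        2.1.3 Feature Selection
      2.2 Linguistic Style Analysis
        2.2.1 LIWC
        2.2.2 Readability
        2.2.3 Credibility
        2.2.4 Evidentiality
        2.2.5 POS
      2.3 Hierarchical Logistic Regression
        2.3.1 Nested Data
        2.3.2 Logistic Regression Analysis
        2.3.3 Hierarchical Logistic Regression Analysis
        2.3.4 Grand-Mean Centering and Group-Mean Centering
        2.3.5 Null Model
      2.4 Summary
    Chapter 3 Research Methodology
      3.1 Problem Definition and Data Source
      3.2 Research Framework
      3.3 Data Preprocessing
      3.4 Feature Extraction
        3.4.1 Behavioral Features of Reviewers
        3.4.2 Linguistic Style Features of Reviews
      3.5 Model Construction
        3.5.1 Data Levels
        3.5.2 Construction Process
        3.5.3 Evaluation Metrics
    Chapter 4 Experimental Results and Analysis
      4.1 Data Description
      4.2 Hierarchical Regression Model Evaluation
        4.2.1 Evaluation Process
        4.2.2 Evaluation Results
        4.2.3 Odds Ratio Analysis
      4.3 Experimental Evaluation
        4.3.1 Algorithm Comparison
        4.3.2 Model Adjustment
        4.3.3 Testing on Other Datasets
      4.4 Model Parameter Evaluation
        4.4.1 Data Description
        4.4.2 Evaluation Results
        4.4.3 Algorithm Comparison
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions and Contributions
      5.2 Future Work and Research Limitations
    References
      Chinese References
      English References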
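    The thesis outline includes a section (2.3.4) on grand-mean versus group-mean centering. As a minimal illustration of the two operations, assume a review-level predictor (a readability score, hypothetically) and reviewer IDs as the groups. Grand-mean centering subtracts one overall mean; group-mean centering subtracts each reviewer's own mean, separating within-reviewer variation from between-reviewer variation. The function names and data are illustrative only.

```python
from collections import defaultdict


def grand_mean_center(values):
    """Subtract the overall mean of the predictor from every observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def group_mean_center(values, groups):
    """Subtract each group's own mean (here: each reviewer's mean) from its values.

    Returns the centred review-level values and the per-group means; the means
    can be re-entered as a reviewer-level predictor.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    centered = [v - means[g] for v, g in zip(values, groups)]
    return centered, means


if __name__ == "__main__":
    scores = [10.0, 20.0, 30.0, 50.0]    # hypothetical readability scores
    reviewers = ["a", "a", "b", "b"]     # hypothetical reviewer IDs
    print(grand_mean_center(scores))     # [-17.5, -7.5, 2.5, 22.5]
    print(group_mean_center(scores, reviewers))
    # ([-5.0, 5.0, -10.0, 10.0], {'a': 15.0, 'b': 40.0})
```

    After group-mean centering, every reviewer's centred scores average to zero, so a review-level coefficient reflects only how a review differs from that reviewer's usual writing, while the reviewer means carry the between-reviewer information.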

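    Full-text availability: on campus, public from 2025-06-15; off campus, not public.
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.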