
Graduate student: Hung, Chuan-I (洪權逸)
Thesis title: Uses of Text Mining and XGBoost Classifier to Analyze the Electronic Word-of-Mouth of the Protein Powder Market
Advisor: Lyu, Jr-Jung (呂執中)
Degree: Master's
Department: College of Management, Institute of Information Management
Year of publication: 2024
Academic year of graduation: 112 (Republic of China calendar)
Language: Chinese
Number of pages: 57
Keywords (Chinese edition): Text mining, Topic model, XGBoost, BERTopic, Elderly, Protein powder
Keywords (English edition): Text Mining, Topic Modelling, XGBoost, BERTopic
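    According to industry survey reports, the protein powder market reached approximately USD 26 billion in 2022 and continues to grow. Driven by global population aging and rising health awareness, elderly consumers account for a growing share of this market. Elderly consumers are readily influenced by word-of-mouth when choosing protein powder, and their health conditions and dietary preferences give them distinct purchasing needs. A fine-grained analysis of their preferences and opinions can therefore serve as a reference for new product development.
    Few previous studies have used electronic word-of-mouth to investigate the success factors of the elderly protein powder market; most relied on questionnaire surveys of elderly consumers' preferences, which can identify only surface-level success factors. The main objective of this study is to use electronic word-of-mouth to help protein powder manufacturers identify the success factors of the elderly market. It proposes an integrated analysis method: an XGBoost classifier is applied to 4,681 customer reviews collected for the top ten products on the Amazon protein powder best-seller list in order to identify customer segments. The reviews of elderly consumers are then analyzed with BERTopic, a topic modeling method from text mining, to uncover customer preferences and, from them, the market's success factors.
    The results show that the XGBoost algorithm reaches 89% accuracy on the review classification task, indicating that consumers are segmented precisely. Protein powder consumers fall into three segments: young consumers driven mainly by fitness needs, vegetarians, and elderly consumers driven mainly by health needs. BERTopic proves effective at capturing deeper semantics: four latent topics emerge from elderly consumers' reviews, indicating that the market's success factors lie in four aspects: "nutritional efficacy", "brand reputation", "product taste", and "packaging". Elderly consumers place particular weight on nutritional efficacy and brand reputation. For health reasons, they care more about whether a product works than about its price, and they tend to choose reputable brands with high ratings or physician endorsement. This study shows that text mining can not only segment customers precisely but also reveal the deeper preferences and needs of each segment, giving product developers valuable reference information.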
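As background for the classification step, XGBoost (Chen & Guestrin, 2016) grows each tree greedily. It scores a candidate split by the regularized gain ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ and gives each leaf the weight −G/(H+λ), where G and H are the sums of the first and second derivatives of the loss over the reviews in a node. The Python sketch below evaluates one split by hand for binary log loss. It is not the thesis's implementation. The single binary feature (whether a review mentions a health term), the six labelled toy reviews, and the settings λ = 1 and γ = 0 are all hypothetical, and a three-segment task like the one above would use a multi-class softmax loss instead.

```python
"""Hand evaluation of XGBoost's split gain (Chen & Guestrin, 2016) for log loss.

Hypothetical toy data: each review is reduced to one binary feature
(mentions a health term?) and a label (1 = elderly segment).
"""
import math


def logistic_grad_hess(y: int, margin: float) -> tuple[float, float]:
    """First and second derivatives of log loss with respect to the raw margin."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - y, p * (1.0 - p)


def leaf_weight(g: float, h: float, lam: float) -> float:
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -g / (h + lam)


def split_gain(g_l: float, h_l: float, g_r: float, h_r: float,
               lam: float, gamma: float) -> float:
    """Gain of splitting a node into left/right children, minus the split penalty gamma."""
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)
    return 0.5 * (score(g_l, h_l) + score(g_r, h_r) - score(g_l + g_r, h_l + h_r)) - gamma


# Hypothetical reviews: (mentions_health_term, is_elderly_segment)
reviews = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]
margin0 = 0.0  # initial raw prediction, i.e. p = 0.5 for every review

# Candidate split: feature == 1 goes left, feature == 0 goes right.
left = [logistic_grad_hess(y, margin0) for x, y in reviews if x == 1]
right = [logistic_grad_hess(y, margin0) for x, y in reviews if x == 0]
g_left, h_left = sum(g for g, _ in left), sum(h for _, h in left)
g_right, h_right = sum(g for g, _ in right), sum(h for _, h in right)

gain = split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0)
print(f"gain={gain:.4f}, w_left={leaf_weight(g_left, h_left, 1.0):.4f}, "
      f"w_right={leaf_weight(g_right, h_right, 1.0):.4f}")
# → gain=0.1429, w_left=0.2857, w_right=-0.2857
```

Both children have the same total hessian, so the positive gain comes entirely from separating reviews with negative and positive gradients. A boosting library repeats this evaluation over every feature and threshold and keeps the split with the largest gain.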

    Based on survey reports of the protein powder market (Global Information, 2023), the market reached approximately USD 26 billion in 2022 and continues to grow. With population aging and rising health awareness, elderly consumers account for a growing share of the protein powder market. Elderly consumers are easily influenced by word-of-mouth when purchasing protein powder, and their health conditions and dietary preferences give rise to distinct purchasing needs. A detailed analysis of elderly consumers' preferences could therefore provide valuable insights for protein powder manufacturers. This study uses text mining to analyze electronic word-of-mouth and help protein powder manufacturers identify the key success factors for products targeting the elderly market. An XGBoost classifier was applied to 4,681 customer reviews of the top ten best-selling protein powder products on the Amazon shopping platform to identify customer segments. The BERTopic model was then applied to reviews by elderly customers to analyze their preferences and the success factors in this market. The results show that the XGBoost classifier achieves an accuracy of 89% in classifying reviews and is suitable for consumer segmentation. Four latent themes were identified in the elderly customers' reviews, indicating that the market success factors are "nutritional efficacy", "brand reputation", "product taste", and "packaging". This study demonstrates the value of combining an XGBoost classifier with a BERTopic model to segment customers accurately and to discover the deeper preferences and needs of each segment. The results provide product developers with valuable references for targeting specific consumer segments.
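BERTopic, as described in its documentation, embeds documents, reduces the embeddings with UMAP, clusters them with HDBSCAN, and then describes each cluster with a class-based TF-IDF (c-TF-IDF). The sketch below reimplements only that last weighting step in plain Python, assuming the elderly customers' reviews have already been clustered; the two clusters and their review texts are hypothetical. Following the library's `ClassTfidfTransformer` as we understand it, term counts are L1-normalised within each cluster and multiplied by log(1 + A/f_t), where f_t is the term's frequency across all clusters and A is the average number of words per cluster. This is an illustration, not the library code or the thesis's configuration.

```python
"""Simplified class-based TF-IDF (c-TF-IDF), the topic-word weighting used by BERTopic.

Assumes reviews are already embedded and clustered; the clusters below are hypothetical.
"""
import math
from collections import Counter


def c_tf_idf(clusters: dict[int, list[str]]) -> dict[int, dict[str, float]]:
    # Treat all reviews of a cluster as one "class document" and count its terms.
    counts = {c: Counter(w for doc in docs for w in doc.split())
              for c, docs in clusters.items()}
    # f_t: frequency of each term across all classes.
    total = Counter()
    for cnt in counts.values():
        total.update(cnt)
    # A: average number of words per class (truncated to an integer).
    avg_words = int(sum(sum(cnt.values()) for cnt in counts.values()) / len(counts))
    weights = {}
    for c, cnt in counts.items():
        n_words = sum(cnt.values())
        # W_{t,c} = (tf_{t,c} / |c|) * log(1 + A / f_t)
        weights[c] = {t: (tf / n_words) * math.log(1 + avg_words / total[t])
                      for t, tf in cnt.items()}
    return weights


def top_words(weights: dict[int, dict[str, float]], cluster: int, n: int = 3) -> list[str]:
    # Highest weight first; ties broken alphabetically for a stable result.
    ranked = sorted(weights[cluster].items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


clusters = {
    0: ["great taste vanilla", "taste smooth vanilla"],
    1: ["doctor recommended brand", "trusted brand quality"],
}
w = c_tf_idf(clusters)
print(top_words(w, 0), top_words(w, 1))
# → ['taste', 'vanilla', 'great'] ['brand', 'doctor', 'quality']
```

In practice the topic labels such as "product taste" or "brand reputation" are assigned by a researcher who reads the top-weighted words and representative reviews of each cluster.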

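    Abstract
    Acknowledgements
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Objectives
      1.4 Research Scope and Limitations
      1.5 Research Process
    Chapter 2 Literature Review
      2.1 Protein Powder
        2.1.1 Introduction to Protein Powder
        2.1.2 Protein Powder for the Elderly
        2.1.3 The Elderly Protein Powder Market
      2.2 Electronic Word-of-Mouth
      2.3 Text Mining
        2.3.1 Data Preprocessing
        2.3.2 Classification
        2.3.3 Topic Modeling
      2.4 Studies Applying Text Mining to Electronic Word-of-Mouth
    Chapter 3 Research Methods
      3.1 Problem Definition and Research Framework
      3.2 Data Collection
      3.3 Data Preprocessing
        3.3.1 Data Cleaning
        3.3.2 Tokenization
        3.3.3 Stop-Word Removal
        3.3.4 Stemming and Lemmatization
      3.4 Classification Model
      3.5 Topic Modeling
      3.6 Validation and Analysis of Results
    Chapter 4 Research Results
      4.1 Data Collection and Processing
      4.2 Classification Model
      4.3 Topic Modeling
      4.4 Analysis Results
      4.5 Summary of Findings
    Chapter 5 Conclusions and Recommendations
      5.1 Conclusions
      5.2 Managerial Implications
      5.3 Suggestions for Future Research
    References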
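Sections 3.3.1 to 3.3.4 outline a standard preprocessing chain for English reviews: cleaning, tokenization, stop-word removal, and stemming or lemmatization. The stdlib-only sketch below illustrates that order of operations and is a stand-in, not the thesis's code. The stop-word list is a hypothetical toy list, and the suffix rules are a crude simplification, not the Porter (1980) stemmer; a real pipeline would more likely use a full stop-word list and a library stemmer or lemmatizer.

```python
"""Illustrative review preprocessing: clean -> tokenize -> drop stop words -> strip suffixes."""
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "i", "my", "to", "of", "for", "in"}
# Crude suffix rules tried in order; NOT the Porter (1980) algorithm.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def clean(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


def tokenize(text: str) -> list[str]:
    # Whitespace tokenization is enough for already-cleaned English text.
    return text.split()


def strip_suffix(token: str) -> str:
    for suffix in SUFFIXES:
        # Keep a stem of at least three characters to avoid over-stripping.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review: str) -> list[str]:
    tokens = tokenize(clean(review))
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("This powder mixes easily and tastes GREAT!! See https://example.com"))
# → ['powder', 'mix', 'easi', 'tast', 'great', 'see']
```

Stems such as "easi" and "tast" are not dictionary words; that is expected of stemming, whereas lemmatization would map tokens to dictionary forms at the cost of needing a vocabulary.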

    Albanese, F., Feuerstein, E., Lombardi, L., & Balenzuela, P. (2023). Characterizing Community Changing Users using Text Mining and Graph Machine Learning on Twitter. Alberto Mendelzon Workshop on Foundations of Data Management.
    Ambulkar, P., Hande, P., Tambe, B., Vaidya, V. G., Naik, N., Agarwal, R., & G., G. (2023). Efficacy and safety assessment of protein supplement - micronutrient fortification in promoting health and wellbeing in healthy adults - a randomized placebo-controlled trial. Translational and Clinical Pharmacology, 31(1), 13-27.
    An, Y., Oh, H., & Lee, J. (2023). Marketing Insights from Reviews Using Topic Modeling with BERTopic and Deep Clustering Network. Applied Sciences, 13(16), 9443.
    Kårlund, A., Gómez-Gallego, C., Turpeinen, A. M., Palo-oja, O.-M., El-Nezami, H., & Kolehmainen, M. (2019). Protein Supplements and Their Relation with Nutrition, Microbiota Composition and Health: Is More Protein Always Better for Sportspeople? Nutrients, 11(4), 829.
    De Bruyn, A., & Lilien, G. L. (2008). A multi-stage model of word-of-mouth influence through viral marketing. International Journal of Research in Marketing.
    Timoshenko, A., & Hauser, J. R. (2019). Identifying Customer Needs from User-Generated Content. Marketing Science, 38(1).
    Aschemann-Witzel, J., Gantriis, R., Fraga, P., & Perez-Cueto, F. (2020). Plant-based food and protein trend from a business perspective: markets, consumers, and the challenges and opportunities in the future. Critical Reviews in Food Science and Nutrition, 61, 1-10. https://doi.org/10.1080/10408398.2020.1793730
    Kumar, A., Chakraborty, S., & Bala, P. K. (2023). Text mining approach to explore determinants of grocery mobile app satisfaction using online customer reviews. Journal of Retailing and Consumer Services, 73.
    Ahmed, A. Z., & Rodríguez-Díaz, M. (2020). Significant Labels in Sentiment Analysis of Online Customer Reviews of Airlines. Sustainability, 12, 8683.
    Bahety, P. K., Sarkar, S., De, T., Kumar, V., & Mittal, A. (2022). Exploring the factors influencing consumer preference toward dairy products: an empirical research. Vilakshan - XIMB Journal of Management.
    Baker, A. M., Donthu, N., & Kumar, V. (2016). Investigating how Word-of-Mouth Conversations about Brands Influence Purchase and Retransmission Intentions. Journal of Marketing Research, 53(2), 225-239.
    Ban, H.-J., & Kim, H.-S. (2019). Understanding Customer Experience and Satisfaction through Airline Passengers’ Online Review. Sustainability, 11(15), 4066.
    Birch, K., Cochrane, D., & Ward, C. (2021). Data as asset? The measurement, governance, and valuation of digital personal data by Big Tech. Big Data & Society, 8(1).
    Bonifazi, G., Corradini, E., Ursino, D., & Virgili, L. (2023). Modeling, Evaluating, and Applying the eWoM Power of Reddit Posts. Big Data and Cognitive Computing, 7(1), 47.
    Chai, C. P. C. (2023). Comparison of text preprocessing methods. Natural Language Engineering, 29(3), 509-553.
    Chapman, I., Oberoi, A., Giezenaar, C., & Soenen, S. (2021). Rational Use of Protein Supplements in the Elderly-Relevance of Gastrointestinal Mechanisms. Nutrients, 13(4).
    Chegini, G. R., & Taheri, M. (2013). Whey powder: Process technology and physical properties: A review. Middle East Journal of Scientific Research, 13, 1377-1387.
    Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA.
    Chittiprolu, V., Samala, N., & Bellamkonda, R. S. (2021). Heritage hotels and customer experience: a text mining analysis of online reviews. International Journal of Culture, Tourism and Hospitality Research, 15(2), 131-156.
    Chung, J., Lee, J., & Yoon, J. (2022). Understanding music streaming services via text mining of online customer reviews. Electronic Commerce Research and Applications, 53, 101145.
    Douglas, J., Lawrence, J., & Knowlden, A. (2017). The use of fortified foods to treat malnutrition among older adults: A systematic review. Quality in Ageing and Older Adults, 18.
    Ermis, E., Tekiner, I. H., Lee, C. C., Ucak, S., & Yetim, H. (2023). An overview of protein powders and their use in food formulations. Journal of Food Process Engineering, 46(5).
    Global Information. (2023). Global Sports Nutrition Market 2020-2030.
    Gurusamy, V., & Kannan, S. (2014). Preprocessing Techniques for Text Mining.
    Cintineo, H. P., Arent, M. A., Antonio, J., & Arent, S. M. (2018). Effects of Protein Supplementation on Performance and Recovery in Resistance and Endurance Training. Frontiers in Nutrition, 5, 83.
    Jemai, J., & Zarrad, A. (2023). Feature Selection Engineering for Credit Risk Assessment in Retail Banking. Information, 14(3), 200.
    Jia, S. (2019). Measuring tourists’ meal experience by mining online user generated content about restaurants. Scandinavian Journal of Hospitality and Tourism, 19(4-5), 371-389. https://doi.org/10.1080/15022250.2019.1651671
    Jia, S. S. (2018). Behind the ratings: Text mining of restaurant customers’ online reviews. International Journal of Market Research, 60(6), 561-572.
    Yu, J., Zhang, X., & Kim, H.-S. (2023). Using Online Customer Reviews to Understand Customers’ Experience and Satisfaction with Integrated Resorts. Sustainability, 15(17), 13049.
    Berezina, K., Bilgihan, A., Cobanoglu, C., & Okumus, F. (2016). Understanding Satisfied and Dissatisfied Hotel Customers: Text Mining of Online Hotel Reviews. Journal of Hospitality Marketing & Management, 25(1), 1-24.
    Khan, F. M., Khan, S. A., Shamim, K., Gupta, Y., & Sherwani, S. I. (2022). Analysing customers' reviews and ratings for online food deliveries: A text mining approach. International Journal of Consumer Studies, 47(3), 953-976.
    Kherwa, P., & Bansal, P. (2018). Topic Modeling: A Comprehensive Review. ICST Transactions on Scalable Information Systems, 7, 159623.
    Khyani, D., & B S, S. (2021). An Interpretation of Lemmatization and Stemming in Natural Language Processing. Shanghai Ligong Daxue Xuebao/Journal of University of Shanghai for Science and Technology, 22, 350-357.
    Kim, E.-G., & Chun, S.-H. (2019). Analyzing Online Car Reviews Using Text Mining. Sustainability, 11, 1611.
    Kim, J. J., Nam, M., & Kim, I. (2019). The effect of trust on value on travel websites: enhancing well-being and word-of-mouth among the elderly. Journal of Travel & Tourism Marketing, 36(1), 76-89. https://doi.org/10.1080/10548408.2018.1494086
    Kousis, A., & Tjortjis, C. (2024). Investigating the Key Aspects of a Smart City through Topic Modeling and Thematic Analysis. Future Internet, 16(1), 3.
    Lester, S., Cornacchia, L., Corbier, C., Hurst, K., Ayed, C., Taylor, M. A., & Fisk, I. (2021). Age group determines the acceptability of protein derived off-flavour. Food Quality and Preference, 91, 104212. https://www.sciencedirect.com/science/article/pii/S0950329321000392
    Mendonça, M., & Figueira, Á. (2024). Topic Extraction: BERTopic’s Insight into the 117th Congress’s Twitterverse. Informatics, 11(1), 8.
    Mishra, A., Chandel, A. K. S., Bhalani, D. V., & Shrivastava, R. (2021). Importance of Dietary Supplements to the Health. Current Nutrition & Food Science, 17(6), 583-600.
    Mittal, D., & Agrawal, S. R. (2021). Determining banking service attributes from online reviews: text mining and sentiment analysis. International Journal of Bank Marketing, 40(3), 558-577.
    Moreno, M., Sánchez-Franco, M. J., & Rey-Tienda, M. (2023). Examining transaction-specific satisfaction and trust in Airbnb and hotels. An application of BERTopic and Zero-shot text classification. Tourism & Management Studies, 19, 21-37.
    Musashi, E., Kato, S., Hosoda, T., & Ikeda, D. (2023). Analysis of Emotions from the Word-of-Mouth of the Elderly. Proceedings of the International Conference on ICT Application Research, 1, 60-65.
    Paisri, W., Ruanguttamanun, C., & Sujchaphong, N. (2022). Customer experience and commitment on eWOM and revisit intention: A case of Taladtongchom Thailand. Cogent Business & Management, 9(1).
    Patrous, Z. S. (2018). Evaluating XGBoost for User Classification by using Behavioral Features Extracted from Smartphone Sensors.
    Porter, M. F. (1980). An algorithm for suffix stripping. Program.
    Qin, Y., Pillidge, C., Harrison, B., & Adhikari, B. (2024). Pathways in formulating foods for the elderly. Food Research International, 186, 114324. https://www.sciencedirect.com/science/article/pii/S0963996924003946
    Ridzuan, F., & Wan Zainon, W. M. N. (2019). A Review on Data Cleansing Methods for Big Data. Procedia Computer Science, 161, 731-738.
    Rodriguez-Lopez, P., Rueda-Robles, A., Sánchez-Rodríguez, L., Blanca-Herrera, R. M., Quirantes-Piné, R. M., Borrás-Linares, I., Segura-Carretero, A., & Lozano-Sánchez, J. (2022). Analysis and Screening of Commercialized Protein Supplements for Sports Practice. Foods, 11(21), 3500.
    Rosario, A. B., de Valck, K., & Sotgiu, F. (2020). Conceptualizing the electronic word-of-mouth process: What we know and need to know about eWOM creation, exposure, and evaluation. Journal of the Academy of Marketing Science, 48(3), 422-448.
    Roy, G., Basu, R., & Ray. (2020). Antecedents of Online Purchase Intention Among Ageing Consumers. Global Business Review, 24, 097215092092201.
    Alsubari, S. N., Deshmukh, S. N., Alqarni, A. A., Alsharif, N., Aldhyani, T. H. H., Alsaade, F. W., & Khalaf, O. I. (2022). Data Analytics for the Identification of Fake Reviews Using Supervised Learning. Computers, Materials & Continua, 70(2), 3189-3204.
    Sanchez-Oliver, A., Contreras, J., Puya-Braza, J. M., & Guerra-Hernández, E. (2018). Quality analysis of commercial protein powder supplements and relation to characteristics declared by manufacturer. LWT- Food Science and Technology, 97, 100–108. https://doi.org/10.1016/j.lwt.2018.06.047
    Sarica, S., & Luo, J. (2021). Stopwords in technical language processing. PLOS ONE, 16(8), e0254937.
    Saxton, R., & McDougal, O. M. (2021). Whey Protein Powder Analysis by Mid-Infrared Spectroscopy. Foods, 10(5).
    Shaqman, N., Hashim, N. H., & Yahya, W. K. (2020). Influence of Utilitarian Shopping Value and Electronic Word of Mouth on Mobile Shopping: A Conceptual Framework. International Journal of Business and Technology Management, (3), 90-105.
    Sinnasamy, T. A. P., & Sjaif, N. N. A. (2022). Sentiment Analysis using Term based Method for Customers' Reviews in Amazon Product. International Journal of Advanced Computer Science and Applications, 13(7), 685-691.
    Sun, S., Luo, C., & Chen, J. (2017). A review of natural language processing techniques for opinion mining systems. Information Fusion, 36, 10-25.
    Chatterjee, S., Goyal, D., Prakash, A., & Sharma, J. (2021). Exploring healthcare/health-product ecommerce satisfaction: A text mining and machine learning application. Journal of Business Research, 131, 815-825.
    Tang, Z., Pan, X., & Gu, Z. (2024). Analyzing public demands on China’s online government inquiry platform: A BERTopic-Based topic modeling study. PLOS ONE, 19(2), e0296855.
    Verma, S., & Yadav, N. (2021). Past, Present, and Future of Electronic Word of Mouth (EWOM). Journal of Interactive Marketing, 53, 111-128.
    Wang, J., Zhao, Z., Liu, Y., & Guo, Y. (2021). Research on the Role of Influencing Factors on Hotel Customer Satisfaction Based on BP Neural Network and Text Mining. Information, 12(3).
    Weisfeld-Spolter, S., Sussan, F., & Gould, S. (2014). An integrative approach to eWOM and marketing communications. Corporate Communications: An International Journal, 19(3), 260-274.
    Chen, W.-K., Riantama, D., & Chen, L.-S. (2020). Using a Text Mining Approach to Hear Voices of Customers from Social Media toward the Fast-Food Restaurant Industry. Sustainability, 13(1).
    Xiong, F., Xie, M., Zhao, L., Li, C., & Fan, X. (2022). Recognition and Evaluation of Data as Intangible Assets. SAGE Open, 12(2).
    Kim, Y.-J., & Kim, H.-S. (2022). The Impact of Hotel Customer Experience on Customer Satisfaction through Online Reviews. Sustainability, 14(2), 848.
    Gvili, Y., & Levy, S. (2018). Consumer engagement with eWOM on social media: the role of social capital. Online Information Review, 42(4), 482-505.
    Zhao, S. (2021). Thumb Up or Down? A Text-Mining Approach of Understanding Consumers through Reviews. Decision Sciences, 52(3), 699-719.
    Xiang, Z., Schwartz, Z., Gerdes, J. H., & Uysal, M. (2015). What can big data and text analytics tell us about hotel guest experience and satisfaction? International Journal of Hospitality Management, 44, 120-130.
    Zhu, K. (2014). Analysis of Chinese Word Segmentation Technology. Applied Mechanics and Materials, 687-691, 1540-1543.

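    On campus: full text available from 2029-07-31
    Off campus: full text available from 2029-07-31
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.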