| Graduate student: | 洪權逸 Hung, Chuan-I |
|---|---|
| Thesis title: | 應用文字探勘和XGBoost分類器分析蛋白粉市場中的電子口碑 Uses of Text Mining and XGBoost Classifier to Analyze the Electronic Word-of-Mouth of the Protein Powder Market |
| Advisor: | 呂執中 Lyu, Jr-Jung |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2024 |
| Academic year of graduation: | 112 |
| Language: | Chinese |
| Number of pages: | 57 |
| Chinese keywords: | 文字探勘 (text mining), 主題模型 (topic model), XGBoost, BERTopic, 銀髮族 (elderly consumers), 蛋白粉 (protein powder) |
| Foreign-language keywords: | Text Mining, Topic Modelling, XGBoost, BERTopic |
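According to industry survey reports on the protein powder market, the market reached approximately USD 26 billion in 2022 and continues to grow. Driven by global population aging and rising health awareness, elderly consumers account for a growing share of the protein powder consumer market. Elderly consumers are easily influenced by word-of-mouth when choosing protein powder, and their health conditions and dietary preferences give them distinct purchasing needs. A detailed analysis of elderly consumers' preferences and opinions can therefore serve as a reference for new product development.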
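Few previous studies have used electronic word-of-mouth to investigate the success factors of the elderly protein powder market. Most studies of elderly consumers' preferences rely on questionnaire surveys, which can only uncover surface-level market success factors. The main purpose of this study is to use electronic word-of-mouth to help protein powder manufacturers identify the success factors of the elderly market. It proposes an integrated analysis method that applies an XGBoost classifier to 4,681 customer reviews collected for the top ten best-selling protein powder products on the Amazon shopping platform, classifying them to identify customer segments. Reviews by elderly consumers are then selected and analyzed with a topic modeling method from text mining, the BERTopic model, to find customer preferences and further analyze market success factors.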
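The results show that the XGBoost algorithm achieves 89% accuracy on the review classification task, indicating that the consumer segmentation is precise. Protein powder consumers can be divided into three groups: the first is a younger group driven mainly by fitness needs, the second is vegetarians, and the third is an elderly group driven mainly by health-maintenance needs. BERTopic proved effective at uncovering deeper semantics: reviews by elderly consumers contain four latent topics, indicating that the market success factors are "nutritional efficacy", "brand reputation", "product taste" and "packaging". Elderly consumers place particular weight on nutritional efficacy and brand reputation. For health reasons they care more about whether a product works than about its price, and they tend to choose brands that are reputable, highly rated, or physician-endorsed. This study shows that text mining can not only segment customers accurately but also uncover deeper preferences and needs within each segment, providing valuable reference information for product developers.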
According to survey reports on the protein powder market (Global Information, 2023), the market reached approximately USD 26 billion in 2022 and continues to grow. With population aging and rising health awareness, elderly consumers make up a growing share of the protein powder market. Elderly consumers are readily influenced by word-of-mouth when purchasing protein powder, and their health conditions and dietary preferences give them distinct purchasing needs. A detailed analysis of their preferences could therefore provide valuable insights for protein powder manufacturers. This study uses text mining to analyze electronic word-of-mouth and help protein powder manufacturers identify the key success factors for products targeting the elderly market. The XGBoost classifier was applied to 4,681 customer reviews of the top ten best-selling protein powder products on the Amazon shopping platform to identify customer segments. The BERTopic model was then applied to the reviews by elderly customers to analyze customer preferences and market success factors. The results show that the XGBoost classifier achieves an accuracy of 89% in review classification and is appropriate for consumer segmentation. Four implicit themes were identified in the reviews, indicating that the market success factors are "nutritional efficacy", "brand reputation", "product taste" and "packaging". This study demonstrates that combining the XGBoost classifier with the BERTopic model can segment customers accurately and uncover the deeper preferences and needs of each segment. The results provide a valuable reference for product developers targeting specific consumer segments.
Albanese, F., Feuerstein, E., Lombardi, L., & Balenzuela, P. (2023). Characterizing Community Changing Users using Text Mining and Graph Machine Learning on Twitter. Alberto Mendelzon Workshop on Foundations of Data Management.
Ambulkar, P., Hande, P., Tambe, B., Vaidya, V. G., Naik, N., Agarwal, R., & G., G. (2023). Efficacy and safety assessment of protein supplement - micronutrient fortification in promoting health and wellbeing in healthy adults - a randomized placebo-controlled trial. Translational and Clinical Pharmacology, 31(1), 13-27.
An, Y., Oh, H., & Lee, J. (2023). Marketing Insights from Reviews Using Topic Modeling with BERTopic and Deep Clustering Network. Applied Sciences, 13(16), 9443.
Kårlund, A., Gómez-Gallego, C., Turpeinen, A. M., Palo-oja, O.-M., El-Nezami, H., & Kolehmainen, M. (2019). Protein Supplements and Their Relation with Nutrition, Microbiota Composition and Health: Is More Protein Always Better for Sportspeople? Nutrients, 11(4), 829.
De Bruyn, A., & Lilien, G. L. (2008). A multi-stage model of word-of-mouth influence through viral marketing. International Journal of Research in Marketing.
Timoshenko, A., & Hauser, J. R. (2023). Identifying Customer Needs from User-Generated Content. Marketing Science, 38(1).
Aschemann-Witzel, J., Gantriis, R., Fraga, P., & Perez-Cueto, F. (2020). Plant-based food and protein trend from a business perspective: markets, consumers, and the challenges and opportunities in the future. Critical Reviews in Food Science and Nutrition, 61, 1-10. https://doi.org/10.1080/10408398.2020.1793730
Kumar, A., Chakraborty, S., & Bala, P. K. (2023). Text mining approach to explore determinants of grocery mobile app satisfaction using online customer reviews. Journal of Retailing and Consumer Services, 73.
Zaki Ahmed, A., & Rodríguez-Díaz, M. (2020). Significant Labels in Sentiment Analysis of Online Customer Reviews of Airlines. Sustainability, 12(8683).
Bahety, P. K., Sarkar, S., De, T., Kumar, V., & Mittal, A. (2022). Exploring the factors influencing consumer preference toward dairy products: an empirical research. Vilakshan - XIMB Journal of Management.
Baker, A. M., Donthu, N., & Kumar, V. (2016). Investigating how Word-of-Mouth Conversations about Brands Influence Purchase and Retransmission Intentions. Journal of marketing research, 53(2), 225-239.
Ban, H.-J., & Kim, H.-S. (2019). Understanding Customer Experience and Satisfaction through Airline Passengers’ Online Review. Sustainability, 11(15), 4066.
Birch, K., Cochrane, D., & Ward, C. (2021). Data as asset? The measurement, governance, and valuation of digital personal data by Big Tech. Big Data & Society, 8(1).
Bonifazi, G., Corradini, E., Ursino, D., & Virgili, L. (2023). Modeling, Evaluating, and Applying the eWoM Power of Reddit Posts. Big Data and Cognitive Computing, 7(1), 47.
Chai, C. P. C. (2023). Comparison of text preprocessing methods. Natural language engineering, 29(3), 509-553.
Chapman, I., Oberoi, A., Giezenaar, C., & Soenen, S. (2021). Rational Use of Protein Supplements in the Elderly-Relevance of Gastrointestinal Mechanisms. Nutrients, 13(4).
Chegini, G. R., & Taheri, M. (2013). Whey powder: Process technology and physical properties: A review. Middle East Journal of Scientific Research, 13, 1377-1387.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA.
Chittiprolu, V., Samala, N., & Bellamkonda, R. S. (2021). Heritage hotels and customer experience: a text mining analysis of online reviews. International Journal of Culture, Tourism and Hospitality Research, 15(2), 131-156.
Chung, J., Lee, J., & Yoon, J. (2022). Understanding music streaming services via text mining of online customer reviews. Electronic Commerce Research and Applications, 53, 101145.
Douglas, J., Lawrence, J., & Knowlden, A. (2017). The use of fortified foods to treat malnutrition among older adults: A systematic review. Quality in Ageing and Older Adults, 18.
Ermis, E., Tekiner, I. H., Lee, C. C., Ucak, S., & Yetim, H. (2023). An overview of protein powders and their use in food formulations. Food Process Engineering, 46(5).
Global Information. (2023). Global Sports Nutrition Market 2020-2030.
Gurusamy, V., & Kannan, S. (2014). Preprocessing Techniques for Text Mining.
Cintineo, H. P., Arent, M. A., Antonio, J., & Arent, S. M. (2018). Effects of Protein Supplementation on Performance and Recovery in Resistance and Endurance Training. Frontiers in Nutrition, 5(83).
Jemai, J., & Zarrad, A. (2023). Feature Selection Engineering for Credit Risk Assessment in Retail Banking. Information, 14(3), 200.
Jia, S. (2019). Measuring tourists’ meal experience by mining online user generated content about restaurants. Scandinavian Journal of Hospitality and Tourism, 19(4-5), 371-389. https://doi.org/10.1080/15022250.2019.1651671
Jia, S. S. (2018). Behind the ratings: Text mining of restaurant customers’ online reviews. International Journal of Market Research, 60(6), 561-572.
Yu, J., Zhang, X., & Kim, H.-S. (2023). Using Online Customer Reviews to Understand Customers’ Experience and Satisfaction with Integrated Resorts. Sustainability, 15(17), 13049.
Berezina, K., Bilgihan, A., Cobanoglu, C., & Okumus, F. (2016). Understanding Satisfied and Dissatisfied Hotel Customers: Text Mining of Online Hotel Reviews. Journal of Hospitality Marketing & Management, 25(1), 1-24.
Khan, F. M., Khan, S. A., Shamim, K., Gupta, Y., & Sherwani, S. I. (2022). Analysing customers' reviews and ratings for online food deliveries: A text mining approach. Consumer Studies, 47(3), 953-976.
Kherwa, P., & Bansal, P. (2018). Topic Modeling: A Comprehensive Review. ICST Transactions on Scalable Information Systems, 7, 159623.
Khyani, D., & B S, S. (2021). An Interpretation of Lemmatization and Stemming in Natural Language Processing. Shanghai Ligong Daxue Xuebao/Journal of University of Shanghai for Science and Technology, 22, 350-357.
Kim, E.-G., & Chun, S.-H. (2019). Analyzing Online Car Reviews Using Text Mining. Sustainability, 11(1611).
Kim, J. J., Nam, M., & Kim, I. (2019). The effect of trust on value on travel websites: enhancing well-being and word-of-mouth among the elderly. Journal of Travel & Tourism Marketing, 36(1), 76-89. https://doi.org/10.1080/10548408.2018.1494086
Kousis, A., & Tjortjis, C. (2024). Investigating the Key Aspects of a Smart City through Topic Modeling and Thematic Analysis. Future Internet, 16(1), 3.
Lester, S., Cornacchia, L., Corbier, C., Hurst, K., Ayed, C., Taylor, M. A., & Fisk, I. (2021). Age group determines the acceptability of protein derived off-flavour. Food Quality and Preference, 91, 104212. https://www.sciencedirect.com/science/article/pii/S0950329321000392
Mendonça, M., & Figueira, Á. (2024). Topic Extraction: BERTopic’s Insight into the 117th Congress’s Twitterverse. Informatics, 11(1), 8.
Mishra, A., Chandel, A. K. S., Bhalani, D. V., & Shrivastava, R. (2021). Importance of Dietary Supplements to the Health. Current Nutrition & Food Science, 17(6), 583-600.
Mittal, D., & Agrawal, S. R. (2021). Determining banking service attributes from online reviews: text mining and sentiment analysis. International Journal of Bank Marketing, 40(3), 558-577.
Moreno, M., Sánchez-Franco, M. J., & Rey-Tienda, M. (2023). Examining transaction-specific satisfaction and trust in Airbnb and hotels. An application of BERTopic and Zero-shot text classification. Tourism & Management Studies, 19, 21-37.
Musashi, E., Kato, S., Hosoda, T., & Ikeda, D. (2023). Analysis of Emotions from the Word-of-Mouth of the Elderly. Proceedings of the International Conference on ICT Application Research, 1, 60-65.
Paisri, W., Ruanguttamanun, C., & Sujchaphong, N. (2022). Customer experience and commitment on eWOM and revisit intention: A case of Taladtongchom Thailand. Cogent Business & Management, 9(1).
Patrous, Z. S. (2018). Evaluating XGBoost for User Classification by using Behavioral Features Extracted from Smartphone Sensors.
Porter, M. F. (1980). An algorithm for suffix stripping. Program.
Qin, Y., Pillidge, C., Harrison, B., & Adhikari, B. (2024). Pathways in formulating foods for the elderly. Food Research International, 186, 114324. https://www.sciencedirect.com/science/article/pii/S0963996924003946
Ridzuan, F., & Wan Zainon, W. M. N. (2019). A Review on Data Cleansing Methods for Big Data. Procedia Computer Science, 161, 731-738.
Rodriguez-Lopez, P., Rueda-Robles, A., Sánchez-Rodríguez, L., Blanca-Herrera, R. M., Quirantes-Piné, R. M., Borrás-Linares, I., Segura-Carretero, A., & Lozano-Sánchez, J. (2022). Analysis and Screening of Commercialized Protein Supplements for Sports Practice. Foods, 11(21), 3500.
Rosario, A. B., de Valck, K., & Sotgiu, F. (2020). Conceptualizing the electronic word-of-mouth process: What we know and need to know about eWOM creation, exposure, and evaluation. Journal of the Academy of Marketing Science, 48(3), 422-448.
Roy, G., Basu, R., & Ray. (2020). Antecedents of Online Purchase Intention Among Ageing Consumers. Global Business Review, 24, 097215092092201.
Alsubari, S. N., Deshmukh, S. N., Alqarni, A. A., Alsharif, N., Aldhyani, T. H. H., Alsaade, F. W., & Khalaf, O. I. (2022). Data Analytics for the Identification of Fake Reviews Using Supervised Learning. Computers, Materials & Continua, 70(2), 3189-3204.
Sanchez-Oliver, A., Contreras, J., Puya-Braza, J. M., & Guerra-Hernández, E. (2018). Quality analysis of commercial protein powder supplements and relation to characteristics declared by manufacturer. LWT - Food Science and Technology, 97, 100-108. https://doi.org/10.1016/j.lwt.2018.06.047
Sarica, S., & Luo, J. (2021). Stopwords in technical language processing. PLOS ONE, 16(8), e0254937.
Saxton, R., & McDougal, O. M. (2021). Whey Protein Powder Analysis by Mid-Infrared Spectroscopy. Foods, 10(5).
Shaqman, N., Hashim, N. H., & Yahya, W. K. (2020). Influence of Utilitarian Shopping Value and Electronic Word of Mouth on Mobile Shopping: A Conceptual Framework. International Journal of Business and Technology Management, (3), 90-105.
Sinnasamy, T. A. P., & Sjaif, N. N. A. (2022). Sentiment Analysis using Term based Method for Customers' Reviews in Amazon Product. International Journal of Advanced Computer Science and Applications, 13(7), 685-691.
Sun, S., Luo, C., & Chen, J. (2017). A review of natural language processing techniques for opinion mining systems. Information Fusion, 36, 10-25.
Chatterjee, S., Goyal, D., Prakash, A., & Sharma, J. (2021). Exploring healthcare/health-product ecommerce satisfaction: A text mining and machine learning application. Journal of Business Research, 131, 815-825.
Tang, Z., Pan, X., & Gu, Z. (2024). Analyzing public demands on China’s online government inquiry platform: A BERTopic-Based topic modeling study. PLOS ONE, 19(2), e0296855.
Verma, S., & Yadav, N. (2021). Past, Present, and Future of Electronic Word of Mouth (EWOM). Journal of Interactive Marketing, 53, 111-128.
Wang, J., Zhao, Z., Liu, Y., & Guo, Y. (2021). Research on the Role of Influencing Factors on Hotel Customer Satisfaction Based on BP Neural Network and Text Mining. Information, 12(3).
Weisfeld-Spolter, S., Sussan, F., & Gould, S. (2014). An integrative approach to eWOM and marketing communications. Corporate Communications: An International Journal, 19(3), 260-274.
Chen, W.-K., Riantama, D., & Chen, L.-S. (2020). Using a Text Mining Approach to Hear Voices of Customers from Social Media toward the Fast-Food Restaurant Industry. Sustainability, 13(1).
Xiong, F., Xie, M., Zhao, L., Li, C., & Fan, X. (2022). Recognition and Evaluation of Data as Intangible Assets. SAGE Open, 12(2).
Kim, Y.-J., & Kim, H.-S. (2022). The Impact of Hotel Customer Experience on Customer Satisfaction through Online Reviews. Sustainability, 14(2), 848.
Gvili, Y., & Levy, S. (2018). Consumer engagement with eWOM on social media: the role of social capital. Online Information Review, 42(4), 482-505.
Zhao, S. (2021). Thumb Up or Down? A Text-Mining Approach of Understanding Consumers through Reviews. Decision Sciences, 52(3), 699-719.
Xiang, Z., Schwartz, Z., Gerdes, J. H., & Uysal, M. (2015). What can big data and text analytics tell us about hotel guest experience and satisfaction? International Journal of Hospitality Management, 44, 120-130.
Zhu, K. (2014). Analysis of Chinese Word Segmentation Technology. Applied Mechanics and Materials, 687-691, 1540-1543.
On campus: available from 2029-07-31