
Graduate student: 常定利
Elementi, Leonardo Toshi
Thesis title: 員工流失預測:機器學習模型與優化方法研究
A Study on Machine Learning Approaches for Predicting Employee Attrition
Advisor: 陳培殷
Chen, Pei-Yin
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2025
Academic year of graduation: 113 (ROC calendar)
Language: English
Number of pages: 65
Keywords: Machine Learning, Employee Attrition, Prediction
  • Predicting employee attrition matters because it lets organizations proactively identify employees at risk of leaving, so that HR teams can apply targeted retention strategies. This reduces turnover costs, maintains workforce stability, and boosts overall productivity. By addressing underlying issues such as job satisfaction, workload, or compensation, companies can strengthen employee engagement and foster a more supportive work environment, ultimately improving both employee morale and organizational performance.
    This thesis examines which framework yields the best results on the IBM Employee Attrition dataset through a series of experiments. First, features were selected using correlation matrices, and class imbalance was addressed with Random Oversampling, SMOTE, and ADASYN. Three gradient boosting models (XGBoost, LightGBM, and CatBoost) were then trained, with hyperparameters tuned by Grid Search, Random Search, and Hyperopt.
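The difference between plain random oversampling and SMOTE-style sampling can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the thesis code: the function names `random_oversample` and `smote_like`, the toy rows, and the choice `k=2` are all assumptions made for the example. Random oversampling repeats existing minority rows, while the SMOTE-style variant creates new rows on the line segment between a minority row and one of its nearest minority neighbours.

```python
import math
import random


def random_oversample(minority, n_new, rng):
    """Random oversampling: copy existing minority rows, drawn with replacement."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote_like(minority, n_new, k, rng):
    """SMOTE-style sampling: interpolate between a minority row and one of
    its k nearest minority neighbours (simplified, hypothetical helper)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


rng = random.Random(0)
# Toy stand-in for the minority ("attrition = yes") class; values are made up.
minority = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.3], [1.1, 1.1]]
majority_size = 10  # assumed size of the majority class

n_new = majority_size - len(minority)  # rows needed to balance the classes
duplicates = random_oversample(minority, n_new, rng)
synthetic = smote_like(minority, n_new, k=2, rng=rng)

print("duplicated rows:", duplicates)
print("interpolated rows:", synthetic)
```

Every row in `duplicates` already exists in `minority`, whereas rows in `synthetic` are new points that stay inside the region spanned by the minority class.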
    The combination of ADASYN for data balancing, CatBoost for classification, and Hyperopt for hyperparameter tuning yielded higher accuracy than previous methods. The results show significant improvements over baseline models such as Neural Networks and Logistic Regression. This approach provides a robust framework for employee retention analysis and offers valuable insights for human resource strategies. Future work may focus on integrating explainability methods to improve model interpretability for practical HR applications.
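What distinguishes ADASYN from SMOTE is how it decides where to place synthetic rows: minority rows whose neighbourhood is dominated by the majority class receive more of them. The sketch below illustrates only that allocation step, following the description in the ADASYN paper; it is not the thesis implementation. The helper name `adasyn_allocation`, the toy data, and the values `k=3` and `beta=1.0` are assumptions for the example.

```python
import math


def adasyn_allocation(X, y, minority_label, k=3, beta=1.0):
    """Return {row index: number of synthetic rows requested} for each
    minority row, using ADASYN's neighbourhood-based weighting."""
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_minority = len(minority_idx)
    n_majority = len(y) - n_minority
    # Total number of synthetic rows to create (G in the ADASYN paper)
    total_new = int((n_majority - n_minority) * beta)

    ratios = []
    for i in minority_idx:
        # k nearest neighbours of row i among ALL rows, both classes
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]
        n_majority_neighbours = sum(1 for j in nearest if y[j] != minority_label)
        ratios.append(n_majority_neighbours / k)

    total_ratio = sum(ratios)
    if total_ratio == 0:  # no minority row borders the majority class
        return {i: 0 for i in minority_idx}
    # Normalise the ratios into a distribution and scale by total_new
    return {i: round(r / total_ratio * total_new)
            for i, r in zip(minority_idx, ratios)}


# Toy data (made up): rows 0-7 are the majority class, rows 8-11 the minority.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
     [0.4, 0.4], [0.5, 0.1], [0.2, 0.5], [0.6, 0.5],
     [0.55, 0.45],                          # minority row inside the majority cluster
     [3.0, 3.0], [3.1, 3.2], [3.2, 2.9]]    # well-separated minority cluster
y = [0] * 8 + [1] * 4

allocation = adasyn_allocation(X, y, minority_label=1)
print(allocation)
```

In this toy setup, row 8 sits inside the majority cluster and therefore receives a larger share of the synthetic rows than the rows in the separate minority cluster. Because each share is rounded independently, the shares may not sum exactly to the requested total.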

    ABSTRACT
    ACKNOWLEDGEMENT
    CONTENTS
    TABLE CAPTIONS
    FIGURE CAPTIONS
    CHAPTER 1. INTRODUCTION
    CHAPTER 2. RELATED WORK
      2.1 MACHINE LEARNING ALGORITHMS
        2.1.1 Neural Networks
        2.1.2 Logistic Regression
        2.1.3 Support Vector Machines
        2.1.4 Decision Trees
        2.1.5 Random Forest
      2.2 MACHINE LEARNING FOR PREDICTING EMPLOYEE ATTRITION
      2.3 PREDICTION OF EMPLOYEE ATTRITION USING MACHINE LEARNING AND ENSEMBLE METHODS
      2.4 EMPLOYEE ATTRITION PREDICTION USING NEURAL NETWORK CROSS VALIDATION METHOD
      2.5 IBM EMPLOYEE ATTRITION ANALYSIS
    CHAPTER 3. METHODOLOGICAL FRAMEWORK
      3.1 DATA CLEAN UP AND DATA AUGMENTATION
        3.1.1 Random Under/Over Sampler
        3.1.2 SMOTE method
        3.1.3 ADASYN method
      3.2 GRADIENT BOOSTING ALGORITHM
        3.2.1 XGBoost
        3.2.2 LightGBM
        3.2.3 CatBoost
      3.3 HYPERPARAMETER OPTIMIZATION
        3.3.1 Grid search
        3.3.2 Random Search
        3.3.3 Hyperopt
    CHAPTER 4. EXPERIMENTS AND COMPARISONS
      4.1 EVALUATION METRICS
      4.2 ABLATION STUDY
        4.2.1 Augmentation
        4.2.2 Machine Learning Algorithms
        4.2.3 Hyperparameter Optimization
        4.2.4 Comprehensive Testing
        4.2.5 Tested ineffective framework
      4.3 COMPARISON
        4.3.1 5 times average of experimental result
        4.3.2 Comparison with Previous Works
    CHAPTER 5. CONCLUSION
    REFERENCES


    Full-text availability: on campus, open immediately; off campus, open immediately