Graduate student: | 徐心縈 Hsu, Hsin-Ying |
---|---|
Thesis title: | 用羅吉斯迴歸建構隨機分類模型之集成方法 Ensemble Algorithms with Randomly-generated Classification Models for Logistic Regression |
Advisor: | 翁慈宗 Wong, Tzu-Tsung |
Degree: | Master |
Department: | College of Management - Department of Industrial and Information Management |
Year of publication: | 2023 |
Academic year of graduation: | 111 |
Language: | Chinese |
Number of pages: | 43 |
Keywords: | Ensemble Learning, Logistic Regression, Diversity, Randomly-generated Model, Standardized Coefficient |
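The key factors affecting the classification performance of an ensemble model are the accuracy and diversity of its base models. To increase ensemble diversity, previous studies trained base models on multiple different training subsets; however, the diversity gained this way has an upper limit. To address this problem, this study uses logistic regression as the base model and proposes a new ensemble model, ERLR (Ensemble of Random Logistic Regression). Unlike conventional data-driven training, this study skips the step of training base models from a data set and, taking the characteristics of logistic regression into account, examines the regression coefficients and attribute selection through four experimental combinations. The experimental results show that, on most data sets, generating random standardized coefficients for all attributes gives better classification performance.
In addition, this study proposes a further ensemble model, EMRLR (Ensemble of Mixed Random Logistic Regression), which combines the random models of ERLR with base models trained by bagging, aiming to improve both the accuracy and the diversity of the ensemble. Experiments on 20 data sets show that, compared with other ensemble methods, adding random models significantly improves classification performance. In terms of execution time, the random classification models take considerably longer than the other ensemble methods. Parallel computing could reduce the running time on a single server in future work, so that random classification models perform better in both time cost and classification performance.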
The classification performance of an ensemble model is primarily determined by the accuracy and diversity of its base models. Previous studies induced base models from multiple subsets derived from the same training set. The diversity gained this way is limited, which in turn limits the improvement in ensemble performance. This study proposes a new ensemble algorithm, called ERLR (Ensemble of Random Logistic Regression), that generates logistic regression base models at random instead of training them on data. In generating a base model, the attributes can be partial or full, and the coefficients of the logistic regression model can be ordinary or standardized. The experimental results on 20 data sets suggest that using all attributes and standardized coefficients achieves higher accuracy on most data sets. This study also proposes an ensemble algorithm, called EMRLR (Ensemble of Mixed Random Logistic Regression), that combines the base models produced by the ERLR algorithm and the bagging approach. Experimental evaluations on the same 20 data sets show that EMRLR significantly outperforms the other three ensemble algorithms. However, the computational cost of EMRLR is the highest. This deficiency can be overcome by parallel processing.
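The idea of generating logistic regression base models at random can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state how random models are drawn or screened, so the uniform range [-1, 1], the accuracy threshold of 0.5, the `max_tries` cap, and all function names below are assumptions made for illustration. The sketch uses all attributes with standardized inputs, the setting the abstract reports as performing best on most data sets, and handles binary labels only.

```python
import math
import random

def standardize(rows):
    """Return column-wise z-scores so coefficients share a common scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; a constant column falls back to 1.0.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(model, x):
    """Classify x; model[0] is the intercept, the rest are coefficients."""
    z = model[0] + sum(w * v for w, v in zip(model[1:], x))
    return 1 if sigmoid(z) >= 0.5 else 0

def accuracy(model, rows, labels):
    """Fraction of instances that one model classifies correctly."""
    return sum(predict(model, x) == y for x, y in zip(rows, labels)) / len(labels)

def random_models(rows, labels, n_models, rng, threshold=0.5, max_tries=10000):
    """Draw coefficient vectors at random and keep those that pass an
    accuracy screen (an assumption of this sketch). No model is fitted."""
    n_features = len(rows[0])
    kept = []
    for _ in range(max_tries):
        if len(kept) == n_models:
            break
        model = [rng.uniform(-1.0, 1.0) for _ in range(n_features + 1)]
        if accuracy(model, rows, labels) > threshold:
            kept.append(model)
    return kept

def vote(models, x):
    """Majority vote over the base models (ties go to class 0)."""
    ones = sum(predict(m, x) for m in models)
    return 1 if 2 * ones > len(models) else 0

# Synthetic two-attribute data, for illustration only.
rng = random.Random(0)
raw = [[rng.gauss(0, 1), rng.gauss(0, 5)] for _ in range(200)]
labels = [1 if a + b / 5 > 0 else 0 for a, b in raw]
X = standardize(raw)
models = random_models(X, labels, n_models=11, rng=rng)
predictions = [vote(models, x) for x in X]
```

Because `vote` only needs coefficient vectors, a mixed ensemble in the spirit of EMRLR could append logistic regression models fitted on bootstrap samples to `models` before voting. How the thesis weights or balances the two kinds of base models is not stated in the abstract.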