| Graduate student: | 葉上戎 Yeh, Shang-Jung |
|---|---|
| Thesis title: | 以加權法隨機森林評估金融信用風險之研究 Weighted Random Forests for Evaluating Financial Credit Risk |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management, Institute of Information Management |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 (ROC calendar; 2018-2019) |
| Language: | Chinese |
| Number of pages: | 52 |
| Keywords: | Credit risk analysis, decision tree, imbalanced data, weighted random forest |
The 2008 financial crisis triggered severe economic turmoil worldwide and led financial institutions to focus on credit risk analysis, because lending to applicants who cannot repay causes heavy losses. Most recent studies of credit risk analysis treat financial data as class-balanced, yet in practice applicants who are unable to repay form a small minority. When such data are processed with general classification methods, prediction accuracy on the minority class may be low. Moreover, credit risk research has not yet established a set of effective features, so identifying them is critical. This study therefore uses weighted random forests in place of standard random forests to handle imbalanced financial data, both to improve prediction accuracy on the minority class and to discover the critical attributes for credit evaluation. Experimental results on four financial data sets show that the weighted random forest outperforms the standard random forest. Finally, the out-of-bag error is used to identify attributes that are effective and meaningful for credit risk analysis.
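The abstract names two mechanisms without showing them: class weighting inside a random forest, and out-of-bag (OOB) error for ranking attributes. Below is a minimal pure-Python sketch under stated assumptions; it is not the thesis's implementation. It assumes the weighting scheme of Chen, Liaw & Breiman (2004): class weights enter the Gini split criterion and the leaf vote. It further assumes inverse-frequency weights `w_c = n / (k * n_c)`, depth-limited trees instead of fully grown ones, and Breiman-style permutation importance measured on each tree's OOB rows with a class-weighted error. All function names and the toy data set are hypothetical.

```python
import random
from collections import Counter


def class_weights(y):
    """Assumed heuristic: inverse class frequency, w_c = n / (k * n_c)."""
    counts = Counter(y)
    return {c: len(y) / (len(counts) * m) for c, m in counts.items()}


def class_mass(labels, w):
    """Total class weight of each label present in a node."""
    mass = Counter()
    for label in labels:
        mass[label] += w[label]
    return mass


def weighted_gini(labels, w):
    """Gini impurity computed on class-weighted proportions."""
    mass = class_mass(labels, w)
    total = sum(mass.values())
    return 1.0 - sum((v / total) ** 2 for v in mass.values()) if total else 0.0


def grow(X, y, w, mtry, depth, rng):
    """Grow one depth-limited tree; leaves predict by weighted majority vote."""
    mass = class_mass(y, w)
    leaf = max(mass, key=mass.get)
    if depth == 0 or len(mass) == 1:
        return leaf
    total, best = sum(mass.values()), None
    for f in rng.sample(range(len(X[0])), mtry):  # random feature subset per node
        for t in sorted({row[f] for row in X})[:-1]:  # keeps both sides non-empty
            sides = ([i for i, r in enumerate(X) if r[f] <= t],
                     [i for i, r in enumerate(X) if r[f] > t])
            score = sum(sum(w[y[i]] for i in s) / total
                        * weighted_gini([y[i] for i in s], w) for s in sides)
            if best is None or score < best[0]:
                best = (score, f, t, sides)
    if best is None:  # every sampled feature is constant in this node
        return leaf
    _, f, t, sides = best
    children = [grow([X[i] for i in s], [y[i] for i in s], w, mtry, depth - 1, rng)
                for s in sides]
    return (f, t, *children)


def predict(tree, row):
    """Follow (feature, threshold, left, right) nodes down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def fit_forest(X, y, n_trees=25, mtry=1, depth=3, seed=0):
    """Grow one tree per bootstrap sample and remember its out-of-bag rows."""
    rng, w, n = random.Random(seed), class_weights(y), len(X)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))
        tree = grow([X[i] for i in boot], [y[i] for i in boot], w, mtry, depth, rng)
        forest.append((tree, oob))
    return forest, w


def forest_predict(forest, row):
    """Plain majority vote across trees."""
    return Counter(predict(tree, row) for tree, _ in forest).most_common(1)[0][0]


def weighted_error(tree, rows, labels, w):
    """Class-weighted misclassification rate of one tree."""
    wrong = sum(w[c] for r, c in zip(rows, labels) if predict(tree, r) != c)
    return wrong / sum(w[c] for c in labels)


def oob_importance(forest, X, y, w, seed=0):
    """Mean rise in a tree's weighted OOB error after permuting one feature."""
    rng, p = random.Random(seed), len(X[0])
    imp, used = [0.0] * p, 0
    for tree, oob in forest:
        if not oob:
            continue
        used += 1
        rows, labels = [list(X[i]) for i in oob], [y[i] for i in oob]
        base = weighted_error(tree, rows, labels, w)
        for f in range(p):
            col = [r[f] for r in rows]
            rng.shuffle(col)
            permuted = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, col)]
            imp[f] += weighted_error(tree, permuted, labels, w) - base
    return [v / used for v in imp] if used else imp


# Hypothetical toy data: roughly 10% minority "default" class (label 1);
# feature 0 carries signal, feature 1 is pure noise.
data_rng = random.Random(42)
X, y = [], []
for _ in range(200):
    label = 1 if data_rng.random() < 0.1 else 0
    X.append([data_rng.gauss(2.0 if label else 0.0, 1.0), data_rng.gauss(0.0, 1.0)])
    y.append(label)

forest, w = fit_forest(X, y)
imp = oob_importance(forest, X, y, w)
print("OOB permutation importance per feature:", imp)
print("prediction for x0 = 3.0:", forest_predict(forest, [3.0, 0.0]))
```

On this toy data the informative feature should receive a clearly larger importance than the noise feature. Measuring the OOB error with class weights, rather than plain accuracy, keeps the importance scores from being dominated by the majority "repaid" class. That is the same concern that motivates weighting the forest in the first place.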
Full-text access (on campus): available immediately.