| Field | Value |
|---|---|
| Graduate Student | 王秉民 Wang, Bing-Min |
| Thesis Title | 探討類神經網路模型超參數在小型且非原始特徵資料上的影響: 以信用評分資料為例 / Exploring neural network hyperparameters on small dataset and hand-crafted features: take credit scoring as an example |
| Advisor | 張天豪 Chang, Tien-Hao |
| Degree | Master |
| Department | 電機資訊學院 - 電機工程學系 (Department of Electrical Engineering) |
| Year of Publication | 2018 |
| Graduation Academic Year | 106 (2017-2018) |
| Language | Chinese |
| Pages | 35 |
| Keywords (Chinese) | 信用評分, 類神經網路, 機器學習, 時序模型優化 |
| Keywords (English) | credit scoring, neural network, machine learning, Sequential Model-Based Optimization |
In recent years, financial technology has been transforming traditional banking: mobile payments, loan applications, investment and wealth management, insurance, and securities have all begun to adopt new data-driven approaches. The development of artificial intelligence is one of the main drivers of this wave; machine learning raises the value extracted from data and enables more accurate, higher-quality, and faster services. In the lending process, credit scoring is an effective tool that helps banks and lenders decide whether a borrower is creditworthy. In the past, many researchers have applied statistics, data mining, or machine learning techniques to improve model accuracy. However, because bank data are confidential, few public datasets are available for research. As a result, most of the literature focuses on which model achieves better accuracy in credit scoring or on tuning model parameters, while studies of the features themselves remain rare due to these data limitations. This study uses neural networks as the credit scoring model and the UCI German credit dataset as the data source. Drawing on techniques from recent advances in deep learning, we revise how neural networks are trained in the credit scoring literature: we compare regularizers, and we further examine the effect of weight initialization and activation functions on this type of dataset. For hyperparameter tuning, traditional grid search is too costly for neural networks, so we use Bayesian optimization to tune the network hyperparameters, halving the computational cost. Finally, we apply the resulting configuration to the Australian credit dataset and obtain better results than those reported in the literature.
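The abstract above describes replacing grid search with Bayesian optimization when tuning the network's hyperparameters. Below is a minimal sketch of that idea using hyperopt's TPE sampler (a Sequential Model-Based Optimization algorithm) driving scikit-learn's MLPClassifier; the search space, data files (`german_credit_X.npy`, `german_credit_y.npy`), and evaluation budget are illustrative assumptions, not the thesis's actual configuration.

```python
# Minimal SMBO (TPE) hyperparameter search sketch; the search space,
# data files, and evaluation budget are illustrative assumptions.
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials, space_eval
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical preprocessed German credit data (one-hot encoded features, 0/1 labels).
X = np.load("german_credit_X.npy")
y = np.load("german_credit_y.npy")

space = {
    "hidden": hp.choice("hidden", [(64,), (128,), (128, 64), (256, 128)]),
    "alpha": hp.loguniform("alpha", np.log(1e-5), np.log(1e-1)),  # L2 strength
    "lr": hp.loguniform("lr", np.log(1e-4), np.log(1e-1)),        # initial learning rate
}

def objective(params):
    clf = MLPClassifier(hidden_layer_sizes=params["hidden"],
                        alpha=params["alpha"],
                        learning_rate_init=params["lr"],
                        max_iter=500)
    # 5-fold cross-validated accuracy; TPE minimizes, so negate it.
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    return {"loss": -acc, "status": STATUS_OK}

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print("best hyperparameters:", space_eval(space, best))
```

Unlike grid search, each new trial here is proposed from a surrogate model fitted to the trials already evaluated, which is why a fixed budget of evaluations can cover the space far more cheaply.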
Deep learning has achieved remarkable success in fields such as computer vision, natural language processing, and games, and has produced many novel techniques. These fields have large amounts of data with raw features, but many problems in other fields, such as credit scoring, stock prediction, and HIV prediction, have little data and hand-crafted features. We want to explore whether deep learning techniques developed for those prominent tasks also work in such machine learning tasks. We compared the combinations of 9 activation functions and 12 weight initializations and found that the results on the credit scoring dataset are consistent with those reported in the original papers. We further explored how regularization methods affect the results as the model gets deeper, and used Sequential Model-Based Optimization (SMBO) instead of grid search and random search for hyperparameter tuning. Finally, we compared the training time of a neural network with that of an ensemble method (bstacking) and showed that the neural network achieves better accuracy while using 0.27 times the training time. We showed that deep learning can still outperform a traditional machine learning method (bstacking) on a small dataset with hand-crafted features, and that we should not use smaller networks for fear of overfitting; instead, we should use a big network and choose regularization techniques appropriately to control overfitting. In deep networks, L2 and dropout are better choices than early stopping. From the efficiency point of view, some traditional machine learning algorithms need much more time to train than neural networks.
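As a concrete illustration of the regularization choices discussed above (preferring L2 and dropout over early stopping in deeper networks), here is a minimal Keras sketch of such a model; the layer widths, dropout rate, L2 strength, and input dimension are assumptions for illustration, not the exact architecture used in the thesis.

```python
# Sketch of a deeper network regularized with L2 and dropout, the two
# techniques the abstract favors over early stopping for this setting.
# Layer widths, rates, and the input dimension are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(input_dim=24, l2=1e-4, drop=0.5):
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(256, activation="relu",
                     kernel_initializer="he_normal",      # He init suited to ReLU layers
                     kernel_regularizer=regularizers.l2(l2)),
        layers.Dropout(drop),
        layers.Dense(256, activation="relu",
                     kernel_initializer="he_normal",
                     kernel_regularizer=regularizers.l2(l2)),
        layers.Dropout(drop),
        layers.Dense(1, activation="sigmoid"),            # good/bad credit probability
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage (X_train, y_train assumed to be the preprocessed credit data):
# model = build_model()
# model.fit(X_train, y_train, validation_split=0.2, epochs=200, batch_size=32)
```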