
Graduate student: Chen, Chih-Chuan (陳志全)
Thesis title: Incorporating Monotonic Prior Knowledge in Support Vector Machines (結合單調性先備知識於支援向量機之研究)
Advisor: Li, Sheng-Tun (李昇暾)
Degree: Doctor
Department: College of Management - Department of Industrial and Information Management
Year of publication: 2014
Academic year of graduation: 103
Language: English
Number of pages: 68
Keywords (translated from Chinese): Data mining, Support vector machine, Prior knowledge, Monotonicity, Fuzzy theory
Keywords (English): Data mining, Support vector machine, Prior knowledge, Monotonicity constraints, Fuzzy Set Theory
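  • Support vector machine (SVM) is an artificial neural network (ANN) based on statistical learning. Owing to its outstanding learning ability, it has become a focus of current machine learning research and one of the most popular data mining tools in recent years, and it has been widely applied to classification problems. The SVM model is not designed around the training error; instead, it minimizes structural risk by minimizing an upper bound on the generalization error, which avoids the overfitting problem common in machine learning. Moreover, training can be cast as a quadratic programming problem, and with an appropriate choice of kernel function a global optimum can be obtained.
    Data-driven data mining methods often run into a common problem in practice: although they can achieve high accuracy, their results may lack managerial meaning, which lowers decision quality. Prior knowledge therefore often plays an important role in applying these methods; for example, a monotonic relationship between the response variable and the predictor variables is a common form of prior knowledge. This study therefore first proposes a regularized monotonicity constrained SVM (RMC-SVM) model that takes monotonic prior knowledge into account. In this model, the prior monotonic knowledge is expressed as inequality constraints. As with the original SVM model, the problem is converted into a quadratic programming problem; because that problem loses positive definiteness once the monotonicity constraints are added, regularization is applied, and a suitable algorithm is then derived to solve it.
    In addition, because SVM builds its classification model from training instances, it can give too much weight to, and be overly sensitive to, less important or noisy data, which lowers classification accuracy. This study introduces fuzzy theory into the monotonicity constrained SVM. Besides using prior monotonic knowledge to construct the constraints, it uses prior knowledge provided by domain experts to judge the contribution of each data point, so that different data can be given different importance. The result is a knowledge-oriented fuzzy SVM model with monotonicity constraints (RMC-FSVM), in which data that contribute more to the decision problem are given higher contribution values.
    Experiments confirm that RMC-SVM does improve classifier performance and outperforms the conventional SVM model. The RMC-FSVM model, which considers both the monotonicity constraints and different contribution levels, also outperforms SVM and FSVM.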

    The support vector machine (SVM) is a state-of-the-art artificial neural network based on statistical learning. For more than a decade, SVM has drawn considerable attention from diverse research communities in data mining thanks to its outstanding performance in solving problems related to classification and function estimation. It has been successfully applied to many different fields, such as forecasting corporate distress, consumer loan evaluation, text categorization, bioinformatics, handwriting recognition, and speaker verification. The original idea of SVM is to use a linear separating hyperplane to create a classifier. For non-linearly separable cases, input vectors are mapped to a higher-dimensional feature space in which a separating hyperplane can be constructed more easily, which ensures high generalizability when classifying new objects.
    In many data mining applications there is prior domain knowledge about monotonic relations between the response and predictor variables, and respecting monotonicity may be an important model requirement for explaining and justifying decisions. This study therefore first proposes a regularized monotonicity constrained SVM (RMC-SVM) that incorporates the monotonic nature of the problem under consideration. For RMC-SVM, a quadratic programming problem is derived in the dual space, Tikhonov regularization is applied to ensure that a global solution can be obtained, and an algorithm built on a quadratic programming solver is developed.
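As a rough illustration of why a Tikhonov term helps, the sketch below adds a small multiple of the identity to a symmetric matrix that is only positive semidefinite and checks positive definiteness with a Cholesky factorization. The matrix `Q`, the value `delta=0.01`, and the helper names `add_tikhonov` and `is_positive_definite` are assumptions made for illustration; they are not taken from the thesis, and the actual Hessian and penalty choice used by RMC-SVM are not given in this abstract.

```python
import math


def add_tikhonov(Q, delta):
    """Return Q + delta * I for a square matrix stored as a list of lists."""
    n = len(Q)
    return [[Q[i][j] + (delta if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def is_positive_definite(Q, tol=1e-12):
    """Check a symmetric matrix for positive definiteness via Cholesky.

    The factorization succeeds only if every pivot is strictly positive.
    """
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:  # zero or negative pivot: not positive definite
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


# Hypothetical rank-one Hessian: positive semidefinite but singular.
Q = [[1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0]]

# Assumed penalty value; every eigenvalue of Q is shifted up by delta.
Q_reg = add_tikhonov(Q, delta=0.01)

print(is_positive_definite(Q))      # False
print(is_positive_definite(Q_reg))  # True
```

With a strictly positive definite Hessian the quadratic objective is strictly convex, so a feasible problem has a unique minimizer.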
    Furthermore, because in many applications an input point may not belong exactly to one particular class, this study further proposes a novel fuzzy SVM model to address this issue. The model assigns a fuzzy membership to each input point. It also uses expert knowledge about the monotonic relations between the response and predictor variables, represented in the form of monotonicity constraints. The classification problem of a monotonically constrained fuzzy SVM, called a regularized monotonic FSVM (RMC-FSVM), is formulated, its dual optimization problem is derived, and its monotonic property is analyzed theoretically. Tikhonov regularization is again adopted to ensure that the solution is unique and bounded. A new measure, the frequency monotonicity rate, is proposed to evaluate how well a model retains monotonicity.
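One simple way to quantify how well a set of predictions respects monotonicity is to count comparable pairs. The sketch below is a generic pairwise check and is not the frequency monotonicity rate proposed in the thesis, whose definition is not given in this abstract. The function names, the sample points, and the labels are all hypothetical.

```python
from itertools import permutations


def dominates(a, b):
    """True if b is at least as large as a in every coordinate."""
    return all(x <= y for x, y in zip(a, b))


def monotone_pair_fraction(points, labels):
    """Fraction of comparable ordered pairs whose labels respect the order.

    A pair (i, j) is comparable when points[i] <= points[j] coordinate-wise;
    it is consistent when labels[i] <= labels[j] as well.
    """
    comparable = 0
    consistent = 0
    for i, j in permutations(range(len(points)), 2):
        if dominates(points[i], points[j]):
            comparable += 1
            if labels[i] <= labels[j]:
                consistent += 1
    # With no comparable pairs there is nothing to violate.
    return consistent / comparable if comparable else 1.0


# Hypothetical two-feature inputs with class labels in {-1, +1}.
points = [(1, 1), (2, 2), (3, 1), (3, 3)]
labels = [-1, 1, -1, 1]

print(monotone_pair_fraction(points, labels))
```

A value of 1.0 means that no comparable pair is ordered against its labels; lower values indicate more violations.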
    When applied to benchmark datasets, the proposed RMC-SVM shows statistically significant improvements over the original SVM. For RMC-FSVM, experiments on real-world and synthetic datasets show advantages over the original FSVM and SVM models in both predictive ability and retention of monotonicity on classification problems.

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Content
    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 Related Works
        2.1 Support Vector Machines
        2.2 Fuzzy SVMs
        2.3 Classification with Monotonicity Constraints
    Chapter 3 Research Methodology
        3.1 Definition of Monotonicity
        3.2 Monotonically Constrained SVM Model
        3.3 Monotonically Constrained FSVM Model
    Chapter 4 Regularized MC-SVM and MC-FSVM Algorithms
        4.1 Regularized MC-SVM (RMC-SVM) Algorithm
        4.2 Regularized Monotonic FSVM (RMC-FSVM) Algorithm
        4.3 Constructing the Monotonicity Constraints
        4.4 Defining the Fuzzy Membership Functions
        4.5 Determining the Penalty Term for Tikhonov Regularization
    Chapter 5 Experimental Results and Analysis
        5.1 Experimental Results and Analysis for RMC-SVM
            5.1.1 Data sets
            5.1.2 Experiment design
            5.1.3 Performance measures
            5.1.4 Experimental analysis
        5.2 Experimental Results and Analysis for RMC-FSVM
            5.2.1 Comparison of membership generation strategies
            5.2.2 Comparison of different models with SVM membership functions
            5.2.3 Impact of monotonicity adherence on performance
    Chapter 6 Conclusion and Future Work
        6.1 Contributions
        6.2 Limitations
        6.3 Future Work
    References

    List of Figures
        Figure 1. The RMC-SVM algorithm
        Figure 2. The RMC-FSVM algorithm
        Figure 3. The process of constructing the monotonicity constraints
        Figure 4. CPU time (in seconds) of RMC-SVM vs. SVM with various numbers of constraints
        Figure 5. Performance comparison of classifiers with SVM membership function used in the FSVM and RMC-FSVM approaches
        Figure 6. Comparison of the performance when monotonicity is violated to different degrees
        Figure 7. The scatter plots of FMR in terms of testing data and classifiers

    List of Tables
        Table 1. Description of the data sets
        Table 2. Accuracy (%) of RMC-SVM on the Bankruptcy data set with different numbers of constraints
        Table 3. Accuracy (%) comparison of RMC-SVM vs. SVM on the Bankruptcy data set
        Table 4. Performance comparison using RMC-SVM and SVM with the German dataset
        Table 5. Performance comparison using RMC-SVM and SVM with the PD600 dataset
        Table 6. Performance comparison using RMC-SVM and SVM with the Japanese dataset
        Table 7. The monotonicity measures using RMC-SVM, SVM and the data set
        Table 8. Bankruptcy results (%): comparison among FSVM, RMC-FSVM, and SVM
        Table 9. WDBC results (%): comparison among FSVM, RMC-FSVM, and SVM
        Table 10. German results (%): comparison among FSVM, RMC-FSVM, and SVM
        Table 11. PD600 results (%): comparison among FSVM, RMC-FSVM, and SVM


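    Full-text availability: on campus, open to the public from 2024-12-31
    Off campus: not available
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.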