| Graduate student: | Yen, Yu-Hsun (顏佑勳) |
|---|---|
| Thesis title: | An Investigation of SVM-Based Methods for Binary Classification of Uncertain Data (基於支援向量機方法的不確定資料二元分類研究) |
| Advisor: | Lin, Jen-Yen (林仁彥) |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 |
| Language: | Chinese |
| Number of pages: | 123 |
| Keywords (Chinese): | 二元分類, 支援向量機, 穩健最佳化, 二階錐規劃, 資料不確定性 |
| Keywords (English): | Binary Classification, Support Vector Machines, Robust Optimization, Second-Order Cone Programming, Data Uncertainty |
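Classification is a core problem in machine learning, and binary classification has drawn particular attention because of its wide range of applications. Support Vector Machines (SVMs), with their convex quadratic programming structure and good solution efficiency, have become a classical method for binary classification. In practice, data uncertainty, which reflects the measurement errors, missing information, and noise common in real environments, is receiving increasing attention. To meet this challenge, most existing research on Robust Support Vector Machines accounts for data uncertainty by transforming the SVM problem into an equivalent robust Second-Order Cone Support Vector Machine (SOC-SVM) and solving that formulation. In contrast, this study adopts a min-max strategy to construct a new robust SVM model and proposes an efficient iterative algorithm. The algorithm uses the degree of constraint violation as its selection criterion, progressively identifying and incorporating key data points to form a representative new training set. It then solves a sequence of convex quadratic programs built from these training sets to obtain a robust optimal solution.
This study evaluates model performance on public UCI datasets. The experimental results show that, on most datasets, the proposed method achieves classification accuracy comparable to or higher than that of SOC-SVMs, and that it offers significant computational efficiency when handling large-scale data, indicating strong potential for practical applications.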
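As a worked illustration of why a robust SVM constraint can be rewritten as a second-order cone constraint, here is a hedged sketch. It assumes each training point $x_i$ with label $y_i \in \{-1, +1\}$ may be perturbed by any $\delta_i$ inside a Euclidean ball of radius $r_i$. The abstract does not state the uncertainty set used in the thesis, so the ball, the radius $r_i$, and the slack variable $\xi_i$ are assumptions made only for this example.

```latex
% Robust margin constraint: it must hold for every admissible perturbation.
y_i\bigl(w^{\top}(x_i + \delta_i) + b\bigr) \ge 1 - \xi_i
\qquad \forall\, \delta_i : \|\delta_i\|_2 \le r_i

% By the Cauchy-Schwarz inequality, the worst case over the ball is
\min_{\|\delta_i\|_2 \le r_i} y_i\, w^{\top}\delta_i = -\,r_i \|w\|_2 ,
\qquad \text{attained at } \delta_i^{*} = -\,y_i\, r_i\, \frac{w}{\|w\|_2} \quad (w \ne 0)

% so the infinite family of constraints collapses to one second-order cone constraint:
r_i \|w\|_2 \;\le\; y_i\bigl(w^{\top}x_i + b\bigr) - 1 + \xi_i
```

Under this assumption, the same closed-form worst-case perturbation $\delta_i^{*}$ is what a min-max approach would evaluate when checking how badly a candidate $(w, b)$ violates the robust constraints.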
Support Vector Machines (SVMs) are widely recognized as powerful tools for binary classification when the data are known with certainty. However, real-world data often contain feature perturbations or noise, introducing uncertainty that can hinder model stability. To address this, SVMs for uncertain data and robust optimization techniques have received growing attention. In particular, max-violator strategies, which iteratively select the most critical constraints during training, have been proposed to enhance robustness under adversarial or noisy conditions. A notable representative of this line of research is the Second-Order Cone SVM (SOC-SVM), which provides robustness through convex optimization techniques. Nevertheless, SOC-SVMs often suffer from poor computational efficiency when applied to large-scale datasets.
In this work, we propose a novel method that integrates the concept of the cutting-plane algorithm, a classical technique for efficiently solving convex optimization problems, with max-violator strategies to construct a new learning algorithm. This hybrid approach enables the model to selectively incorporate the most critical constraints, thereby reducing computational overhead while preserving theoretical convergence guarantees.
To simulate real-world uncertainty, we introduce synthetic feature perturbations during training. Empirical results across a diverse collection of benchmark datasets demonstrate that the proposed method not only achieves higher classification accuracy on test sets, but also exhibits substantial improvements in training efficiency, particularly in large-scale settings.
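The constraint-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's algorithm: it assumes a hard-margin linear classifier and a Euclidean-ball uncertainty set of a single shared radius, neither of which the abstract specifies. The function names (`worst_case_point`, `violation`, `max_violators`) are hypothetical. The sketch covers only the selection of the most violated constraints for a given `(w, b)`; the convex quadratic program that would be re-solved after each selection is omitted.

```python
import math


def worst_case_point(w, b, x, y, radius):
    """Point in the ball of the given radius around x that minimizes y*(w.x + b).

    Assumes a Euclidean-ball uncertainty set (an illustrative assumption).
    """
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    if norm_w == 0.0:
        # Every point in the ball has the same margin, so return the centre.
        return list(x)
    return [xj - y * radius * wj / norm_w for xj, wj in zip(x, w)]


def violation(w, b, x, y):
    """Amount by which the margin constraint y*(w.x + b) >= 1 is violated."""
    margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def max_violators(w, b, X, Y, radius, k=1, tol=1e-9):
    """Return up to k (index, worst_case_point) pairs, most violated first."""
    scored = []
    for i, (x, y) in enumerate(zip(X, Y)):
        z = worst_case_point(w, b, x, y, radius)
        v = violation(w, b, z, y)
        if v > tol:
            scored.append((v, i, z))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(i, z) for _, i, z in scored[:k]]
```

In an iterative scheme of this kind, the returned worst-case points would be appended to the working training set, the quadratic program would be solved again on that set, and the loop would stop once `max_violators` returns an empty list.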
On campus: publicly available from 2030-07-24