| Author: | 陶國珍 Tao, Kuo-Chen |
|---|---|
| Title: | 建構單調性SVM模型之研究 Constructing a Monotonicity Support Vector Machines Model |
| Advisor: | 李昇暾 Li, Sheng-Tun |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2010 |
| Academic year of graduation: | 98 |
| Language: | English |
| Number of pages: | 45 |
| Keywords (Chinese): | 分類問題 (classification problems)、單調性 (monotonicity)、支援向量機 (support vector machines) |
| Keywords (English): | classification problems, monotonicity constraints, SVM |
Data mining techniques allow us to discover hidden knowledge in large databases. Classification is one of the most common applications of data mining and has been an active research topic in recent years. The support vector machine (SVM) is a relatively new classification method. Grounded in statistical learning theory, it fully implements the principle of risk minimization. SVM was originally designed for linearly separable problems, where it seeks a hyperplane that separates the two classes with the maximum margin. To handle the nonlinear problems common in practice, the original problem is mapped into a high-dimensional feature space. Because of its excellent classification performance, SVM has been widely applied to classification problems in recent years.
In classification and prediction problems, monotonicity is a very common form of prior knowledge. In practice, problems in economics, finance, medicine, and other fields often assume that the prediction function satisfies a monotonicity property. In real-estate appraisal, for example, the larger the floor area, the higher the estimated price; the farther the house is from the city center, the lower the estimate. In recent years many studies have incorporated monotonicity into prediction, but most of them handle the monotonicity constraints during data preprocessing. The aim of this study is therefore to construct a model with built-in monotonicity constraints that can correctly classify monotonic data.
This study combines the monotonicity property with the SVM classifier: monotonicity constraints are added while solving the quadratic programming problem of the original SVM, yielding a monotonic SVM model. The classifier thus acquires the ability to recognize monotonic data during training and can correctly classify data with monotonic characteristics.
For validation, this study uses the wpbc breast cancer dataset from the University of California repository, together with clinical in-vitro fertilization (IVF) records from an obstetrics and gynecology clinic in Tainan. The experimental results show that the proposed monotonic SVM model performs better at classifying monotonic data, classifying data with the monotonicity property more accurately than the original SVM.
Data mining techniques allow us to discover hidden knowledge in huge databases, and classification is one of these techniques. In recent research, classification problems have been among the most popular topics.
The SVM is a principled and powerful method that performs well in a wide variety of applications. The original idea of SVM is to build a classifier from a linear separating hyperplane. Non-linearly separable input vectors are mapped into a higher-dimensional feature space, where a hyperplane with high generalization ability can easily be constructed to classify new objects.
In some predictive problems, the predictive model should satisfy a monotonicity property. Recently, several studies have used the SVM classifier on data with the monotonicity property, but most of them handle monotonicity during the data preprocessing step.
In this study, we propose a monotonic SVM model by adding monotonicity constraints to the original support vector machine (SVM) and transforming this primal problem into its dual form. We then compute the optimal solution to classify monotonic data.
In the experiments, we use the wpbc and IVF datasets to validate our monotonic SVM model. The results show that our model classifies monotonic data more accurately than the original SVM.
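The core idea, an SVM whose decision function is forced to be monotone in chosen features, can be illustrated with a much-simplified sketch. The thesis adds monotonicity constraints to the SVM quadratic program and solves it in the dual; the toy version below instead trains a linear soft-margin SVM by subgradient descent on the primal hinge loss and enforces monotonicity by projecting the weights of the monotone features onto the non-negative orthant. All function names and parameters here are illustrative, not taken from the thesis.

```python
import numpy as np

def train_monotone_linear_svm(X, y, monotone_idx, C=0.1, lr=0.01, epochs=1000):
    """Soft-margin linear SVM trained by subgradient descent on the primal
    hinge loss.  After every update, the weights of the features listed in
    `monotone_idx` are projected onto [0, inf), so the decision function
    f(x) = w.x + b is non-decreasing in those features."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1.0                     # points violating the margin
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
        w[monotone_idx] = np.maximum(w[monotone_idx], 0.0)  # monotonicity projection
    return w, b

def predict(X, w, b):
    """Class labels in {-1, +1} from the learned hyperplane."""
    return np.where(X @ w + b >= 0, 1.0, -1.0)
```

On a toy dataset whose label grows with both features (for example, the label is positive exactly when the two features sum to more than one), the projection keeps both learned weights non-negative while the hyperplane still separates the classes. The thesis's actual formulation instead encodes monotonicity as pairwise constraints inside the quadratic program, which also covers nonlinear kernels.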