
Author: Chen, Ying-Hsiu (陳盈秀)
Title: A Study of SVM on Data Mining with Monotonicity Constraints (SVM類神經網路於單調性資料探勘之研究)
Advisor: Li, Sheng-Tun (李昇暾)
Degree: Master
Department: Department of Industrial and Information Management (on-the-job class), College of Management
Year of Publication: 2009
Academic Year: 97 (ROC calendar)
Language: Chinese
Pages: 53
Keywords: Data mining, Classification, Monotonicity, Support Vector Machine (SVM), Relabeling instances
Abstract (translated from the Chinese):

Data mining techniques enable us to discover hidden or previously unknown patterns in large volumes of data and to extract useful knowledge from them. Classification is one of the most widely discussed topics in data mining: from training instances whose class labels and attribute values are known, mining techniques build a classification model, which is then used to assign newly arriving, unlabeled instances to their classes.

The support vector machine (SVM) is a relatively new classification method. Built on statistical learning theory and fully implementing the risk-minimization principle, it has been widely applied to classification problems in recent years. Because an SVM constructs its model from training instances, mutually contradictory examples in a large body of historical data prevent it from learning the correct patterns, which lowers classification accuracy and in turn raises the cost of misclassification. How to prevent such conflicts in the training set as data volume and complexity grow, and thereby significantly improve classifier accuracy and performance, is the direction of this research.

This study applies the concept of monotonicity to the goal of improving classifier performance, proposing an approach for training sets that may contain conflicting instances, based on preventing training data from violating monotonicity. Drawing on expert knowledge and experience, monotonicity rules relating the input variables to the output are elicited from the dataset; the data are then preprocessed, and instances that violate monotonicity are corrected through a relabeling procedure; finally, the SVM learns from the corrected data. Experiments confirm that training the classifier on a dataset revised under monotonicity constraints does effectively improve its predictive performance.
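The relabeling preprocessing step described above can be sketched as follows. This is a minimal illustrative sketch, assuming an ordinal dataset in which every input attribute is (by expert judgment) monotonically increasing with the class label; the function names and the greedy "lift the lower label" rule are hypothetical, not the thesis's exact procedure.

```python
def dominates(a, b):
    """True if instance a is >= instance b on every feature."""
    return all(ai >= bi for ai, bi in zip(a, b))

def find_violations(X, y):
    """Index pairs (i, j) where x_i dominates x_j but y_i < y_j,
    i.e. the pair violates the monotonicity constraint."""
    return [(i, j)
            for i in range(len(X)) for j in range(len(X))
            if i != j and dominates(X[i], X[j]) and y[i] < y[j]]

def relabel(X, y):
    """Greedily raise the label of the dominating instance until the
    dataset is monotone. Labels only increase and are bounded above
    by the largest label, so the loop terminates."""
    y = list(y)
    changed = True
    while changed:
        changed = False
        for i, j in find_violations(X, y):
            if y[i] < y[j]:  # recheck: an earlier fix may already apply
                y[i] = y[j]
                changed = True
    return y
```

A dataset that already satisfies the monotonicity rules passes through unchanged; only violating pairs are touched, which is why the preprocessing preserves most of the original training information.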

Abstract (English):

Data mining techniques enable us to discover hidden patterns and extract valuable knowledge from huge databases. A central problem in data mining is how to assign data examples to one of several predefined classes and infer key patterns; this is referred to as the classification problem, and an algorithm designed to solve it is called a classification algorithm, or classifier. With the help of a classifier, analysts can predict the class of a new instance on the basis of its attribute values.

SVM is a state-of-the-art neural network technique based on statistical learning theory. By exploiting the mechanics of structural risk minimization, it handles the overfitting problem well, and SVMs have therefore been widely applied in many fields over the past few years. Since an SVM classifier is built from training examples, contradictions in large volumes of historical data prevent the correct patterns from being learned; as a result, the accuracy of the classifier decreases and the cost of misclassification increases. This research focuses on finding ways to enhance predictive performance and to avoid conflicts in the training dataset as data grow in volume and complexity.
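To make the structural-risk-minimization idea concrete, the sketch below trains a linear soft-margin SVM by stochastic subgradient descent on the regularized hinge loss. It is a didactic pure-Python stand-in with illustrative hyperparameters, not the kernelized LIBSVM solver that SVM studies typically rely on.

```python
def svm_train(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss; y must be in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge active: push this point past the margin
                w = [wj + lr * (yi * xj - lam * wj)
                     for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # only the regularizer contributes
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def svm_predict(w, b, x):
    """Sign of the learned decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

On a toy separable problem this learns a separating threshold with a margin; replacing the linear decision function with a kernel and the subgradient loop with a QP solver recovers the standard SVM formulation.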
In this study, we offer a new approach that resolves possible conflicts in the training dataset by using monotonicity constraints. Monotonicity of the relation between a response variable and a predictor variable is a form of prior knowledge that can often be elicited reliably from domain experts. For each training instance that violates the monotonicity constraints, we relabel the data point so as to make the training dataset monotone. We show that training an SVM classifier on the relabeled dataset, which satisfies the monotonicity constraints, gives substantial improvements in predictive performance over the original dataset.

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
 1.1 Background and Motivation
 1.2 Research Objectives
 1.3 Scope and Limitations
 1.4 Thesis Organization
Chapter 2 Literature Review
 2.1 Monotonicity Constraints
 2.2 Classification Methods Combined with Monotonicity
  2.2.1 k-Nearest Neighbors
  2.2.2 Decision Trees
  2.2.3 Neural Networks
  2.2.4 Isotonic Separation
 2.3 Support Vector Machines
Chapter 3 Research Method
 3.1 Data Preprocessing
  3.1.1 Data Normalization
  3.1.2 Relabeling
 3.2 SVM Classifier
Chapter 4 Experimental Results and Analysis
 4.1 Experimental Environment and Data
 4.2 Experimental Procedure
  4.2.1 Experiment 1: Data Preprocessing with Monotonicity Constraints
  4.2.2 Experiment 2: Reinforced Model Training with Monotonicity Constraints
 4.3 Evaluation Methods
 4.4 Analysis of Results
  4.4.1 Experiment 1: Data Preprocessing with Monotonicity Constraints (Results Comparison)
  4.4.2 Experiment 2: Reinforced Model Training with Monotonicity Constraints (Results Comparison)
Chapter 5 Conclusions and Future Work
 5.1 Conclusions
 5.2 Future Work
References


Full text released on campus: 2014-06-18; off campus: 2014-06-18.