| Author: | 湯穎奇 Tang, Ying-Chi |
|---|---|
| Title: | 應用K-means分群演算法於選取模式樹節點屬性之研究 (Applying the K-means Clustering Algorithm to Select Node Attributes of Model Trees) |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management, Institute of Information Management |
| Year of publication: | 2005 |
| Graduation academic year: | 93 |
| Language: | Chinese |
| Pages: | 41 |
| Keywords: | numeric prediction, model tree, K-means clustering |
A model tree has a tree structure similar to that of a decision tree, except that each leaf node stores a linear regression equation for predicting the class value, which makes model trees an effective approach to numeric prediction problems. GUIDE is a model tree algorithm that has performed well in recent years in both efficiency and accuracy; it selects node attributes with the statistical chi-square test. This study proposes KCMT, which first partitions the data points with the K-means clustering method and then selects node attributes according to how well each attribute separates the resulting clusters. Three methods for locating an attribute's split point are proposed: the two-standard-deviation method, the MSE method, and the classification-accuracy method, denoted KCMT(2sd), KCMT(mse), and KCMT(c), respectively. The experimental results show that using clustering information to locate split points yields better learning performance. As for tree size, the unpruned trees built by KCMT are mostly smaller than those of GUIDE, whereas after pruning GUIDE's trees are the smaller ones. In terms of prediction accuracy, the classification-accuracy method achieves the best average performance across the data sets; KCMT(c) and GUIDE each win on roughly half of the data sets, and overall the learning performance of KCMT(c) is slightly better, making KCMT(c) competitive with GUIDE. In addition, KCMT handles nominal attributes in the same way as M5. The results on most data sets show that when the number of binary attributes converted from nominal attributes is small, this treatment of nominal attributes has little effect on KCMT's learning results.
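The abstract only sketches how KCMT turns cluster structure into a split decision. The following Python snippet is a minimal, hypothetical illustration of the classification-accuracy criterion behind KCMT(c): it clusters the instances into two groups with K-means and then picks the attribute and threshold whose binary split best reproduces the cluster labels. Clustering on the target value alone, the two-cluster setting, and the function name are assumptions made for illustration, not the thesis's exact procedure.

```python
# Hypothetical sketch of a classification-accuracy split criterion in the
# spirit of KCMT(c); clustering on the target value and k = 2 are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def best_split_by_cluster_accuracy(X, y, random_state=0):
    """Return (attribute index, threshold, accuracy) for the attribute
    split that best reproduces the two K-means clusters of the target."""
    labels = KMeans(n_clusters=2, n_init=10,
                    random_state=random_state).fit_predict(y.reshape(-1, 1))
    best = (None, None, 0.0)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])                  # sorted distinct values
        for t in (values[:-1] + values[1:]) / 2.0:   # midpoint thresholds
            side = (X[:, j] > t).astype(int)
            # a split may match the clusters under either label assignment
            acc = max(np.mean(side == labels), np.mean(side != labels))
            if acc > best[2]:
                best = (j, t, acc)
    return best

# Toy usage: the first attribute determines the target, so it should win.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.where(X[:, 0] > 0, 5.0, -5.0) + rng.normal(scale=0.5, size=100)
print(best_split_by_cluster_accuracy(X, y))
```

In a full model tree induction, the chosen split would be applied recursively to each side of the partition, with a linear regression fitted at every leaf.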
陳子立 (2003). 結合特徵選取與判定係數以建構模式樹之方法 (Combining Feature Selection and the Coefficient of Determination to Construct Model Trees), Master's thesis, Institute of Industrial Management Science, National Cheng Kung University.
Alexander, W.P. and Grimshaw, S.D. (1996). Treed regression, Journal of Computational and Graphical Statistics, 5, 156-175.
Bandyopadhyay, S. and Maulik, U. (2002). An evolutionary technique based on K-means algorithm for optimal clustering in R^N, Information Sciences, 146, 221-237.
Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. (1984). Classification and Regression Trees, Belmont, CA: Wadsworth International Group.
Chaudhuri, P., Huang, M.C., Loh, W.Y., and Rubin, R. (1994). Piecewise-polynomial regression trees, Statistica Sinica, 4, 143-167.
Dempster, A., Laird, N., and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, 39, 1-38.
Dobra, A. and Gehrke, J.E. (2002). SECRET: a scalable linear regression tree algorithm, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 481-487.
Karalic, A. (1992). Employing linear regression in regression tree leaves, Proceedings of the 10th European Conference on Artificial Intelligence, 440-441.
Li, K.C., Lue, H.H., and Chen, C.H. (2000). Interactive tree-structured regression via principal Hessian directions, Journal of the American Statistical Association, 95, 547-560.
Loh, W.Y. (2002). Regression trees with unbiased variable selection and interaction detection, Statistica Sinica, 12, 361-386.
Quinlan, J.R. (1992). Learning with continuous classes, Proceedings of the Australian Joint Conference on Artificial Intelligence, 343-348.
Quinlan, J.R. (1993). Combining instance-based and model-based learning, Proceedings of the 10th International Conference on Machine Learning, 236-243.
Selim, S.Z. and Ismail, M.A. (1984). K-means-type algorithms: A generalized convergence theorem and characterization of local optimality, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1), 81-87.
Torgo, L. (1997). Functional models for regression tree leaves, Proceedings of the 14th International Conference on Machine Learning, 385-393.
Tou, J.T. and Gonzalez, R.C. (1974). Pattern Recognition Principles, Addison-Wesley, Reading, MA.
Wang, Y. and Witten, I.H. (1997). Inducing model trees for continuous classes, Proceedings of the Poster Papers of the 9th European Conference on Machine Learning.