
Graduate student: Liu, Zi-Yang (劉梓揚)
Thesis title: The Search of Discriminant Function for Multi-dimensional Discriminant Variables
Advisor: Chan, Shih-Huang (詹世煌)
Degree: Master
Department: College of Management, Department of Statistics
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 52
Keywords (Chinese): piecewise linear, area under curve, discriminant analysis
Keywords (English): AUC, discriminant analysis, piecewise linear, orthogonal distance
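    In many fields, a discriminant function applied appropriately to known risk factors or factors of interest can be used to judge outcomes. Examples include whether a patient is likely to have a recurrence of a disease, or whether a consumer is likely to buy a product. Common parametric discriminant methods include the linear discriminant function and the quadratic discriminant function. Nonparametric methods provide a classification rule but no explicit functional form. These include the nearest neighbor method, the weighted nearest neighbor method of Hechenbichler & Schliep (2004) and the kernel density estimation method of Duong & Hazelton (2005). This thesis first uses nonparametric methods to explore the likely behavior of the discriminant function. It then estimates that function in parametric form, which makes it more convenient to apply in practice.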
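A nonparametric rule assigns a label through a procedure rather than a formula. The minimal Python sketch below shows this for a (weighted) k-nearest-neighbor rule. It assumes two groups coded 0/1 and Euclidean distance. The function name `knn_predict` is hypothetical. The weighting uses a simple triangular kernel on distances scaled by the (k+1)-th neighbor distance. It follows the spirit of Hechenbichler & Schliep (2004), but it is not necessarily the exact scheme used there or in the thesis.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=5, weighted=False):
    """Classify rows of X_new by a (optionally weighted) k-nearest-neighbor vote.

    Labels are assumed to be 0/1. With weighted=True, each neighbor's distance is
    divided by the distance to the (k+1)-th neighbor and passed through a
    triangular kernel 1 - d; this kernel choice is an illustrative assumption.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    if weighted and k >= len(X_train):
        raise ValueError("weighted voting needs at least k + 1 training points")
    preds = np.empty(len(X_new), dtype=int)
    for i, x in enumerate(X_new):
        d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to all training points
        order = np.argsort(d)
        nearest = order[:k]
        if weighted:
            scale = d[order[k]] + 1e-12           # (k+1)-th neighbor distance
            w = 1.0 - d[nearest] / scale          # triangular kernel weights in [0, 1]
        else:
            w = np.ones(k)                        # plain majority vote
        score1 = w[y_train[nearest] == 1].sum()
        score0 = w[y_train[nearest] == 0].sum()
        preds[i] = int(score1 > score0)
    return preds

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, [[0.2, 0.2], [5.5, 5.5]], k=3, weighted=True))
```

The boundary between the two groups is defined only implicitly, by where the vote changes. Recovering an explicit curve from such a rule is the problem the thesis addresses.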
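    A regression model minimizes vertical residuals. This thesis estimates the linear discriminant function differently: the loss function is the sum of projection (orthogonal) distances from the data points to the fitted function, and the line parameters are chosen to minimize it. For quadratic curves and surfaces, the parameters are estimated with the fitting method of Taubin (1991). Some decision boundaries are irregular rather than of a conventional form. For these, the thesis relies on the idea that any continuous function can be approximated by piecewise straight lines. It develops a piecewise linear model whose parameters are again estimated by minimizing the sum of orthogonal distances from the data points to the fitted function.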
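The two fitting criteria above can be sketched in Python with NumPy only.

The line fit minimizes the sum of squared orthogonal distances, which is total least squares (see Markovsky & Van Huffel, 2007). The thesis's exact loss (plain versus squared distances) is not stated here, so squared distances are an assumption.

The conic fit follows Taubin's approximation. It minimizes Σ f(x_i)² / Σ ‖∇f(x_i)‖² for f(x, y) = a x² + b xy + c y² + d x + e y + f, which is a generalized eigenvalue problem.

The function names `fit_line_orthogonal` and `taubin_conic` are hypothetical.

```python
import numpy as np

def fit_line_orthogonal(P):
    """Total-least-squares line normal @ p + offset = 0 (normal has unit length)."""
    P = np.asarray(P, dtype=float)
    centroid = P.mean(axis=0)
    _, _, Vt = np.linalg.svd(P - centroid)
    normal = Vt[-1]                 # direction of least spread = unit normal of the line
    return normal, -normal @ centroid

def taubin_conic(P):
    """Taubin fit of a x^2 + b xy + c y^2 + d x + e y + f = 0; returns [a, b, c, d, e, f].

    Minimizes sum f(x_i)^2 / sum ||grad f(x_i)||^2. The constant f has zero
    gradient, so it is profiled out first; the remaining 5x5 gradient matrix N
    is positive definite whenever the points are not all collinear.
    """
    P = np.asarray(P, dtype=float)
    x, y = P[:, 0], P[:, 1]
    n = len(P)
    one, zero = np.ones(n), np.zeros(n)
    # Monomials without the constant term and their partial derivatives in x and y.
    phi = np.column_stack([x**2, x * y, y**2, x, y])
    gx = np.column_stack([2 * x, y, zero, one, zero])
    gy = np.column_stack([zero, x, 2 * y, zero, one])
    s = phi.sum(axis=0)
    M = phi.T @ phi - np.outer(s, s) / n      # numerator matrix after eliminating f
    N = gx.T @ gx + gy.T @ gy                 # denominator (gradient) matrix
    # Reduce M v = lambda N v to a symmetric problem via the Cholesky factor of N.
    L = np.linalg.cholesky(N)
    Linv = np.linalg.inv(L)
    _, vecs = np.linalg.eigh(Linv @ M @ Linv.T)
    theta = Linv.T @ vecs[:, 0]               # eigenvector for the smallest eigenvalue
    f = -s @ theta / n                        # optimal constant term given theta
    return np.append(theta, f)

t = np.linspace(0, 2 * np.pi, 60, endpoint=False)
circle = np.column_stack([1 + 2 * np.cos(t), 2 + 2 * np.sin(t)])
theta = taubin_conic(circle)
print(theta / theta[0])   # close to [1, 0, 1, -2, -4, 1] for (x-1)^2 + (y-2)^2 = 4
```

A quadric surface in three variables follows the same pattern with a longer monomial vector and three gradient rows.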
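    To assess the proposed method, we simulated irregular, non-normal data. Nonparametric discriminant methods were used to learn the characteristics of the data and the likely behavior of the discriminant function. The resulting classification rule was then approximated by a parametric function. The discriminant methods were evaluated by estimating misclassification rates on the training set and by cross-validation, and their performance was compared using the area under the curve (AUC). In some settings the optimal rule is a linear or quadratic discriminant function. There, the proposed parametric approximation of the nonparametric discriminant function performs about as well as those functions, in terms of higher AUC and lower misclassification rate. For the more complex simulated data, the proposed approximation also performs well compared with classifying directly by the nonparametric methods.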
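The two evaluation tools named above can be sketched in Python.

The AUC is computed as the Mann-Whitney probability that a randomly chosen group-1 case scores higher than a randomly chosen group-0 case, with ties counting one half. This equals the area under the empirical ROC curve. Which score the thesis fed into the ROC curve is not stated here, so any real-valued discriminant score is assumed.

The leave-one-out error rate refits the classifier n times, each time predicting the single held-out point.

The functions `auc_mann_whitney` and `loocv_error` and the nearest-centroid classifier used as a stand-in are all hypothetical.

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """AUC = P(score of a class-1 case > score of a class-0 case), ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]        # all positive/negative pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def loocv_error(X, y, fit, predict):
    """Leave-one-out misclassification rate for a classifier given as fit/predict callables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i              # train on every point except the i-th
        model = fit(X[keep], y[keep])
        wrong += int(predict(model, X[i:i + 1])[0] != y[i])
    return wrong / n

# Hypothetical stand-in classifier: assign to the nearer group mean.
def fit_centroids(X, y):
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def predict_centroids(model, X):
    m0, m1 = model
    return (np.linalg.norm(X - m1, axis=1) < np.linalg.norm(X - m0, axis=1)).astype(int)

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_error(X, y, fit_centroids, predict_centroids))
print(auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Any of the parametric or nonparametric rules discussed here can be plugged in through the `fit`/`predict` pair.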

    Discriminant analysis (DA) has been widely applied in the biometric field. One example is determining whether patients with certain risk factors will have a recurrence of a disease. Once meaningful risk factors have been identified, they are used to construct a specific function or criterion that classifies patients with given characteristics into outcome groups. This thesis focuses mainly on the selection of bivariate and trivariate discriminant functions. Conventional parametric discriminant analysis yields the functional form of the classification rule. Nonparametric discriminant analysis, however, yields no specific functional form. Nonparametric approaches make no distributional assumptions about the population and depict the structure of the data more flexibly. Their classification rules, however, are not easy to apply in practice.
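In the parametric case the rule reduces to an explicit formula. Take the textbook assumptions of linear discriminant analysis: two Gaussian groups with means μ_0 and μ_1, a common covariance matrix Σ, and prior π_1 for group 1. A point x is then assigned to group 1 when wᵀx + c > 0, where w = Σ⁻¹(μ_1 - μ_0) and c = -½ wᵀ(μ_0 + μ_1) + log(π_1 / (1 - π_1)).

The minimal Python sketch below estimates w and c from two samples using the pooled sample covariance. The function name `lda_boundary` is hypothetical.

```python
import numpy as np

def lda_boundary(X0, X1, prior1=0.5):
    """Return (w, c) such that LDA assigns x to group 1 when w @ x + c > 0.

    Assumes Gaussian groups with a common covariance matrix, estimated here by
    the pooled sample covariance.
    """
    X0 = np.asarray(X0, dtype=float)
    X1 = np.asarray(X1, dtype=float)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, m1 - m0)                      # Sigma^{-1} (mu_1 - mu_0)
    c = -0.5 * w @ (m0 + m1) + np.log(prior1 / (1 - prior1))
    return w, c

w, c = lda_boundary([[0, 0], [1, 0], [0, 1], [1, 1]],
                    [[3, 3], [4, 3], [3, 4], [4, 4]])
print(w, c)   # the boundary is the line w @ x + c = 0
```

The output is a closed-form boundary that can be evaluated by hand. A nonparametric rule does not provide one.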
    The purpose of this thesis is to capture the structure of the data with nonparametric methods. The nonparametric discriminant curve is then approximated by a specific functional form, so that the resulting parametric function can be applied easily in practice. A piecewise linear model, fitted by minimizing orthogonal distances, is proposed to estimate the decision boundary of a nonparametric discriminant rule. Leave-one-out cross-validation and the area under the ROC curve (AUC) are used to evaluate the performance of the different discriminant methods. When information about the population distribution is available, the proposed method is non-inferior to parametric methods. When no such parametric information is available, it performs better. The proposed method is useful for diagnostic discrimination and is competitive with nonparametric discriminant rules.
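The piecewise linear idea can be sketched in Python under simplifying assumptions that the thesis does not necessarily share:

- The input is a set of points lying near the nonparametric decision boundary.
- Breakpoints are fixed at quantiles of the first coordinate.
- Each piece is fitted independently by total least squares, that is, by squared orthogonal distances.
- Continuity between adjacent pieces is not enforced.

The names `tls_segment` and `piecewise_orthogonal_fit` are hypothetical.

```python
import numpy as np

def tls_segment(P):
    """Orthogonal-distance line through P: (unit normal, offset, sum of squared distances)."""
    if len(P) < 2:
        raise ValueError("each segment needs at least two points")
    c = P.mean(axis=0)
    _, s, Vt = np.linalg.svd(P - c, full_matrices=False)
    normal = Vt[-1]
    return normal, -normal @ c, s[-1] ** 2   # smallest singular value^2 = minimal SSE

def piecewise_orthogonal_fit(P, n_segments=3):
    """Split points by quantiles of x and fit each piece by total least squares.

    Breakpoint placement and the absence of a continuity constraint are
    simplifying assumptions made for illustration.
    """
    P = np.asarray(P, dtype=float)
    x = P[:, 0]
    edges = np.quantile(x, np.linspace(0, 1, n_segments + 1))
    pieces, total = [], 0.0
    for j, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        last = j == n_segments - 1
        mask = (x >= lo) & ((x <= hi) if last else (x < hi))  # half-open except the last piece
        normal, offset, sse = tls_segment(P[mask])
        pieces.append((lo, hi, normal, offset))
        total += sse
    return pieces, total

xs = np.linspace(-2, 2, 41)
pieces, total = piecewise_orthogonal_fit(np.column_stack([xs, np.abs(xs)]), n_segments=2)
print(len(pieces), total)   # two segments, total orthogonal SSE near zero for y = |x|
```

In practice the number and location of breakpoints would themselves be chosen to reduce the total orthogonal distance, or by cross-validation.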

    Abstract (Chinese)
    Extended English Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1  Introduction
    Chapter 2  Literature Review
      Section 1  Parametric Discriminant Rules
        Subsection 1  Linear Discriminant Function
        Subsection 2  Quadratic Discriminant Function
        Subsection 3  Logistic Regression
      Section 2  Nonparametric Discriminant Rules
        Subsection 1  Kernel Density Estimation
        Subsection 2  Nearest Neighbor Method
        Subsection 3  Weighted Nearest Neighbor Method
    Chapter 3  Approximating Nonparametric Discriminant Rules with Parametric Discriminant Functions
      Section 1  Decision Boundaries under Nonparametric Discriminant Rules
        Subsection 1  Taubin Fitting Method
        Subsection 2  Piecewise Linear Method with Minimum Sum of Orthogonal Distances
      Section 2  Methods for Evaluating Discriminant Rules
    Chapter 4  Simulation
      Section 1  Bivariate Normal Distribution
        Subsection 1  Equal Covariance Matrices, Σ_1 = Σ_2
        Subsection 2  Unequal Covariance Matrices, Σ_1 ≠ Σ_2
      Section 2  Trivariate Normal Distribution
        Subsection 1  Equal Covariance Matrices, Σ_1 = Σ_2
        Subsection 2  Unequal Covariance Matrices, Σ_1 ≠ Σ_2
      Section 3  Simulation of Non-normal Data
    Chapter 5  Conclusions and Future Work
    References
    Appendix A

    1. Duong, T. & Hazelton, M. L. (2005). Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation. Journal of Multivariate Analysis, 93(2), 417-433.
    2. Duong, T. (2007). Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7), 1-16.
    3. Hechenbichler, K. & Schliep, K. (2004). Weighted k-nearest-neighbor techniques and ordinal classification.
    4. Markovsky, I. & Van Huffel, S. (2007). Overview of total least-squares methods. Signal Processing, 87(10), 2283-2302.
    5. Taubin, G. (1991). Estimation of planar curves, surfaces, and nonplanar space curves defined by implicit equations with applications to edge and range image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(11), 1115-1138.
