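Graduate student: 侯閎鏵 (Hou, Hong-Hua)
Thesis title: Maximum-Likelihood Estimation for Finite Mixtures of Multivariate-Slash Censored Regression Models (有限混合多變量斜線設限迴歸模型之最大概似估計)
Advisor: 王婉倫 (Wang, Wan-Lun)
Degree: Master
Department: College of Management - Department of Statistics
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: Chinese
Number of pages: 93
Keywords (translated from Chinese): Multivariate slash model, Detection limit, Finite mixture model, MCECM algorithm, Truncated multivariate slash model, Monte Carlo method
Keywords (English): Censored multivariate linear regression model, Detection limit, Finite mixture model, MCECM algorithm, Truncated multivariate slash distribution, MCMC method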
  • Data frequently contain atypical observations as well as censored measurements that arise from the detection limits of a quantification assay. This thesis presents the multivariate-slash censored regression (MSLCR) model for robust modeling of data with these features. To address the clustering of heterogeneous multivariate data in the presence of outliers and censoring, we further propose the finite mixture of multivariate-slash censored regression (FM-MSLCR) model, which is conceptually a straightforward fusion of the finite mixture model and the MSLCR model. For maximum likelihood estimation of the model parameters, we provide Monte Carlo expectation conditional maximization either (MCECME) algorithms, in which Metropolis-Hastings (MH) sampling is used to compute the conditional moments of the latent variables in the MSLCR and FM-MSLCR models. A general information-matrix-based method is then used to approximate the asymptotic standard errors of the parameter estimators. Simulation results show the advantages of the proposed models and methods in terms of the convergence behavior of the MCECME algorithms, model selection, finite-sample properties of the parameter estimates, and classification performance. Finally, we illustrate the correctness and practicality of the proposed methodology by analyzing two datasets: "Geochemistry of B-horizon" and the dissolved mercury speciation in the California Current System.
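    The data setting described in the abstract (heavy-tailed multivariate responses that are censored at a detection limit) can be illustrated with a small simulation. The sketch below is not taken from the thesis: it assumes the common normal/independent representation of the multivariate slash distribution, Y = mu + Z / sqrt(tau) with Z ~ N(0, Sigma) and tau ~ Beta(nu, 1), and the thesis may use a different parametrization. The function names, the parameter values, and the detection limit of 0.0 are all hypothetical.

```python
import numpy as np

def sample_multivariate_slash(n, mu, sigma, nu, rng):
    """Draw n vectors Y = mu + Z / sqrt(tau) (assumed parametrization).

    Z ~ N(0, sigma) and tau ~ Beta(nu, 1), drawn by inverting the
    Beta(nu, 1) CDF t**nu on a uniform variate.
    """
    mu = np.asarray(mu, dtype=float)
    z = rng.multivariate_normal(np.zeros(mu.size), sigma, size=n)
    u = 1.0 - rng.uniform(size=n)      # uniform on (0, 1], avoids tau == 0
    tau = u ** (1.0 / nu)
    return mu + z / np.sqrt(tau)[:, None]

def left_censor(y, limit):
    """Replace values below the detection limit by the limit.

    Returns the observed array and a boolean mask of censored entries.
    """
    censored = y < limit
    return np.where(censored, limit, y), censored

# Hypothetical settings, chosen only for illustration.
rng = np.random.default_rng(0)
mu = [1.0, 2.0]
sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
y = sample_multivariate_slash(500, mu, sigma, nu=2.0, rng=rng)
y_obs, censored = left_censor(y, limit=0.0)
print("censoring rate per coordinate:", censored.mean(axis=0))
```

    Smaller values of nu give heavier tails, so the simulated data contain more atypical observations. An estimation procedure such as the MCECME algorithms named in the abstract would see only y_obs and the censoring mask, not the latent y.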

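    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    1 Introduction
      1.1 Research background, motivation, and objectives
      1.2 Literature review
      1.3 Overview
    2 The multivariate slash distribution and its truncation properties
      2.1 Notation for distributional properties
      2.2 The multivariate slash distribution
      2.3 The truncated multivariate slash distribution
      2.4 Moments of the truncated multivariate slash distribution
    3 The multivariate slash regression model with censoring
      3.1 Notation for the model framework
      3.2 Structure of the regression model
      3.3 The MCECME algorithm for maximum likelihood estimation
        3.3.1 MCE step
        3.3.2 CM step
      3.4 Standard errors of the parameter estimates
    4 The finite mixture of multivariate slash regression models with censoring
      4.1 Notation for the model framework
      4.2 The finite mixture of multivariate slash regression models
      4.3 The finite mixture of multivariate slash regression models: the MCECME algorithm
        4.3.1 MCE step
        4.3.2 CM step
      4.4 Standard errors of the parameter estimates
    5 Simulation studies
      5.1 Simulation experiments comparing the MH and GPMH algorithms for the MSLCR model
        5.1.1 Parameter estimates under the MH algorithm
        5.1.2 Large-sample properties of the parameter estimates under the MH algorithm
      5.2 Simulation experiments on clustering and recovery of the true parameter values with the FM-MSLCR model
        5.2.1 Recovery of the true parameter values for poorly separated simulated data
        5.2.2 Recovery of the true parameter values for well-separated simulated data
    6 Real-data analyses
      6.1 Geochemical element contents of the B-horizon
      6.2 Dissolved mercury speciation in the California Current System
    7 Conclusions
    References
    Appendix A
    Appendix B
    Appendix C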

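    On campus: publicly available from 2029-08-19
    Off campus: publicly available from 2029-08-19
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.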