
Author: SHIH, WEN-CHIEN (施文千)
Thesis title: Finite Mixtures of Multivariate Double Exponential Censored Regression Model with Its Application (有限混合多變量雙指數設限迴歸模型及其應用)
Advisor: Wang, Wan-Lun (王婉倫)
Degree: Master
Department: College of Management - Department of Statistics
Year of publication: 2025
Graduation academic year: 113 (ROC calendar)
Language: Chinese
Number of pages: 69
Keywords: Censored data, Heavy-tailed distribution, Heterogeneous data, Leptokurtic distribution, MCECM algorithm, Truncated multivariate double exponential distribution
    Statistical analysis is often complicated by censored measurements that arise from detection limits and by heavy-tailed, leptokurtic behavior caused by atypical observations. Models built on the multivariate normality assumption are too restrictive because they lack robustness against departures from the normal distribution. This thesis proposes the Multivariate Double Exponential Censored Regression (MDECR) model to robustly describe multivariate data that exhibit fat tails, leptokurtosis, and censoring simultaneously. To address the clustering of heterogeneous multivariate data, the MDECR model is further extended to the Finite Mixture Multivariate Double Exponential Censored Regression (FM-MDECR) model, in which more than one response variable can be censored. To perform maximum likelihood (ML) estimation of the model parameters, we develop a Monte Carlo-based Expectation Conditional Maximization (MCECM) algorithm in which the Metropolis-Hastings (MH) algorithm is used to compute the conditional moments of the latent data in both the MDECR and FM-MDECR models. In addition, Meilijson's method based on the information matrix is employed to compute the standard errors of the parameter estimates. Simulation studies demonstrate the advantages of the proposed models in terms of model selection, parameter estimation, and classification accuracy, and investigate the finite-sample properties of the ML estimates obtained by our algorithms. Finally, we illustrate the validity and practical utility of the proposed methodology through the analysis of the California Coastal Current Mercury and U.S. Labor Market Earnings datasets.
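    The abstract states that the MH algorithm is used to compute conditional moments of latent data for censored responses. The following is a minimal sketch of that idea in a deliberately simplified setting, not the thesis's actual sampler: it assumes a univariate double exponential (Laplace) response that is left-censored at a detection limit, and uses a random-walk MH chain to approximate the first two conditional moments. The function names, the parameters `mu`, `b`, `upper`, `step`, and the chosen values are all hypothetical and introduced only for illustration.

```python
import numpy as np


def log_laplace_kernel(y, mu, b):
    """Log-density of Laplace(mu, b) up to an additive constant."""
    return -abs(y - mu) / b


def mh_truncated_laplace(mu, b, upper, n_draws=50000, burn_in=2000,
                         step=1.0, seed=0):
    """Random-walk Metropolis-Hastings draws from a Laplace(mu, b)
    distribution truncated to (-inf, upper].

    Illustrative assumption: a left-censored observation is known only
    to lie at or below the detection limit `upper`.
    """
    rng = np.random.default_rng(seed)
    y = upper - b  # starting value inside the support
    draws = np.empty(n_draws)
    for t in range(n_draws + burn_in):
        proposal = y + step * rng.normal()
        # Proposals above the limit have zero target density: always rejected.
        if proposal <= upper:
            log_ratio = (log_laplace_kernel(proposal, mu, b)
                         - log_laplace_kernel(y, mu, b))
            if np.log(rng.uniform()) < log_ratio:
                y = proposal
        if t >= burn_in:
            draws[t - burn_in] = y
    return draws


# Monte Carlo approximations of E[Y | Y <= upper] and E[Y^2 | Y <= upper].
draws = mh_truncated_laplace(mu=0.0, b=1.0, upper=-0.5)
first_moment = draws.mean()
second_moment = np.mean(draws ** 2)
```

    In this toy case the limit lies below the location parameter, so the truncated density is an exponential tail and the exact conditional mean is `upper - b`, which gives a simple check on the chain. In an MCECM-type scheme, moments of this kind would replace the intractable conditional expectations in the E-step; the multivariate, regression, and mixture settings described in the abstract need a correspondingly more elaborate sampler.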

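    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Tables
    List of Figures
    1 Introduction
      1.1 Research Background
      1.2 Motivation and Objectives
      1.3 Organization
    2 Multivariate Double Exponential Distribution
      2.1 Multivariate Double Exponential Distribution
      2.2 Truncated Multivariate Double Exponential Distribution
    3 Multivariate Double Exponential Regression Model with Censoring
      3.1 Regression Model Structure
      3.2 MCECM Algorithm for Estimating the MDECR Model
      3.3 Standard Errors of Parameter Estimates
    4 Finite Mixture Multivariate Double Exponential Regression Model with Censoring
      4.1 Finite Mixture Multivariate Double Exponential Regression Model
      4.2 MCECM Algorithm for Estimating the FM-MDECR Model
      4.3 Standard Errors of Parameter Estimates
    5 Simulation Studies
      5.1 Parameter Settings
      5.2 Model Selection
      5.3 Parameter Estimation
      5.4 Clustering Accuracy
    6 Real Data Applications
      6.1 California Current Dissolved Mercury Data
      6.2 U.S. Wage Level Data
    7 Conclusions and Future Research Directions
    References
    Appendix A Derivation of Properties of the MDE Distribution
    Appendix B Derivation of Maximum Likelihood Estimation for the MDECR Model
    Appendix C Derivation of Maximum Likelihood Estimation for the FM-MDECR Model
    Appendix D Comparison Tables of Parameter Estimates for the MVNCR and FM-MVNCR Models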

    Adams, H. M., Cui, X., Lamborg, C. H., and Schartup, A. T. (2024). Dimethylmercury as a source of monomethylmercury in a highly productive upwelling system. Environmental Science & Technology, 58(24):10591–10600.
    Aitkin, M. and Rubin, D. B. (1985). Estimation and hypothesis testing in finite mixture models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 47(1):67–75.
    Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike. Springer.
    Andrews, D. F. and Mallows, C. L. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society: Series B (Methodological), 36(1):99–102.
    Arslan, O. (2010). An alternative multivariate skew Laplace distribution: properties and estimation. Statistical Papers, 51:865–887.
    Bartholomew, D. J., Knott, M., and Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach. John Wiley & Sons.
    Cohen, A. C. (1959). Simplified estimators for the normal distribution when samples are singly censored or truncated. Technometrics, 1(3):217–237.
    Cohen Jr, A. C. (1950). Estimating the mean and variance of normal populations from singly truncated and doubly truncated samples. The Annals of Mathematical Statistics, 21(4):557–569.
    de Alencar, F. H., Galarza, C. E., Matos, L. A., and Lachos, V. H. (2022). Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution. Advances in Data Analysis and Classification, 16(3):521–557.
    De Veaux, R. D. (1989). Mixtures of linear regressions. Computational Statistics & Data Analysis, 8(3):227–245.
    Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22.
    Eltoft, T., Kim, T., and Lee, T.-W. (2006). On the multivariate Laplace distribution. IEEE Signal Processing Letters, 13(5):300–303.
    Ernst, M. D. (1998). A multivariate generalized Laplace distribution. Computational Statistics, 13(2):227–232.
    Farewell, V. T. (1982). The use of mixture models for the analysis of survival data with long-term survivors. Biometrics, 38(4):1041–1046.
    Frühwirth-Schnatter, S. and Pyne, S. (2010). Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics, 11(2):317–336.
    Garay, A. M., Lachos, V. H., Bolfarine, H., and Cabral, C. R. (2017). Linear censored regression models with scale mixtures of normal distributions. Statistical Papers, 58:247–278.
    Garay, A. M., Lachos, V. H., and Lin, T.-I. (2016). Nonlinear censored regression models with heavy-tailed distributions. Statistics and its Interface, 9(3):281–293.
    Genç, A. I. (2013). Moments of truncated normal/independent distributions. Statistical Papers, 54:741–764.
    Gradshteyn, I. S. and Ryzhik, I. M. (2014). Table of Integrals, Series, and Products. Academic Press.
    Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109.
    Ho, H. J., Lin, T.-I., Chen, H.-Y., and Wang, W.-L. (2012). Some results on the truncated multivariate t distribution. Journal of Statistical Planning and Inference, 142(1):25–40.
    Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2:193–218.
    Hughes, J. P. (1999). Mixed effects models with censored data with application to HIV RNA levels. Biometrics, 55(2):625–629.
    Kotz, S. and Nadarajah, S. (2004). Multivariate t-distributions and their applications. Cambridge University Press.
    Lange, K. and Sinsheimer, J. S. (1993). Normal/independent distributions and their applications in robust regression. Journal of Computational and Graphical Statistics, 2(2):175–198.
    Laplace, P. S. (1774). Mémoire sur la probabilité de causes par les évenements. Mémoire de l’académie royale des sciences.
    McLachlan, G. J. and Basford, K. E. (1988). Mixture models: Inference and applications to clustering, volume 38. M. Dekker New York.
    McLachlan, G. J. and Krishnan, T. (2008). The EM algorithm and extensions. John Wiley & Sons.
    Meilijson, I. (1989). A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society Series B: Statistical Methodology, 51(1):127–138.
    Meng, X.-L. and Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80(2):267–278.
    Mirfarah, E., Naderi, M., Lin, T.-I., and Wang, W.-L. (2024). Robust Bayesian inference for the censored mixture of experts model using heavy-tailed distributions. Advances in Data Analysis and Classification, 18:1–29.
    Mroz, T. A. (1984). The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Stanford University.
    Naik, D. N. and Plungpongpun, K. (2006). A Kotz-type distribution for multivariate statistical inference. In Advances in Distribution Theory, Order Statistics, and Inference. Springer.
    Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society of London A, 186:343–414.
    Peel, D. and McLachlan, G. J. (2000). Robust mixture modelling using the t distribution. Statistics and Computing, 10:339–348.
    Powell, J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25(3):303–325.
    R Core Team (2025). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
    Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464.
    Shumway, R., Azari, A., and Johnson, P. (1989). Estimating mean concentrations under transformation for environmental data with detection limits. Technometrics, 31(3):347–356.
    Titterington, D. M., Smith, A. F., and Makov, U. E. (1985). Statistical analysis of finite mixture distributions. John Wiley & Sons.
    Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica: Journal of the Econometric Society, 26(1):24–36.
    Wang, W.-L. (2023). Multivariate contaminated normal censored regression model: properties and maximum likelihood inference. Journal of Computational and Graphical Statistics, 32(4):1671–1684.
    Wedel, M. and DeSarbo, W. S. (1995). A mixture likelihood approach for generalized linear models. Journal of Classification, 12:21–55.
    Wei, G. C. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. Journal of the American Statistical Association, 85(411):699–704.
    West, M. (1987). On scale mixtures of normal distributions. Biometrika, 74(3):646–648.
    Wolfe, J. H. (1970). Pattern clustering by multivariate mixture analysis. Multivariate Behavioral Research, 5(3):329–350.
    Yao, W., Wei, Y., and Yu, C. (2014). Robust mixture regression using the t-distribution. Computational Statistics & Data Analysis, 71:116–127.
    Zeller, C. B., Cabral, C. R. B., Lachos, V. H., and Benites, L. (2019). Finite mixture of regression models for censored data based on scale mixtures of normal distributions. Advances in Data Analysis and Classification, 13:89–116.
