| Graduate Student: | 蔡茜婷 Tsai, Chien-Ting |
|---|---|
| Thesis Title: | 平均誤差平方和最小化條件下的因子分析 (Factor Analysis under Minimizing Mean Squared Error) |
| Advisor: | 馬瀰嘉 Ma, Mi-Chia |
| Degree: | Master's |
| Department: | Department of Statistics, College of Management |
| Publication Year: | 2023 |
| Graduation Academic Year: | 111 (2022-2023) |
| Language: | Chinese |
| Pages: | 58 |
| Keywords (Chinese): | 多變量分析, 因子分析, 主成份分析, 資料降維 |
| Keywords (English): | multivariate analysis, factor analysis, principal component analysis, dimensionality reduction |
In today's era of big data, new data processing methods continually emerge to cope with diverse data types. However, experts in statistics, computer science, and information science may develop new methods with an incomplete understanding of the underlying theory or of existing methods in other fields, which can lead to errors in application and unreliable results. In an environment where research results circulate widely, such erroneous results may then be reused without verification. Sigg & Buhmann (2008) proposed a non-negative sparse principal component analysis based on the EM algorithm (Expectation-Maximization algorithm), which performs fast dimensionality reduction on high-dimensional data and has been widely cited. The authors, however, may have been unfamiliar with multivariate analysis theory in statistics and conflated the factor analysis model with principal component analysis, introducing errors into the mathematical derivations of their paper. This study corrects those derivations and, applying the traditional factor analysis methods of Applied Multivariate Statistical Analysis (Johnson & Wichern, 2007), proposes a method that minimizes the mean sum of squared reconstruction errors. Modifying the method of Sigg & Buhmann (2008) then yields four different factor analysis combinations, which we compare on simulated data to determine which combination attains the smaller sum of squared errors. In the simulations, the data are assumed to follow a multivariate normal distribution; with the population mean held fixed, we vary the number of unobserved common factors and the dimension and observe which of the four combinations achieves the smaller mean squared error. In the empirical analysis, we apply the methods to high-dimensional facial image data to examine how our modified method performs in reducing the dimensionality of high-dimensional data.
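As a rough illustration of the simulation setup described above (multivariate normal data generated from a factor model with a fixed population mean), the following Python sketch compares reconstruction error under scikit-learn's stock FactorAnalysis and PCA. The values of p, m, and n, and the use of the stock estimators in place of the thesis's four modified combinations, are illustrative assumptions, not the thesis's actual procedure.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis, PCA

# Illustrative settings (hypothetical; the thesis varies the factor count and dimension).
rng = np.random.default_rng(0)
p, m, n = 50, 5, 1000          # observed dimension, number of common factors, sample size

# Orthogonal factor model: X = mu + L f + e, with f ~ N(0, I_m), e ~ N(0, diag(psi)).
mu = np.zeros(p)                      # fixed population mean, as in the simulations
L = rng.normal(size=(p, m))           # factor loading matrix
psi = rng.uniform(0.2, 1.0, size=p)   # specific (uniqueness) variances

f = rng.normal(size=(n, m))
e = rng.normal(size=(n, p)) * np.sqrt(psi)
X = mu + f @ L.T + e                  # n draws from the implied multivariate normal

def reconstruction_mse(model, X):
    """Mean squared error of reconstructing X from its m-dimensional scores."""
    Z = model.fit_transform(X)                     # n x m scores
    X_hat = Z @ model.components_ + model.mean_    # linear map back to p dimensions
    return float(np.mean((X - X_hat) ** 2))

for name, model in [("FactorAnalysis", FactorAnalysis(n_components=m)),
                    ("PCA", PCA(n_components=m))]:
    print(f"{name:14s} reconstruction MSE: {reconstruction_mse(model, X):.4f}")
```

Note that `Z @ model.components_` is only one way to map scores back to the observed space; it is exact for PCA but an approximation for factor analysis, whose reconstruction the thesis optimizes differently.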
In today's big data era, new data processing methods continually emerge to handle diverse data types. However, experts in statistics, computer science, and information science may develop new methods with an incomplete understanding of the underlying theory or of existing methods in other fields. This knowledge gap can lead to misuse and unreliable results. In this study, we address errors in the mathematical derivation of the widely cited Expectation-Maximization for Sparse and Non-Negative Principal Component Analysis (emPCA) method proposed by Sigg & Buhmann (2008). We correct these errors by applying traditional factor analysis methods from "Applied Multivariate Statistical Analysis" by Johnson & Wichern (2007) to minimize the Mean Squared Error (MSE) and Mean Reconstruction Error (MRE). We extend the modified approach to derive four different factor analysis combinations and evaluate their performance on simulated data. By comparing the MSEs of these combinations, we identify the most effective approach. Finally, we apply our modified method to the high-dimensional Olivetti face dataset to assess its performance in dimensionality reduction.
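For the empirical side, here is a minimal sketch of the kind of experiment the abstract describes, assuming scikit-learn's bundled Olivetti faces (400 images of 64x64 pixels, i.e. 4096 dimensions) and again substituting the stock estimators for the thesis's modified method; the number of retained components k is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import FactorAnalysis, PCA

# Olivetti faces: 400 images of 64x64 pixels, i.e. a 400 x 4096 data matrix
# (downloaded on first use).
faces = fetch_olivetti_faces(shuffle=True, random_state=0)
X = faces.data

k = 40  # illustrative number of retained components/factors

for name, model in [("PCA", PCA(n_components=k)),
                    ("FactorAnalysis", FactorAnalysis(n_components=k))]:
    Z = model.fit_transform(X)                    # 400 x k scores
    X_hat = Z @ model.components_ + model.mean_   # reconstruct the 4096-pixel images
    mre = np.mean((X - X_hat) ** 2)               # mean reconstruction error per pixel
    print(f"{name:14s} k={k}: mean reconstruction error {mre:.5f}")
```

Since PCA is the rank-k linear projection minimizing mean squared reconstruction error, it provides a natural baseline against which any factor-analysis-based reconstruction can be judged.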
[1] Williams, B., Onsman, A., and Brown, T. (2010). Exploratory factor analysis: A five-step guide for novices. Australasian Journal of Paramedicine, 8, 1-13.
[2] Bansal, M., Goyal, A., and Choudhary, A. (2022). A comparative analysis of K-Nearest Neighbour, Genetic, Support Vector Machine, Decision Tree, and Long Short Term Memory algorithms in machine learning. Decision Analytics Journal, 100071.
[3] Zhu, F., Gao, J., Yang, J., and Ye, N. (2022). Neighborhood linear discriminant analysis. Pattern Recognition, 123, 108422.
[4] Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286.
[5] Zou, H., and Xue, L. (2018). A selective overview of sparse principal component analysis. Proceedings of the IEEE, 106(8), 1311-1320.
[6] Sjöstrand, K., Rostrup, E., Ryberg, C., Larsen, R., Studholme, C., Baezner, H., Ferro, J., Fazekas, F., Pantoni, L., and Inzitari, D. (2007). Sparse decomposition and modeling of anatomical shape variation. IEEE Transactions on Medical Imaging, 26(12), 1625-1635.
[7] Baden, T., Berens, P., Franke, K., Román Rosón, M., Bethge, M., and Euler, T. (2016). The functional diversity of retinal ganglion cells in the mouse. Nature, 529(7586), 345-350.
[8] Gravuer, K., Sullivan, J. J., Williams, P. A., and Duncan, R. P. (2008). Strong human association with plant invasion success for Trifolium introductions to New Zealand. Proceedings of the National Academy of Sciences, 105(17), 6344-6349.
[9] Bian, J., Zhao, D., Nie, F., Wang, R., and Li, X. (2022). Robust and sparse principal component analysis with adaptive loss minimization for feature selection. IEEE Transactions on Neural Networks and Learning Systems.
[10] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
[11] Fan, J., and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348-1360.
[12] Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418-1429.
[13] Yuan, M., and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49-67.
[14] Xiaojun, S., and Zongkui, Z. (2005). Exploratory factor analysis and its main problems in application. Psychological Science (Shanghai), 28(6), 1440.
[15] Bryant, F. B., Yarnold, P. R., and Michelson, E. A. (1999). Statistical methodology: VIII. Using confirmatory factor analysis (CFA) in emergency medicine research. Academic Emergency Medicine, 6(1), 54-66.
[16] Nimon, K. F. (2012). Statistical assumptions of substantive analyses across the general linear model: a mini-review. Frontiers in Psychology, 3, 322.
[17] Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association.
[18] 陳順宇 (2000). 多變量分析 (Multivariate Analysis), 2nd ed. Taipei: 華泰書局 (Hwa Tai Publishing).
[19] Hancock, G. R., Mueller, R. O., and Stapleton, L. M. (2010). The reviewer's guide to quantitative methods in the social sciences. New York: Routledge.
[20] Knekta, E., Runyon, C., and Eddy, S. (2019). One size doesn’t fit all: Using factor analysis to gather validity evidence when using surveys in your research. CBE—Life Sciences Education, 18(1), rm1.
[21] Suhr, D. D. (2005). Principal component analysis vs. exploratory factor analysis. SUGI 30 Proceedings, Paper 203-30.
[22] Taherdoost, H., Sahibuddin, S., and Jalaliyoon, N. (2022). Exploratory factor analysis; concepts and theory. Advances in Applied and Pure Mathematics, 27, 375-382.
[23] Sigg, C. D., and Buhmann, J. M. (2008). Expectation-maximization for sparse and non-negative PCA. In Proceedings of the 25th International Conference on Machine Learning, pp. 960-967.
[24] Zass, R., and Shashua, A. (2006). Nonnegative sparse PCA. Advances in Neural Information Processing Systems, 19.
[25] Moon, T. K. (1996). The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13(6), 47-60.
[26] Johnson, R. A., and Wichern, D. W. (2007). Applied multivariate statistical analysis, 6th ed. (pp. 430-538). Upper Saddle River, NJ: Pearson Prentice Hall.
[27] Smallman, L., and Artemiou, A. (2022). A literature review of (sparse) exponential family PCA. Journal of Statistical Theory and Practice, 16(1), 14.
[28] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., and Dubourg, V. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825-2830.
[29] Dai, F., Dutta, S., and Maitra, R. (2020). A matrix-free likelihood method for exploratory factor analysis of high-dimensional gaussian data. Journal of Computational and Graphical Statistics, 29(3), 675-680.
[30] Hallman, E., and Troester, D. (2022). A multilevel approach to stochastic trace estimation. Linear Algebra and Its Applications, 638, 125-149.