| Graduate student: | Lin, Yi-Han (林翊涵) |
|---|---|
| Thesis title: | A Model-Based Sampling Selection Method Based on Classical Multivariate Analysis Methods (基於傳統多變量分析方法之模型觀點抽樣設計) |
| Advisor: | Chao, Chang-Tai (趙昌泰) |
| Degree: | Master's |
| Department: | College of Management, Department of Statistics |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 (ROC calendar; academic year 2017-2018) |
| Language: | English |
| Number of pages: | 94 |
| Keywords: | Canonical Correlation Analysis, Sample Adjustment, Model-based Sampling |
In survey sampling, collecting each sampling unit incurs a cost. Given a fixed total sample size, how to select a more representative sample is therefore a basic problem in sampling theory and its applications. In some sampling designs, such as stratified sampling, the population is first divided into several strata, and the total sample size is then allocated among them. This allocation step raises practical problems: the sample size allocated to a stratum is often not an integer, and in some cases it is zero. The stratum sample sizes must then be rounded to the nearest integer or otherwise adjusted, but the total sample size after rounding and/or adjustment often differs from the one originally specified. Previous studies have therefore used random sampling to remove the excess sampling units or to add the missing ones.
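The rounding problem can be illustrated with a small numeric sketch. The stratum sizes, the total sample size, and the use of proportional allocation are assumptions chosen for illustration; the abstract does not state which allocation rule the thesis uses.

```python
import math

def proportional_allocation(stratum_sizes, n):
    """Exact (possibly fractional) proportional allocation n * N_h / N."""
    total = sum(stratum_sizes)
    return [n * size / total for size in stratum_sizes]

def round_half_up(x):
    """Round a non-negative number to the nearest integer, halves rounded up.
    (Python's built-in round() rounds halves to the nearest even integer.)"""
    return math.floor(x + 0.5)

# Hypothetical population: four strata with N = 100 units in total.
stratum_sizes = [46, 26, 25, 3]
n = 10  # specified total sample size

exact = proportional_allocation(stratum_sizes, n)  # 4.6, 2.6, 2.5, 0.3
rounded = [round_half_up(a) for a in exact]        # [5, 3, 3, 0]
excess = sum(rounded) - n                          # 1 unit too many

print("exact allocation:  ", exact)
print("rounded allocation:", rounded)
print("rounded total:", sum(rounded), "specified total:", n, "excess:", excess)
```

In this example the rounded sizes sum to 11 rather than 10, and the smallest stratum receives no sampling unit at all, so one unit has to be removed and the empty stratum may need separate handling.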
This research proposes two model-based sampling strategies for adjusting the sample, both based on canonical correlation analysis, a multivariate analysis technique. They extend the sample selection methods proposed by Chao (2004) and Chao and Lin (2007). Given a known population covariance matrix and a specified total sample size, cluster analysis is first used to partition the population into clusters, and the sample size is allocated among them. Principal component analysis is then used to select the within-cluster samples according to the cluster sample sizes obtained after rounding and/or adjustment. Finally, when the total sample size after rounding and/or adjustment differs from the specified one, the sample is adjusted. The two strategies do not require a fully specified population model; the population covariance matrix alone suffices to select appropriate units for the adjustment. Multiple simulation studies and three applications to real data show that the proposed strategies give better prediction results than simple random sampling without replacement and the previously proposed methods. For a fixed total sample size, selecting sampling units with these two strategies can minimize the mean-squared prediction error without complicated algorithms or a heavy computational load, which makes the procedures flexible and simple to apply in practice.
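To make the role of the covariance matrix concrete, the following minimal sketch compares two candidate samples of the same size by their mean-squared prediction error. It is not the thesis's procedure: the exponential covariance function, the known (zero) mean, and the helper names `exp_cov`, `solve`, and `total_mspe` are all assumptions made for illustration. Under these assumptions the best linear predictor of the unsampled values has error covariance $\Sigma_{uu} - \Sigma_{us}\Sigma_{ss}^{-1}\Sigma_{su}$, and the sketch sums its diagonal.

```python
import math

def exp_cov(n_units, phi):
    """Hypothetical covariance for units on a line: exp(-|i - j| / phi)."""
    return [[math.exp(-abs(i - j) / phi) for j in range(n_units)]
            for i in range(n_units)]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    a is k x k and b is k x m (lists of lists); returns x as k x m."""
    k = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def total_mspe(sigma, sample):
    """Sum of unit-level prediction variances for the unsampled units:
    trace(S_uu - S_us S_ss^{-1} S_su), assuming a known mean."""
    s = sorted(sample)
    u = [i for i in range(len(sigma)) if i not in s]
    s_ss = [[sigma[i][j] for j in s] for i in s]
    s_su = [[sigma[i][j] for j in u] for i in s]
    x = solve(s_ss, s_su)  # x = S_ss^{-1} S_su
    return sum(
        sigma[uj][uj] - sum(s_su[i][j] * x[i][j] for i in range(len(s)))
        for j, uj in enumerate(u)
    )

sigma = exp_cov(9, phi=2.0)               # 9 units, assumed covariance
clustered = total_mspe(sigma, [0, 1, 2])  # three adjacent units
spread = total_mspe(sigma, [1, 4, 7])     # three evenly spread units
print(f"clustered sample: {clustered:.3f}")
print(f"spread sample:    {spread:.3f}")
```

With this covariance, the evenly spread sample gives the smaller total error, which is the kind of comparison a covariance-based selection rule exploits: only the covariance matrix is needed to rank candidate samples, not the observed values.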
1. 陳順宇 (Chen, Shun-Yu). 多變量分析 [Multivariate Analysis]. Taipei: published by 陳順宇, distributed by 華泰, 2005.
2. Environmental Protection Administration. Air quality information monthly. https://erdb.epa.gov.tw/DataRepository/EnvMonitor/AirQualityMonitorMonData.aspx?topic1=
3. Environmental Protection Administration. Air quality monitoring stations. https://erdb.epa.gov.tw/DataRepository/Facilities/AirQualityMonitorStations.aspx?topic1=
4. D Basu. Role of the sufficiency and likelihood principles in sample survey theory. Sankhyā: The Indian Journal of Statistics, Series A, pages 441-454, 1969.
5. Heleno Bolfarine and Shelemyahu Zacks. Prediction theory for finite populations. Springer series in statistics. New York : Springer-Verlag, 1992.
6. Central Weather Bureau. CWB observation data inquire system. http://eservice.cwb.gov.tw/HistoryDataQuery/index.jsp.
7. Central Weather Bureau. Weather monitoring stations. http://eservice.cwb.gov.tw/wdps/obs/state.htm.
8. Chang-Tai Chao. Selection of sampling units under a correlated population based on the eigensystem of the population covariance matrix. Environmetrics, 15(8):757-775, 2004.
9. Chang-Tai Chao and Feng-Min Lin. Multivariate analysis techniques on model-based sampling. 2007.
10. Chang-Tai Chao and Steven K Thompson. Optimal adaptive selection of sampling sites. Environmetrics: The official journal of the International Environmetrics Society, 12(6):517-538, 2001.
11. Noel A. C. Cressie. Statistics for spatial data. Wiley series in probability and mathematical statistics: Applied probability and statistics. New York : J. Wiley, 1993.
12. Peter J Diggle and Paulo J Ribeiro Jr. Bayesian inference in Gaussian model-based geostatistics. Geographical and Environmental Modelling, 6(2):129-146, 2002.
13. William R. Dillon and Matthew Goldstein. Multivariate analysis : methods and applications. Wiley series in probability and mathematical statistics: Applied probability and statistics. New York : Wiley, 1984.
14. Shelley Zacks. Bayes sequential designs of fixed size samples from finite populations. Journal of the American Statistical Association, 64(328):1342-1349, 1969.