
Author: Yeh, Chun-Wu (葉俊吾)
Title: Using the Trend and Potency Approach to Forecast under a Dynamic and Changeable Environment (在動態多變環境下以趨勢力道分析法進行預測)
Advisor: Li, Der-Chiang (利德江)
Degree: Doctor
Department: Department of Industrial and Information Management, College of Management
Publication Year: 2007
Graduation Academic Year: 95 (ROC academic year)
Language: English
Pages: 64
Chinese Keywords: machine learning, small data sets, trend and potency function
English Keywords: Trend and potency function, Small data sets, Machine learning
Chinese Abstract (translated):
    Today's environment is dynamic and changeable, and the behavioral trend of data keeps shifting. In a sequentially collected data set, the earliest observations no longer reflect the current trend and provide little information about future changes; only the observations from the few most recent periods carry useful information for forecasting. If a researcher builds a predictive model from the early data to estimate future population parameters, the prediction error will be large. When estimating the population mean, for example, relying on the plain sample mean is inappropriate, mainly because the samples may come from different populations and the true values of the population parameters drift continuously over time; this problem is especially pronounced when the sample size is small.
    This research proposes the concept of forecasting the future trend center: a predictive model is generated from the data's trend behavior, with the aim of improving forecasting performance and reducing prediction error.
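To make the problem described above concrete, here is a minimal simulation that is not taken from the thesis; all values and names are illustrative assumptions. It shows why the plain sample mean is a poor estimate of the current population mean when the population drifts and the sample is small.

```python
# Hypothetical illustration only: a small sequential sample whose population
# mean drifts upward over time. The overall sample mean mixes old and new
# observations, so it lags behind the current population mean.
import random

random.seed(1)

n = 15                                           # small, sequentially collected data set
true_means = [10 + 0.5 * t for t in range(n)]    # population mean drifts upward each period
data = [random.gauss(mu, 1.0) for mu in true_means]

sample_mean = sum(data) / n                      # classical estimator over all observations
current_mu = true_means[-1]                      # the location we actually want to estimate

print(f"current population mean: {current_mu:.2f}")   # 17.00
print(f"overall sample mean:     {sample_mean:.2f}")  # noticeably lower, biased toward older data
```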

English Abstract:
    Nowadays the environment is dynamic and changeable, and previously collected data are not suitable for building a predictive model because the values of population parameters such as the mean and variance keep moving or fluctuating. Up-to-date data usually come as small samples, and it is risky to assume that the few observations collected are drawn from one common distribution, such as the normal distribution. Consequently, the sample mean (X̄) may not be the proper statistic for estimating the population mean when confronting small data sets. This research proposes the novel concept of a "trend center", namely the center of probable (CP), which is determined from a variety of data properties and is employed to estimate the true location of the population center μ in advance.
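The CP itself is defined by the Trend and Potency Tracking Method developed in Chapter 3 and is not reproduced here. The sketch below is only an assumed stand-in for the idea: let recent observations dominate the location estimate, using an arbitrary exponential decay weight rather than the thesis's trend and potency weighting.

```python
# Sketch only, not the thesis's TPTM/CP formula: a recency-weighted location
# estimate in which older observations receive exponentially smaller weights
# (the decay rate 0.7 is an arbitrary illustrative choice).
def trend_weighted_center(data, decay=0.7):
    """Estimate the current center of a small, drifting sequence."""
    n = len(data)
    weights = [decay ** (n - 1 - i) for i in range(n)]  # newest observation gets weight 1
    return sum(w * x for w, x in zip(weights, data)) / sum(weights)

# A small upward-drifting sequence: the plain mean sits near the middle of the
# series, while the weighted estimate follows the recent trend.
series = [10.2, 10.9, 11.4, 12.1, 12.8, 13.6, 14.1, 14.9]
print(f"plain sample mean:     {sum(series) / len(series):.2f}")       # 12.50
print(f"trend-weighted center: {trend_weighted_center(series):.2f}")   # about 13.6
```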

Table of Contents:
    Chinese Abstract (摘要) I
    Abstract II
    Acknowledgements III
    Table of Contents IV
    List of Tables VI
    List of Figures VIII
    1. Introduction 1
        1.1. Research Background and Motivation 1
        1.2. Objective 2
    2. Literature Review 5
        2.1. Fundamental Issues when Confronting Small Data Sets 5
        2.2. When the Size of the Data Set Is Comparatively Small 7
        2.3. When the Size of the Data Set Is Simply Small 11
        2.4. Learning with Small Data Sets 14
        2.5. Learning with Time Series Data 17
    3. Development of the TPTM Learning Algorithm 19
        3.1. Probable Location Estimation from the Existing Data 19
        3.2. Formulating the Trend and Potency Tracking Method (TPTM) and Finding the Center of Probable (CP) 21
        3.3. The Simplified Center of Probable 26
    4. Experimental Analysis 30
        4.1. Sequential-data Generation with Normal Distributions 30
        4.2. The Procedure of the Experiment 31
        4.3. Experimental Results between the Proposed Approach and the Other Forecasting Methods 32
    5. Conclusions and Discussion 45
    References 47

    Full-text availability: on campus from 2008-07-21; off campus from 2010-07-21.