Graduate student: Tung, Ming-Che (董名哲)
Thesis title: Nonparametric Bayesian Change-Points Detection for Logistic Regressions with Streaming Data (貝氏無母數檢定應用於二元反應變數資料流變構點偵測)
Advisor: Chang, Sheng-Mao (張升懋)
Degree: Master
Department: College of Management, Department of Statistics
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Number of pages: 33
Keywords (Chinese): 貝氏無母數, 迪利克雷過程, Metropolis-Hastings 演算法, 變構點偵測
Keywords (English): Nonparametric Bayesian, Dirichlet process, Metropolis-Hastings Algorithm, Change-points Detection
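  • With the rapid development of technology and the Internet, the literature on data stream analysis keeps growing, and change-point detection is an important problem in analyzing data streams. This thesis presents a Bayesian method for detecting whether a change point exists. The main idea is to test whether the posterior distribution of the model coefficients changes after new data are added. The procedure consists of three steps. First, the Metropolis-Hastings algorithm is used to draw samples from the posterior distribution. Second, the multi-dimensional samples are mapped to a one-dimensional space. Finally, a two-sample Bayesian nonparametric goodness-of-fit test is used to check whether the posterior distribution has changed. Simulations are then used to verify the feasibility of the method. The method is then applied to online news data, where the model is found to change every twelve weeks.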

    Data streams have become common with the rapid development of data collection devices. Detecting model changes is one of the major difficulties in analyzing streaming data. In this thesis, we provide a procedure for detecting model changes from a Bayesian perspective. The idea is to examine whether the posterior distributions given the past data and given the current data are identical. The detection procedure consists of three steps. First, we use the Metropolis-Hastings algorithm to draw samples from the two posterior distributions. Second, we map the multi-dimensional posterior samples to one-dimensional samples. Last, we apply a two-sample Bayesian nonparametric goodness-of-fit test to the one-dimensional samples to examine whether the model has changed. Simulation studies demonstrate the feasibility of the procedure. We then apply the procedure to an online news popularity data set, and our conclusion supports the conjecture that the model changes frequently over time.
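
The three steps in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a random-walk Metropolis-Hastings sampler with an independent normal prior, a fixed projection direction as the map to one dimension, and a plain two-sample Kolmogorov-Smirnov distance as a stand-in for the Bayesian nonparametric goodness-of-fit test. All function names, the simulated data, and the tuning values (`prior_sd`, `step`, `direction`) are hypothetical.

```python
import math
import random


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Log posterior of logistic-regression coefficients (up to a constant).

    Assumed prior: independent N(0, prior_sd^2) on each coefficient.
    """
    lp = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = sum(b * xi for b, xi in zip(beta, x))
        # Bernoulli log-likelihood y*eta - log(1 + exp(eta)), computed stably.
        lp += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp


def metropolis_hastings(xs, ys, n_iter=1500, burn_in=500, step=0.3, rng=None):
    """Step 1: random-walk Metropolis-Hastings draws from the posterior."""
    rng = rng or random.Random(0)
    beta = [0.0] * len(xs[0])
    current = log_posterior(beta, xs, ys)
    draws = []
    for i in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, xs, ys)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(beta)
    return draws


def project(draws, direction):
    """Step 2: map each multi-dimensional draw to one dimension (assumed map)."""
    return [sum(d * b for d, b in zip(direction, beta)) for beta in draws]


def ks_distance(a, b):
    """Step 3 stand-in: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def simulate(n, beta, rng):
    """Hypothetical data: intercept plus one standard-normal covariate."""
    xs, ys = [], []
    for _ in range(n):
        x = [1.0, rng.gauss(0.0, 1.0)]
        p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


rng = random.Random(42)
past = simulate(200, [0.0, 1.0], rng)      # earlier batch
same = simulate(200, [0.0, 1.0], rng)      # new batch, same model
shifted = simulate(200, [0.0, -1.0], rng)  # new batch, slope sign flipped

direction = [0.0, 1.0]  # hypothetical choice: look at the slope only


def posterior_1d(data, seed):
    draws = metropolis_hastings(data[0], data[1], rng=random.Random(seed))
    return project(draws, direction)


past_1d = posterior_1d(past, 1)
d_same = ks_distance(past_1d, posterior_1d(same, 2))
d_shift = ks_distance(past_1d, posterior_1d(shifted, 3))
print("distance, unchanged model:", d_same)
print("distance, changed model:  ", d_shift)
```

The sketch only compares two distances and does not calibrate a decision threshold. Posteriors from two batches of the same model also differ through sampling variability, so a raw distance is not a test; the thesis addresses this with a two-sample Bayesian nonparametric goodness-of-fit test, which is not reproduced here.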

    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Review of Dirichlet Process and Its Properties
        2.1 Dirichlet process
        2.2 A Bayesian Nonparametric goodness-of-fit Test of two samples
        2.3 Bayesian Logistic Regression
    Chapter 3. Bayesian Model Change Detection
        3.1 Motivating Example
        3.2 Model Change Detection Procedure
        3.3 Bayesian Nonparametric Goodness-of-fit Test of Multivariate-Two Samples
    Chapter 4. Simulation Studies
        4.1 Performance of Univariate-Two Sample Test
        4.2 Performance of Multivariate-Two Sample Test
        4.3 Performance of the Model Change Detection Procedure
    Chapter 5. Analysis of the online news popularity data
    Chapter 6. Conclusion and future work
    References
    Appendix A.
    Appendix B.

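    On campus: publicly available from 2022-08-05
    Off campus: not publicly available
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.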