
Author: Lee, Szu-Yuan (李思源)
Title: Data Driven Space-Filling Sampling in Supervised and Semi-Supervised Learning
Advisor: Chen, Ray-Bing (陳瑞彬)
Degree: Master
Department: College of Management, Department of Statistics
Year of Publication: 2020
Academic Year of Graduation: 108 (2019-2020)
Language: English
Number of Pages: 28
Keywords: Data-driven space-filling sampling, Discrepancy, Representative sample, Initial labeling
    To achieve better performance in most learning tasks, we want our training sample to be as representative of the population as possible. To this end, we argue that the training sample should satisfy the space-filling property. However, traditional space-filling designs cannot be applied directly, because we only have a set of sampled points rather than a continuous experimental region. We therefore adopt the data-driven space-filling sampling proposed in [15] and combine it with supervised and semi-supervised learning methods for classification problems in order to assess its effectiveness. Beyond simulation studies, this thesis also presents several real-data applications.

    To achieve better performance in most learning tasks, we want the training samples to be as representative of the population as possible. To achieve this goal, we argue that the training samples should satisfy the space-filling property. However, traditional space-filling designs cannot be used directly, because we only have a set of sample points rather than a continuous experimental space. Thus the data-driven space-filling sampling proposed in [15] is adopted here, and we examine its effect when combined with supervised and semi-supervised learning approaches for classification problems. In addition to the simulation studies, several real-data applications are also illustrated in this thesis.
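The core idea described above, choosing a small training subset that spreads out over the observed data cloud rather than over a continuous design region, can be sketched with a simple heuristic. The sketch below uses farthest-point (greedy maximin) sampling as an illustrative stand-in; it is not the discrepancy-based procedure of [15], and the function name and seeding rule are the author's own assumptions for this example.

```python
import numpy as np

def greedy_space_filling_subset(X, k):
    """Greedily pick k rows of X that spread out over the data cloud.

    Illustrative heuristic only (farthest-point sampling), not the
    discrepancy-based data-driven space-filling sampling of [15]:
    start from the point closest to the data mean, then repeatedly add
    the point whose distance to the already-chosen set is largest.
    """
    X = np.asarray(X, dtype=float)
    # Seed with the most "central" point so the subset stays inside the cloud.
    first = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    chosen = [first]
    # d[i] = distance from point i to its nearest chosen point so far.
    d = np.linalg.norm(X - X[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(d))  # farthest remaining point
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))          # 500 unlabeled sample points
idx = greedy_space_filling_subset(X, 10)
print(sorted(set(idx.tolist())))       # 10 distinct, well-spread indices
```

A subset chosen this way could then serve as the initial labeled set for the supervised or semi-supervised classifiers studied in the thesis, in place of uniform random sampling.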

    摘要 (Abstract in Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Data Driven Space-Filling Sampling
        2.1. Discrepancy
            2.1.1. Star Discrepancy
            2.1.2. F-Discrepancy
        2.2. Uniform Design and Space-Filling Design
        2.3. Data Driven Space-Filling Sampling
        2.4. Illustrations
            2.4.1. Case 1: single core, spread widely
            2.4.2. Case 2: two cores
            2.4.3. Case 3: four cores, spread widely
            2.4.4. Case 4: four cores, extremely high density
    Chapter 3. Simulation Study
        3.1. Experimental Settings
        3.2. Simulated Data
    Chapter 4. Experimental Results
        4.1. The Effect of the Number of Principal Components
        4.2. Supervised Learning
        4.3. Semi-Supervised Learning
    Chapter 5. Conclusion and Future Work
    References

    [1] Abdalla G. M. Ahmed, Hélène Perrier, David Coeurjolly, Victor Ostromoukhov, Jianwei Guo, Dong-Ming Yan, Hui Huang, and Oliver Deussen. Low-discrepancy blue noise sampling. ACM Trans. Graph., 35(6), November 2016.
    [2] James C. Bezdek, Robert Ehrlich, and William Full. FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2):191–203, 1984.
    [3] Simon Breneis and Aicke Hinrichs. Fibonacci lattices have minimal dispersion on the two-dimensional torus. arXiv preprint arXiv:1905.03856, May 2019.
    [4] Simon Breneis and Aicke Hinrichs. Fibonacci lattices have minimal dispersion on the two-dimensional torus, 2019.
    [5] Rob Carnell. lhs: Latin Hypercube Samples, 2019. R package version 1.0.1.
    [6] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
    [7] Kai-Tai Fang, Min-Qian Liu, Hong Qin, and Yongdao Zhou. Theory and Application of Uniform Experimental Designs. Chapman and Hall/CRC, October 2018.
    [8] Kai-Tai Fang, Yuan Wang, and Peter M. Bentler. Some applications of number-theoretic methods in statistics. Statist. Sci., 9(3):416–428, August 1994.
    [9] Rong Hu, Brian Mac Namee, and Sarah Jane Delany. Off to a good start: Using clustering to select the initial training set in active learning. In Hans W. Guesgen and R. Charles Murray, editors, Proceedings of the Twenty-Third International Florida Artificial Intelligence Research Society Conference, May 19-21, 2010, Daytona Beach, Florida, USA. AAAI Press, 2010.
    [10] Kai-Tai Fang and Yuan Wang. Number-Theoretic Methods in Statistics. Chapman and Hall/CRC, 1st edition, 1993.
    [11] Jaeho Kang, Kwang Ryel Ryu, and Hyuk-Chul Kwon. Using cluster-based sampling to select initial training set for active learning in text classification. In PAKDD, 2004.
    [12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
    [13] Josh Warner, Jason Sexauer, scikit fuzzy, twmeggs, alexsavio, Aishwarya Unnikrishnan, Guilherme Castelão, Felipe Arruda Pontes, Tobias Uelwer, pd2f, laurazh, Fernando Batista, alexbuy, Wouter Van den Broeck, William Song, The Gitter Badger, Roberto Abdelkader Martínez Pérez, James F. Power, Himanshu Mishra, Guillem Orellana Trullols, Axel Hörteborn, and 99991. Jdwarner/scikit-fuzzy: Scikit-fuzzy version 0.4.2, November 2019.
    [14] Weiwei Yuan, Yongkoo Han, Donghai Guan, Sungyoung Lee, and Young-Koo Lee. Initial training data selection for active learning. In Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication, ICUIMC ’11, New York, NY, USA, 2011. Association for Computing Machinery.
    [15] Aijun Zhang, M. Zhang, and Y.-D. Zhou. Data-driven Space-filling Design. http://www.statsoft.org/wp-content/uploads/2020Stat3622/Lecture12_BigDataViz/20181110DSD_Nankai.pdf, 2018.
    [16] Dengyong Zhou, Olivier Bousquet, Thomas N. Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. In S. Thrun, L. K. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16, pages 321–328. MIT Press, 2004.
    [17] Jingbo Zhu, Huizhen Wang, Tianshun Yao, and Benjamin K Tsou. Active learning with sampling by uncertainty and density for word sense disambiguation and text classification. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 1137–1144, Manchester, UK, August 2008. Coling 2008 Organizing Committee.

    Full text available on campus: 2025-07-19
    Full text available off campus: 2025-07-19