| Graduate Student: | 洪英祐 Hong, Ying-You |
|---|---|
| Thesis Title: | 一個增量可擴展核主成份分析方法 An Incremental Scalable Kernel Principal Component Analysis Method |
| Advisor: | 謝錫堃 Shieh, Ce-Kuen |
| Co-Advisor: | 張志標 Chang, Jyh-Biau |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電腦與通信工程研究所 Institute of Computer & Communication Engineering |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 |
| Language: | English |
| Number of Pages: | 31 |
| Keywords: | Big data analysis, KPCA, Random Fourier features, Feature engineering, Machine learning, SVD |
With the rapid development of technology, the volume of data that can be collected and stored has grown exponentially, and big data analysis has become correspondingly important. When analyzing large-scale data, dimensionality reduction is an indispensable step, especially when the number of features is very large, as in image processing: removing redundant features greatly reduces the cost of subsequent classification, avoids the curse of dimensionality, and can improve classifier performance. Scalable Kernel Principal Component Analysis (SKPCA), developed in our laboratory, is an algorithm based on Kernel Principal Component Analysis (KPCA). It approximates KPCA with the random Fourier features algorithm, mapping the data into a finite-dimensional reproducing kernel Hilbert space (RKHS) and thereby avoiding the direct computation and storage of the kernel matrix, whose size grows quadratically with the number of samples. However, SKPCA is not as effective when dealing with streaming data. In this thesis, we propose an Incremental Scalable Kernel Principal Component Analysis (ISKPCA) method that exploits the characteristics of streaming data, using incremental singular value decomposition to approximate the mapped data. Besides lowering running time, it also greatly reduces both storage requirements and noise in the data, yielding an incremental, scalable KPCA. Experimental results show that in the streaming-data setting the dimensionality-reduction quality of ISKPCA is very close to that of SKPCA, while ISKPCA outperforms SKPCA in both time and space cost, successfully extending KPCA to the streaming-data context.
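The pipeline the abstract describes — map each incoming batch with random Fourier features, then fold it into a rank-truncated SVD so storage never grows with the stream — can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names (`rff_map`, `isvd_update`), the RBF kernel, and all parameter values are assumptions, and centering as well as the optimal-threshold rank selection discussed in the thesis's references are omitted for brevity.

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier features: approximate feature map for an RBF kernel.

    X: (n, d) data; W: (d, D) frequencies drawn from the kernel's spectral
    density; b: (D,) phases drawn uniformly from [0, 2*pi).
    """
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def isvd_update(S, Vt, B, rank):
    """Fold a new batch B (m, D) of mapped rows into a truncated SVD.

    The stream so far is summarized by (S, Vt); stacking diag(S) @ Vt on top
    of B preserves the Gram matrix X^T X (exactly before truncation), so the
    right singular vectors of the small stacked matrix approximate the
    principal directions of all data seen so far, at O(rank * D) storage.
    """
    stacked = B if S is None else np.vstack([S[:, None] * Vt, B])
    _, S_new, Vt_new = np.linalg.svd(stacked, full_matrices=False)
    k = min(rank, len(S_new))
    return S_new[:k], Vt_new[:k]

# Toy streaming run with illustrative sizes.
rng = np.random.default_rng(0)
d, D, rank = 5, 64, 3
W = rng.normal(0.0, 1.0, size=(d, D))        # frequencies for an RBF kernel
b = rng.uniform(0.0, 2 * np.pi, size=D)      # random phases

S, Vt = None, None
for _ in range(10):                          # stream of batches
    batch = rng.normal(size=(20, d))
    Z = rff_map(batch, W, b)
    S, Vt = isvd_update(S, Vt, Z, rank)

# Project a new sample onto the retained kernel principal directions.
z = rff_map(rng.normal(size=(1, d)), W, b)
reduced = z @ Vt.T                           # shape (1, rank)
```

Only the `rank`-row factor `(S, Vt)` is kept between batches, which is where the space saving over batch SKPCA comes from.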
On-campus access: full text available from 2028-08-15.