
Author: Fan, Hong-Wei (范紘維)
Thesis title: An Enhanced Incremental Scalable Kernel Principal Component Analysis with Automatic Kernel Selection and Multiple Approximate Kernels
Advisor: Shieh, Ce-Kuen (謝錫堃)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2024
Academic year of graduation: 112 (2023-2024)
Language: English
Number of pages: 51
Keywords: Dimensionality Reduction, Kernel Principal Component Analysis, Approximate Kernels, Automatic Kernel Selection

Dimensionality reduction techniques are crucial when dealing with high-dimensional data, as they effectively reduce data complexity, lower computational cost, and improve model performance. Principal Component Analysis (PCA) primarily reveals linear relationships within the data, whereas Kernel Principal Component Analysis (KPCA) handles nonlinear data through kernel functions; however, KPCA is computationally intensive and scales poorly to large datasets. To address the computational cost of KPCA on large datasets, our laboratory's previous work developed Scalable Kernel Principal Component Analysis (SKPCA), which uses Random Fourier Features (RFF) to approximate the Gaussian kernel and thereby reduces the computational load. Incremental Scalable Kernel Principal Component Analysis (ISKPCA) was subsequently proposed so that the method can handle growing data streams. Nevertheless, these methods have limitations: they rely on a single kernel function, errors accumulate as the data stream grows and degrade performance, and manually tuning kernel parameters is cumbersome and inflexible, making it hard to adapt to the needs of different datasets. To address these issues, we propose Enhanced Incremental Scalable Kernel Principal Component Analysis (EIS-ISKPCA). The method automatically selects the kernel function and its parameters from multiple approximate kernels, avoiding the limitations of a single kernel and eliminating manual parameter tuning, which improves both flexibility and accuracy. EIS-ISKPCA further employs Orthogonal Random Features (ORF) to approximate the Gaussian kernel, significantly reducing approximation error and improving model stability. Experimental results show that EIS-ISKPCA not only performs dimensionality reduction effectively across multiple approximate kernels, broadening the range of problems the scalable approach can handle, but also significantly outperforms ISKPCA in dimensionality reduction quality.
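
The abstract relies on Random Fourier Features and Orthogonal Random Features as approximations of the Gaussian kernel. The following minimal NumPy sketch is not taken from the thesis: the function names, the choice of gamma, the feature dimension, and the relative-error metric are illustrative assumptions. It shows how both feature maps approximate the exact kernel matrix and how linear PCA on the mapped features can stand in for KPCA, which is the general idea behind SKPCA-style methods.

    import numpy as np

    def gaussian_kernel(X, Y, gamma):
        # Exact Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2).
        sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
        return np.exp(-gamma * sq)

    def rff_features(X, n_features, gamma, rng):
        # Random Fourier Features: z(x) = sqrt(2/D) * cos(W^T x + b), with the
        # columns of W drawn from N(0, 2*gamma*I) so that E[z(x)^T z(y)] = k(x, y).
        d = X.shape[1]
        W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
        b = rng.uniform(0, 2 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    def orf_features(X, n_features, gamma, rng):
        # Orthogonal Random Features: replace the Gaussian matrix by stacked
        # orthogonalized blocks whose row norms are redrawn from a chi
        # distribution, which lowers the variance of the kernel estimate.
        d = X.shape[1]
        blocks = []
        for _ in range(int(np.ceil(n_features / d))):
            Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # orthonormal directions
            S = np.sqrt(rng.chisquare(d, size=d))          # restore Gaussian row norms
            blocks.append(S[:, None] * Q)
        W = np.vstack(blocks)[:n_features].T * np.sqrt(2 * gamma)
        b = rng.uniform(0, 2 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 10))      # toy data; stands in for a real dataset
        gamma, D = 0.5, 512                 # illustrative kernel width and map size
        K = gaussian_kernel(X, X, gamma)
        for name, feature_map in [("RFF", rff_features), ("ORF", orf_features)]:
            Z = feature_map(X, D, gamma, rng)
            err = np.linalg.norm(K - Z @ Z.T, "fro") / np.linalg.norm(K, "fro")
            print(f"{name}: relative kernel approximation error = {err:.4f}")
        # Approximate KPCA: run linear PCA (via SVD) on the mapped features Z
        # instead of eigendecomposing the full n x n kernel matrix.
        Zc = Z - Z.mean(axis=0)
        _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
        X_reduced = Zc @ Vt[:2].T           # 2-D embedding from the approximate kernel
        print("reduced shape:", X_reduced.shape)

Under this construction, ORF typically yields a lower relative error than RFF for the same number of features, which is the motivation the abstract gives for switching from RFF to ORF.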

Table of Contents
Chapter 1: Introduction
Chapter 2: Background and Related Works
  2.1 Kernel Principal Component Analysis
  2.2 Scalable Kernel Principal Component Analysis
  2.3 Incremental Scalable Kernel Principal Component Analysis
  2.4 Approximation Kernel Algorithm
  2.5 Related Works
Chapter 3: Methodology
  3.1 Overview
  3.2 Automatic Kernel Selection
  3.3 Feature Mapping
  3.4 Truncated Matrix
  3.5 Dimension Reduction
    3.5.1 Dimension Reduction of the Training Dataset
    3.5.2 Dimension Reduction of the Testing Dataset
Chapter 4: Experiments
  4.1 Environment
  4.2 Performance Comparison of KPCA and EIS-KPCA
  4.3 Runtime Measurement
  4.4 Visualization
  4.5 Automatic Kernel Selection
  4.6 Classification Accuracy Comparison
  4.7 Approximation Error Measurement
  4.8 Reconstruction Quality Evaluation
Chapter 5: Conclusion and Future Work
Chapter 6: References


Full-text availability: on campus from 2029-08-27; off campus from 2029-08-27.
The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.