簡易檢索 / 詳目顯示

研究生: 曾川銘
Tseng, Chuan-Ming
論文名稱: 資料流中的機率分佈模式搜尋
Probability Distribution Pattern Searching in Data Streaming
指導教授: 蕭宏章
Hsiao, Hung-Chang
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 人工智慧科技碩士學位學程
Graduate Program of Artificial Intelligence
論文出版年: 2024
畢業學年度: 112
語文別: 中文
論文頁數: 70
中文關鍵詞: 資料流 (Data Streams)模式搜尋 (Pattern Searching)動差生成函數 (Moment-Generating Function, MGF)動差 (Moment)高階動差 (High-order moments)資料特徵化 (Data Characteristic)相關性係數 (Correlation coefficient)數據探勘 (Data mining)動態時間扭曲 (Dynamic Time Warping)
外文關鍵詞: Data Streams, Pattern Searching, Moment-Generating Function, Moment, High-order moments, Data Characteristic, Correlation coefficient, Data mining, Dynamic Time Warping
相關次數: 點閱:123下載:10
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 本研究旨在探討資料流中的機率分佈模式搜尋,運用動差生成函數 (Moment-Generating Function, MGF) 能夠全面性描述數據機率分佈的特性,將數據特徵化為多階動差 (Moment),並在資料流中,進行模式搜尋 (Pattern Searching),也稱為模式匹配 (Pattern Matching) 或相似性搜尋 (Similarity Search)。我們比較不同的相似度搜尋方法,在精準度、時間複雜度、彈性的差異,包含: 動差相似度、K-S 檢定 (Kolmogorov-Smirnov Test, K-S Test)、共變異數 (Covariance, COV) 和交叉熵 (Cross-Entropy),共 4 種相似度搜尋方法。同時,我們使用 Dynamic Time Warping (DTW) 演算法評估精準度,用來量測時間序列在搜尋結果與指定模式之間的差異,並在資料流的搜尋過程中,採用 Knuth-Morris-Pratt (KMP) 的演算法概念,將動差轉換為字元的方式,進行搜尋。
    論文一開始通過幾組實驗進行觀察,觀察由動差生成函數產生的各階動差,在不同機率分佈的數值變化,尤其是高階動差 (High-Order Moments) 在細微變化上較為敏感,可以偵測到瞬間變化量,適合應用在觀察異常或極值的情況。此外,搭配常用的視覺化圖表,讓使用者能直覺地理解其中差異,其中包含:直方圖 (Histogram)、箱形圖 (Boxplot)、分位數圖 (Quantile–Quantile Plot, Q-Q plot)。接著,應用這些基礎,將數據特徵化為動差後,配合相似度和門檻,我們可以用於數據的篩選與過濾,以及檢測數據中的異常模式,有效地幫助使用者快速識別並處理資料中的異常情況,進而提高數據品質和分析結果的可靠性。最後我們回到主題的搜尋應用上,如何實現動差相似度,將動差應用在資料流的模式搜尋,用來找出具有相似統計特徵的數據區段,進一步擴展了動差生成函數在資料探勘 (Data Mining) 中的應用範疇。

    We explore the probability distribution pattern searching in data streaming. By using the Moment-Generating Function (MGF), we can comprehensively describe the characteristics of data probability distributions, and then transform the data into multi-order moments for pattern searching in data streams, also known as pattern matching or similarity search.
    We compared different similarity search methods about accuracy, time complexity, and flexibility, including: moment similarity, Kolmogorov-Smirnov Test (K-S Test), Covariance (COV), and Cross-Entropy, totaling four similarity search methods. Additionally, we evaluated accuracy using the Dynamic Time Warping (DTW) algorithm, which measures the differences between the search results and the specified pattern in time series. During the data stream search process, we adopted the concept of the Knuth-Morris-Pratt (KMP) algorithm by transforming moments into characters for the search.

    摘要 i Abstract ii 英文延伸摘要 iii 誌謝 vii 目錄 viii 表格 x 圖片 xi 名詞解釋 xii 第一章 緒論 1 1.1. 研究背景 1 1.2. 研究動機 1 1.3. 目標與貢獻 2 1.4. 論文架構 4 第二章 相關工作 6 2.1. 現況 6 2.2. 文獻探討 7 2.3. 理論基礎 7 . 2.3.1. 時域特徵 8 . 2.3.2. 頻域特徵 8 . 2.3.3. 時頻特徵 (小波轉換) 10 . 2.3.3.Knuth-Morris-Pratt (KMP) 12 . 2.3.3.Dynamic Time Warping (DTW) 12 第三章 資料流中的機率分佈模式搜尋 13 3.1. 動差生成函數 15 . 3.1.1. 動差生成函數在其他領域的應用 19 . 3.1.2. 理論基礎 19 3.2. 動差在不同機率分佈的表現 21 . 3.2.1. 不同週期正弦波的的模擬數值 21 . 3.2.2. 不同機率分佈的模擬數值 23 3.3. 相似度比較 24 . 3.3.1. 動差相似度 24 . 3.3.2.Kolmogorov-Smirnov Test 26 . 3.3.3.Covariance 26 . 3.3.4.Cross-Entropy 27 第四章 實驗與討論 28 4.1. 儀器設備 28 . 4.1.1. 硬體環境 28 . 4.1.2. 軟體環境 29 4.2. 實驗準備與說明 29 . 4.2.1. 環境架設 29 . 4.2.2. 數據與資訊流程 30 4.2.3. 實作內容 32 4.3. 結果與討論 36 . 4.3.1. 資料流中的機率分佈模式搜尋 36 . 4.3.2. 特徵化風扇的振動數據 38 4.4. 動差生成函數的應用 45 . 4.4.1. 篩選數據 45 . 4.4.2. 搜尋數據 47 4.5. 動差生成函數與常見特徵化的比較 49 第五章 結論 52 5.1. 結論 52 5.2. 未來展望 52 References 54

    [1] Jose Vicente Abellan-Nebot and Fernando Romero Subirón. A review of machining monitoring systems based on artificial intelligence process models. The International Journal of Advanced Manufacturing Technology, 47:237–257, 2010.
    [2] Alberto Bellini, Fiorenzo Filippetti, Carla Tassoni, and Gérard-André Capolino. Advances in diagnostic techniques for induction machines. IEEE Transactions on industrial electronics, 55(12):4109–4126, 2008.
    [3] Tarak Benkedjouh, Kamal Medjaher, Noureddine Zerhouni, and Said Rechak. Health assessment and life prediction of cutting tools based on support vector regression. Journal of intelligent manufacturing, 26:213–223, 2015.
    [4] Binqiang Chen, Zhousuo Zhang, Chuang Sun, Bing Li, Yanyang Zi, and Zhengjia He. Fault feature extraction of gearbox by using overcomplete rational dilation discrete wavelet transform on signals measured from vibration sensors. Mechanical Systems and Signal Processing, 33:275–298, 2012.
    [5] Howard C Choe, Yulun Wan, and Andrew K Chan. Neural pattern identification of railroad wheel-bearing faults from audible acoustic signals: comparison of fft, cwt, and dwt features. In Wavelet applications IV, volume 3078, pages 480–496. SPIE, 1997.
    [6] Charles K Chui. An introduction to wavelets, volume 1. Academic press, 1992.
    [7] James W Cooley and John W Tukey. An algorithm for the machine calculation of complex fourier series. Mathematics of computation, 19(90):297–301, 1965.
    [8] Jean Baptiste Joseph Fourier. Théorie analytique de la chaleur, volume 1. GauthierVillars, 1888.
    [9] Dennis Gabor. Theory of communication. part 1: The analysis of information. Journal of the Institution of Electrical Engineers-part III: radio and communication engineering, 93(26):429–441, 1946.
    [10] Alexander Grossmann and Jean Morlet. Decomposition of hardy functions into square integrable wavelets of constant shape. SIAM journal on mathematical analysis, 15(4):723–736, 1984.
    [11] RBW Heng and Mohd Jailani Mohd Nor. Statistical analysis of sound and vibration signals for monitoring rolling element bearing condition. Applied Acoustics, 53(1-3):211–226, 1998.
    [12] Norden E Huang, Zheng Shen, Steven R Long, Manli C Wu, Hsing H Shih, Quanan Zheng, Nai-Chyuan Yen, Chi Chao Tung, and Henry H Liu. The empirical mode decomposition and the hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: mathematical, physical and engineering sciences, 454(1971):903–995, 1998
    [13] PL Novi Inverardi and Aldo Tagliani. Discrete distributions from moment generating function. Applied mathematics and computation, 182(1):200–209, 2006.
    [14] Živana B Jakovljević. Comparative analysis of hilbert huang and discrete wavelet transform in processing of signals obtained from the cutting process: An intermittent turning example. FME transactions, 41(4):342–348, 2013.
    [15] Andrew KS Jardine, Daming Lin, and Dragan Banjevic. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical systems and signal processing, 20(7):1483–1510, 2006.
    [16] Kamran Javed, Rafael Gouriveau, Noureddine Zerhouni, and Patrick Nectoux. Enabling health monitoring approach based on vibration data for accurate prognostics. IEEE Transactions on industrial electronics, 62(1):647–656, 2014.
    [17] Ian T Jolliffe. Principal component analysis for special types of data. Springer, 2002.
    [18] kaggle. NASA Bearing Dataset. https://www.kaggle.com/datasets/vinayak123tyagi/bearingdataset.
    [19] Daniel Kifer, Shai Ben-David, and Johannes Gehrke. Detecting change in data streams. In VLDB, volume 4, pages 180–191. Toronto, Canada, 2004.
    [20] HJ Landau. Sampling, data transmission, and the nyquist rate. Proceedings of the IEEE, 55(10):1701–1706, 1967.
    [21] Yaguo Lei, Jing Lin, Zhengjia He, and Ming J Zuo. A review on empirical mode decomposition in fault diagnosis of rotating machinery. Mechanical systems and signal processing, 35(1-2):108–126, 2013.
    [22] 蔡有藤 (Yuo-Tern Tsai); 陳宗傑 (Chung-Chieh Chen); 廖哲賢 (Zhe-Xian Liao). 機械系統性能衰退預測與故障診斷之研究. Journal of Technology, 27(3):121–129, September 2012.
    [23] Stephane G Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE transactions on pattern analysis and machine intelligence, 11(7):674–693, 1989.
    [24] Nicholas Metropolis and Stanislaw Ulam. The monte carlo method. Journal of the American statistical association, 44(247):335–341, 1949.
    [25] Eli Upfal Michael Mitzenmacher. Probability and Computing Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 1st edition, 2005.
    [26] Asoke K Nandi, Chao Liu, and ML Dennis Wong. Intelligent vibration signal processing for condition monitoring. In Proceedings of the International Conference Surveillance, volume 7, pages 1–15, 2013.
    [27] Patrick Nectoux, Rafael Gouriveau, Kamal Medjaher, Emmanuel Ramasso, Brigitte Chebel-Morello, Noureddine Zerhouni, and Christophe Varnier. Pronostia: An experimental platform for bearings accelerated degradation tests. In IEEE International Conference on Prognostics and Health Management, PHM’12., pages 1–8. IEEE Catalog Number: CPF12PHM-CDR, 2012.
    [28] Von Neumann. Various techniques used in connection with random digits. Notes by GE Forsythe, pages 36–38, 1951.
    [29] Allen Craig Robert Hogg, Joseph McKean. Introduction to Mathematical Statistics. Pearson, 8th edition, 2005.
    [30] Sebastian Schelter, Dustin Lange, Philipp Schmidt, Meltem Celikel, Felix Biessmann, and Andreas Grafberger. Automating large-scale data quality verification. Proceedings of the VLDB Endowment, 11(12):1781–1794, 2018.
    [31] Sergey Sukhanov, Renzhi Wu, Christian Debes, and Abdelhak M Zoubir. Dynamic pattern matching with multiple queries on large scale data streams. Signal Processing, 171:107402, 2020.
    [32] Weizhong Yan, Hai Qiu, Naresh Iyer, AIR FORCE RESEARCH LAB WRIGHTPATTERSON AFB OH MATERIALS, and MANUFACTURING DIRECTORATE. Feature extraction for bearing prognostics and health management (phm)-a survey (preprint). Geofluids, 11(4):343–348, 2008.
    [33] Jianbo Yu. A hybrid feature selection scheme and self-organizing map model for machine health assessment. Applied Soft Computing, 11(5):4041–4054, 2011.
    [34] 王凱立 and 林嘉慧. 條件高階動差於財務金融市場之應用. 財務金融學刊, 11(2):1–42, 2003.
    [35] 藍青玉. 消費尤拉方程式: 高階動差的理論與實證重要性. PhD thesis, 2005.
    [36] 許長文. 基於高階次方與動差之廣義特徵正規化法之強健性語音辨識技術. PhD thesis, 2008.

    下載圖示 校內:立即公開
    校外:立即公開
    QR CODE