簡易檢索 / 詳目顯示

研究生: 鍾孟勳
Zhong, Meng-Xun
論文名稱: 在串流環境中基於時間感知與密度的漸進式區域異常檢測
TADILOF: Time Aware Density-based Incremental Local Outlier Detection in Data Streams
指導教授: 黃仁暐
Huang, Jen-Wei
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 電腦與通信工程研究所
Institute of Computer & Communication Engineering
論文出版年: 2020
畢業學年度: 108
語文別: 英文
論文頁數: 42
中文關鍵詞: 時間感知與密度離群值檢測串流資料空氣品質
外文關鍵詞: Outlier detection, data streams, air quality
相關次數: 點閱:121下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 串流資料中的離群值偵測演算法一直被視為很重要的問題,尤其是在IoT技術蓬勃發展的現在,大量的串流資料不斷的被產生,例如:智慧工廠製造、智能家電還有智慧城市當中的各種感測器,也讓離群值偵測演算法近年來越來越受重視,舉例來說,MILOF和DILOF都是串流資料中的離群值偵測演算法,兩者皆基於有名的LOF演算法所開發,LOF是一種基於密度的離群值偵測演算法,然而上述兩者皆沒有考慮到資料可能會隨著時間轉變的因素,因此,在這篇論文中我們提出了在數據流中基於時間感知與密度的漸增式區域異常檢測演算法(TADILOF),還有基於過往資訊的LOF近似值演算法,在TADILOF演算法當中,我們結合了移除過時資料的概念,讓模型的能夠自適應資料的趨勢,在LOF近似值演算法當中,我們運用的過往的資訊,求取可能的LOF值,大幅增加了在記憶體限制下的效能,在實驗章節中,我們的結果贏過當今最好的演算法,另一方面,我們也運用此演算法建構一套汙染事件即時偵測系統,我們可以在汙染事件發生時通知使用者,讓使用者做好準備。

    Outlier detection in data streams is crucial to data mining; however, this task is made increasingly difficult by enormous growth in the quantity of data generated by an expanding Internet of Things. Recent advances in outlier detection based on the density-based local outlier factor (LOF) algorithm (i.e., MILOF and DILOF) do not consider variations in data over time. This paper presents a novel algorithm for streaming data, referred to as time-aware density-based incremental local outlier detection (TADILOF) to overcome this issue. We also developed a means of estimating the LOF score based on historical information following the removal of outdated data. Experiment results demonstrated that TADILOF outperforms current state-ofthe-art methods. We also present an application of the proposed scheme to the development of an air-quality monitoring system.

    中文摘要. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . iv List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . vii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . 5 2.1 LOF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 iLOF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.3 MILOF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . 8 2.4 DILOF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . 9 3 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.1 Time Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.2 Approximate LOF score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.3 Time and Space Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.2 Experiment Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.2.1 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 4.3 Skipping scheme for sequence of outliers . . . . . . . . . . . . . . . . . . . . . . 30 5 PM2.5 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 5.1 Monitor PM2.5 Pollution Event . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 5.2 PM2.5 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 5.3 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    [1] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “Lof: identifying density-basedlocal outliers,” in ACM sigmod record, vol. 29, no. 2. ACM, 2000, pp. 93–104.
    [2] H.-P. Hsieh, S.-D. Lin, and Y. Zheng, “Inferring air quality for station locationrecommendation based on urban big data.” in KDD, L. Cao, C. Zhang, T. Joachims, G. I.Webb, D. D. Margineantu, and G. Williams, Eds. ACM, 2015, pp. 437–446. [Online].Available: http://dblp.uni-trier.de/db/conf/kdd/kdd2015.html#HsiehLZ15
    [3] Y. Zheng, F. Liu, and H.-P. Hsieh, “U-air: When urban air quality inference meets bigdata,” in International Conference on Knowledge Discovery and Data Mining (SIGKDD),ser. KDD ’13. New York, NY, USA: ACM, 2013, pp. 1436–1444. [Online]. Available:http://doi.acm.org/10.1145/2487575.2488188
    [4] L.-J. Chen, Y.-H. Ho, H.-H. Hsieh, S.-T. Huang, H.-C. Lee, and S. Mahajan,“Adf: An anomaly detection framework for large-scale pm2.5 sensing systems.” IEEEInternet of Things Journal, vol. 5, no. 2, pp. 559–570, 2018. [Online]. Available:http://dblp.uni-trier.de/db/journals/iotj/iotj5.html#ChenHHHLM18
    [5] D. Pokrajac, A. Lazarevic, and L. J. Latecki, “Incremental local outlier detectionfor data streams.” in CIDM. IEEE, 2007, pp. 504–515. [Online]. Available:http://dblp.uni-trier.de/db/conf/cidm/cidm2007.html#PokrajacLL07
    [6] M. Salehi, C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang, “Fastmemory efficient local outlier detection in data streams.” IEEE Trans. Knowl.Data Eng., vol. 28, no. 12, pp. 3246–3260, 2016. [Online]. Available: http://dblp.uni-trier.de/db/journals/tkde/tkde28.html#SalehiLBVZ16
    [7] G. S. Na, D. H. Kim, and H. Yu, “Dilof: Effective and memory efficient local outlierdetection in data streams.” in KDD, Y. Guo and F. Farooq, Eds. ACM, 2018, pp. 1993–2002. [Online]. Available: http://dblp.uni-trier.de/db/conf/kdd/kdd2018.html#NaKY18
    [8] Y. Zheng, X. Yi, M. Li, R. Li, Z. Shan, E. Chang, and T. Li, “Forecasting fine-grained airquality based on big data.” in KDD, L. Cao, C. Zhang, T. Joachims, G. I. Webb, D. D. Margineantu, and G. Williams, Eds. ACM, 2015, pp. 2267–2276. [Online]. Available:http://dblp.uni-trier.de/db/conf/kdd/kdd2015.html#ZhengYLLSCL15
    [9] P.-W. Soh, J.-W. Chang, and J.-W. Huang, “Adaptive deep learning-based air qualityprediction model using the most relevant spatial-temporal relations.” IEEE Access, vol. 6,pp. 38 186–38 199, 2018. [Online]. Available: http://dblp.uni-trier.de/db/journals/access/access6.html#SohCH18
    [10] G. Hulten, L. Spencer, and P. Domingos, “Mining time-changing data streams,” in KDD’01: Proceedings of the seventh ACM SIGKDD international conference on Knowledgediscovery and data mining. New York, NY, USA: ACM, 2001, pp. 97–106. [Online].Available: http://portal.acm.org/citation.cfm?id=502512.502529
    [11] A. Tsymbal, “The problem of concept drift: definitions and related work,” The Universityof Dublin, Trinity College, Department of Computer Science, Dublin, Ireland, Tech. Rep.TCD-CS-2004-15, 2004.
    [12] W. Fan, “Systematic data selection to mine concept-drifting data streams.” in KDD,W. Kim, R. Kohavi, J. Gehrke, and W. DuMouchel, Eds. ACM, 2004, pp. 128–137.[Online]. Available: http://dblp.uni-trier.de/db/conf/kdd/kdd2004.html#Fan04
    [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification withdeep convolutional neural networks,” in Advances in neural information processingsystems, 2012, pp. 1097–1105. [Online]. Available: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
    [14] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9,no. 8, pp. 1735–1780, 1997.
    [15] C. H. S. Ronald C. Henry, Yu-Shuo Chang, “Locating nearby sources ofair pollution by nonparametric regression of atmospheric concentrations on winddirection,” Atmospheric Environment, vol. 36, 2002. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1352231002001644
    [16] C. Chatfield, “Exploratory data analysis,” European Journal of Operational Research,vol. 23, no. 1, pp. 5 – 13, 1986. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0377221786902092
    [17] F. P. Preparata and M. I. Shamos, Computational Geometry - An Introduction., ser. Textsand Monographs in Computer Science. Springer, 1985.
    [18] E. M. Knorr and R. T. Ng, “Algorithms for miningdistance-based outliers in large datasets,” 1998, pp. 392–403. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=9A0ED8BD4E41A9DC15F6E02185FD3B8D?doi=10.1.1.55.8026&rep=rep1&type=pdf
    [19] ——, “Finding intensional knowledge of distance-based outliers,” in VLDB ’99: Proceedingsof the 25th International Conference on Very Large Data Bases. San Francisco, CA,USA: Morgan Kaufmann Publishers Inc., 1999, pp. 211–222.
    [20] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient algorithms for mining outliers fromlarge data sets.” in SIGMOD Conference, W. Chen, J. F. Naughton, and P. A. Bernstein,Eds. ACM, 2000, pp. 427–438, sIGMOD Record 29(2), June 2000. [Online]. Available:http://dblp.uni-trier.de/db/conf/sigmod/sigmod2000.html#RamaswamyRS00
    [21] C. C. Aggarwal and S. Sathe, “Theoretical foundations and algorithms for outlierensembles.” SIGKDD Explorations, vol. 17, no. 1, pp. 24–47, 2015. [Online]. Available:http://dblp.uni-trier.de/db/journals/sigkdd/sigkdd17.html#AggarwalS15

    無法下載圖示 校內:2025-02-10公開
    校外:不公開
    電子論文尚未授權公開,紙本請查館藏目錄
    QR CODE