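Graduate student: 張財實 (Chang, Chai-Shi)
Thesis title: 用於資料串流事件偵測任務之目標選取技術 (Target Selection on Large-Scale Data Stream for Event Detection Task)
Advisor: 莊坤達 (Chuang, Kun-Ta)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2022
Graduation academic year: 110 (ROC calendar, 2021-2022)
Language: English
Number of pages: 36
Keywords (Chinese): 目標選取, 事件偵測
Keywords (English): target selection, event detection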
  • Big data analysis has become one of the most prominent topics in today's society. We receive a vast amount of complex information every day, and using it more effectively could bring considerable benefit and convenience. As the big data era advances, Internet of Things (IoT) devices continue to develop, and event detection is one of their most common application services. To perform event detection on a large volume of data, a system typically has to provision enough resources to run the detection workload for every user, which consumes a great deal of CPU time and resources.
    With a large number of users, the events contained in the collected data are extremely sparse, and the resulting class imbalance makes the task more challenging. In this study, we propose Spatial-Temporal Candidate Sampling for Target Selection (STCS), a target selection technique for event detection designed for large numbers of users. STCS uses spatial locality and temporal locality to identify the target users in the data who are likely to have an event, which reduces the number of event detection runs and, in turn, the computational resources required.
    We evaluate the proposed method on two real-world event detection tasks: water leakage detection and air conditioner on-off detection. At an 80% total recall rate, the method uses only 17% of the computational cost on the water leakage data and 45% on the air conditioner on-off events. The technique therefore applies well to event detection over large user populations, achieving a high event detection rate at a small fraction of the computational cost.
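    The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the scoring formula, the function names (`select_targets`, `recall_and_cost`), the parameters (`decay`, `explore_frac`) and the toy data are all assumptions made for illustration. The sketch scores each user by closeness to users with known events (spatial locality) and by recency of the user's own last event (temporal locality), then runs the expensive detector only on the top-scoring users plus a few random ones.

```python
import math
import random

# Toy 2-D user features (hypothetical): users 0-2 are similar, 3-9 are not.
FEATURES = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (0.0, 0.2)}
FEATURES.update({i: (5.0 + i, 5.0) for i in range(3, 10)})


def select_targets(features, last_event_step, hit_users, step, budget,
                   explore_frac=0.2, decay=0.1, rng=None):
    """Choose `budget` users on which to run the expensive event detector.

    Assumed scoring, for illustration only:
      spatial  = 1 / (1 + distance to the nearest user with a known event)
      temporal = exp(-decay * steps since this user's own last event)
    """
    rng = rng or random.Random(0)
    scores = {}
    for user, vec in features.items():
        if hit_users:
            nearest = min(math.dist(vec, features[h]) for h in hit_users)
            spatial = 1.0 / (1.0 + nearest)
        else:
            spatial = 0.0
        last = last_event_step.get(user)
        temporal = math.exp(-decay * (step - last)) if last is not None else 0.0
        scores[user] = spatial + temporal

    # Split the budget between top-scoring users and random exploration.
    n_random = min(int(round(budget * explore_frac)), budget)
    n_top = budget - n_random
    ranked = sorted(features, key=lambda u: scores[u], reverse=True)
    top, rest = ranked[:n_top], ranked[n_top:]
    explore = rng.sample(rest, min(n_random, len(rest)))
    return top + explore


def recall_and_cost(selected, true_event_users, n_users):
    """Return (fraction of true events covered, fraction of users checked)."""
    truth = set(true_event_users)
    recall = len(set(selected) & truth) / len(truth) if truth else 1.0
    return recall, len(selected) / n_users


if __name__ == "__main__":
    chosen = select_targets(FEATURES, {1: 9}, hit_users=[0], step=10, budget=3)
    print(chosen, recall_and_cost(chosen, [1, 2], len(FEATURES)))
```

    The pair returned by `recall_and_cost` corresponds to the kind of trade-off the abstract reports, namely the share of events caught against the share of users on which detection is actually run. The random portion of the budget is one simple way to keep discovering events among users that neither locality signal currently favors.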

    Chinese Abstract
    Abstract
    Acknowledgment
    Contents
    List of Tables
    List of Figures
    1 Introduction
    2 Related Works
      2.1 Imbalance Learning on Big Data
      2.2 Event Detection
      2.3 Similarity Search
      2.4 Temporal Locality
    3 Observations
    4 Preliminary
      4.1 Problem Statement
      4.2 System Architecture
    5 Methodologies
      5.1 Hit Target and Candidate Pool
      5.2 Similarity Search
      5.3 Progressive Potential Score Matrix
      5.4 Top ? and Random Selection
    6 Experimental Results
      6.1 Dataset and Experimental Settings
      6.2 Baseline Methods and Evaluation Metrics
        6.2.1 Evaluation Metrics
        6.2.2 Baseline Methods
      6.3 Experimental Results
      6.4 Ablation Study
    7 Future Work
    8 Conclusion
    Bibliography


    Full text availability: on campus, public from 2024-09-15; off campus, public from 2024-09-15.