
Graduate student: Tseng, Ching-Chung (曾慶忠)
Thesis title: A Study on Scalable Analytical Frameworks for Complex Event Episode Mining (具延展性之複雜事件情節探勘系統之研究)
Advisor: Hsieh, Sun-Yuan (謝孫源)
Co-advisor: Vincent S. Tseng (曾新穆)
Degree: Doctoral
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication year: 2023
Academic year of graduation: 111 (ROC academic year)
Language: English
Number of pages: 80
Keywords: Frequent Episode Pattern Mining, Complex Event Sequence, Incremental Mining, Lambda Architecture
ORCID: 0000-0003-0637-7792
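    Episode pattern mining is an important data mining technique for extracting high-value information, and it can help solve real-world problems in many domains, such as manufacturing analysis, stock markets, weather forecasting, healthcare, and cybersecurity. Although episode pattern mining has existed for many years, the arrival of the big data era has sharply increased the demand for analyzing the continuous sequence data collected by IoT devices. Applying episode pattern mining correctly and efficiently to complex event sequence data, while scaling to keep pace with rapidly growing data volumes, is therefore becoming increasingly important for solving application problems in many domains. Our survey found, however, that few studies have focused on developing a scalable framework that applies episode pattern mining to continuous complex event sequence data to help domain experts extract useful information. This motivated us to develop a prototype framework that gives domain experts a correct, efficient, and scalable way to carry out data mining tasks with episode pattern mining.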
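    In this dissertation, we propose a new framework based on complex event episode mining. Its main contributions address the following problems faced by previous methods: (1) Traditional episode mining techniques mostly mine frequent episode patterns from static databases and are not suited to continuous data, particularly the streaming data collected by IoT devices; our framework is designed specifically for mining continuous data streams. (2) When new data arrive, obtaining correct, up-to-date results usually requires re-running the complete mining process, which is time-consuming; our framework adopts the Lambda architecture and splits the work into batch episode mining, incremental episode mining, and pattern merging, balancing efficiency and accuracy. (3) As data keep growing and more computing power is required, our framework, developed on Apache Spark and Apache Spark Streaming, can quickly scale its computing capacity as needed.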
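To make the batch mining step more concrete, here is a minimal Python sketch of frequent serial episode mining over a complex event sequence, where several event types may share one timestamp. It is not the dissertation's algorithm: it assumes a WINEPI-style support (the number of sliding windows of width `width` that contain an occurrence, in the spirit of Mannila et al.) and a level-wise, Apriori-style search. The names `window_contains`, `support`, `mine_serial_episodes`, and the `max_len` cap are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Complex event sequence: timestamp -> set of event types seen at that time.
Sequence = Dict[int, Set[str]]
# Serial episode: event types that must occur at strictly increasing timestamps.
Episode = Tuple[str, ...]


def window_contains(seq: Sequence, times: List[int], start: int,
                    width: int, episode: Episode) -> bool:
    """True if `episode` occurs inside the window [start, start + width)."""
    idx = 0
    for t in times:
        if t < start:
            continue
        if t >= start + width:
            break
        # Greedy earliest match; at most one episode element per timestamp.
        if episode[idx] in seq[t]:
            idx += 1
            if idx == len(episode):
                return True
    return False


def support(seq: Sequence, times: List[int], width: int,
            episode: Episode) -> int:
    """Number of sliding windows of width `width` that contain `episode`."""
    first, last = times[0], times[-1]
    return sum(window_contains(seq, times, s, width, episode)
               for s in range(first - width + 1, last + 1))


def mine_serial_episodes(seq: Sequence, width: int, min_sup: int,
                         max_len: int = 3) -> Dict[Episode, int]:
    """Level-wise search: extend frequent k-episodes by one frequent event."""
    times = sorted(seq)
    events = sorted({e for evs in seq.values() for e in evs})
    frequent: Dict[Episode, int] = {}
    level: List[Episode] = []
    for e in events:
        sup = support(seq, times, width, (e,))
        if sup >= min_sup:
            frequent[(e,)] = sup
            level.append((e,))
    singles = list(level)
    length = 1
    while level and length < max_len:
        next_level: List[Episode] = []
        for prefix, single in product(level, singles):
            cand = prefix + single
            sup = support(seq, times, width, cand)
            if sup >= min_sup:
                frequent[cand] = sup
                next_level.append(cand)
        level = next_level
        length += 1
    return frequent


if __name__ == "__main__":
    seq = {1: {"A", "B"}, 2: {"C"}, 3: {"A"}, 5: {"B", "C"}}
    print(mine_serial_episodes(seq, width=3, min_sup=2))
    # {('A',): 5, ('B',): 6, ('C',): 6, ('A', 'C'): 3, ('B', 'C'): 2, ('C', 'A'): 2}
```

Extending only frequent prefixes is safe here because any window that contains an episode also contains its prefix, so no frequent episode is missed.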
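    In this dissertation, we study and analyze this topic, which has received little attention in the past but has broad practical value. We design and develop a new framework with the above advantages from both theoretical and practical perspectives, and evaluate its practicality using public datasets from different domains as well as real data we collected. Experimental results show that the proposed framework achieves both efficiency and accuracy, scales well, and is suitable as an analytical framework for complex event episode mining across a variety of application domains.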

    Extracting important, valuable information to solve real-world problems, such as traffic data analysis, healthcare improvement, weather forecasting, and cybersecurity, is a rapidly growing need, and episode pattern mining is a very useful technique for this purpose. The rapid development of the IoT (Internet of Things) has ushered in the 'Big Data' era, with fast-growing volumes of digital data and rising demand for useful knowledge extracted from the rich data these devices collect. Analyzing complex event data effectively and efficiently with episode pattern mining is therefore becoming increasingly important for solving problems in many domains. According to our survey, very few studies have focused on developing a scalable framework, based on episode pattern mining of complex event sequences, that fits applications in most domains. These observations motivated us to develop a novel framework that people can use to conduct such data mining jobs effectively and efficiently.
    In this dissertation, we propose a novel, scalable architecture for complex event episode mining. The main contributions are as follows: (1) Most existing methods focus on mining episode patterns in static data and cannot meet the requirements of analyzing streaming data, especially data collected via IoT devices; the proposed architecture is designed specifically for complex event sequences in data streams. (2) As data grow, the mining process must be re-run repeatedly to obtain the most up-to-date patterns, which is costly; we adopt the Lambda architecture, comprising delta episode mining, batch episode mining, and pattern merging, to take both efficiency and accuracy into account. (3) To enhance scalability, we chose Apache Spark and Apache Spark Streaming as the system development framework, so when more computing power is needed, users can easily scale the system by adding more computing units.
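The idea behind the merge step can be illustrated with a small Python sketch. It borrows the incremental-update reasoning of FUP (Cheung et al., 1996) rather than reproducing the dissertation's merge layer, and it assumes that episode supports are additive across the batch and delta segments, so occurrences that straddle the boundary between them are ignored. `merge_views`, `min_ratio`, and the shape of the input dictionaries are hypothetical.

```python
from typing import Dict, Set, Tuple

Episode = Tuple[str, ...]


def merge_views(batch_counts: Dict[Episode, int], batch_size: int,
                delta_counts: Dict[Episode, int], delta_size: int,
                min_ratio: float) -> Tuple[Dict[Episode, int], Set[Episode]]:
    """Merge a batch view and a speed-layer (delta) view of episode supports.

    batch_counts: exact supports of the episodes that were frequent in the
        historical batch; any other episode had support below
        min_ratio * batch_size there.
    delta_counts: exact supports in the new delta; assumed to include every
        episode in batch_counts that occurs in the delta (missing means 0).
    Returns the episodes known to be frequent overall (with merged supports)
    and the episodes that may be frequent but need a rescan of the batch data.
    """
    threshold = min_ratio * (batch_size + delta_size)
    frequent: Dict[Episode, int] = {}
    needs_rescan: Set[Episode] = set()

    # Case 1: the batch count is exact, so the merged count is exact too.
    for ep, old in batch_counts.items():
        total = old + delta_counts.get(ep, 0)
        if total >= threshold:
            frequent[ep] = total

    # Case 2: the episode was infrequent in the batch (count unknown but small).
    for ep, new in delta_counts.items():
        if ep in batch_counts:
            continue
        # If it is also infrequent in the delta, the total stays below the
        # threshold and it can be dropped without touching the batch data.
        if new >= min_ratio * delta_size:
            needs_rescan.add(ep)

    return frequent, needs_rescan


if __name__ == "__main__":
    batch = {("A",): 40, ("A", "B"): 25, ("C",): 30}
    delta = {("A",): 5, ("A", "B"): 1, ("D",): 8, ("E",): 2}
    frequent, rescan = merge_views(batch, 100, delta, 20, min_ratio=0.25)
    print(frequent)  # {('A',): 45, ('C',): 30}
    print(rescan)    # {('D',)}
```

Only the episodes returned in the second value require going back to the historical data; everything else is decided from the two precomputed views, which is one way a batch/speed split can avoid re-running the full mining job on every update.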
    In this dissertation, we conducted a comprehensive survey and analysis of this topic, which has been little explored in the past but has great practical value in real-world applications. We designed and developed an architecture with the above advantages from both theoretical and practical perspectives, and evaluated its performance on datasets from different application domains, including widely used public datasets and real datasets we collected. Experimental results show that the architecture outperforms other existing methods in accuracy and efficiency. For large-scale complex event episode mining jobs, the experimental results also show that the framework scales very well. It is thus a scalable analytical architecture for complex event episode mining applicable to various domains.

    Chinese Abstract
    English Abstract
    Acknowledgements
    Content
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Overview of the Dissertation
        1.2.1 Episode mining for complex-event sequences with scalability
        1.2.2 Episode mining with high efficiency and accuracy
        1.2.3 Experimental Evaluation
      1.3 Organization of the Dissertation
    Chapter 2 Background and Related Work
      2.1 Frequent Itemset Mining
      2.2 Frequent Episode Mining
      2.3 Incremental Mining
      2.5 Big Data Platform
        2.5.1 Apache Spark
        2.5.2 Apache Spark Streaming
        2.5.3 Apache Kafka
    Chapter 3 Episode Mining for Complex-event Sequences with Scalability
      3.1 Introduction
      3.2 Preliminary and definitions
      3.3 Pre-Processing Layer
        3.3.1 Data cleaning
        3.3.3 Integration
        3.3.4 Transformation
      3.4 Scalable Mining
        3.4.1 MapReduce Programming
        3.4.2 Modules
        3.4.3 Algorithms
      3.5 Rules management
      3.6 Summary
    Chapter 4 Incremental Episode Mining with High Efficiency and Accuracy
      4.1 Introduction
      4.2 Batch Layer
        4.2.1 Modules
        4.2.2 Algorithms
      4.3 Speed layer
        4.3.1 Modules
        4.3.2 Algorithms
      4.4 Merge layer
        4.4.1 Modules
        4.4.2 Algorithms
      4.5 User interface
      4.6 Summary
    Chapter 5 Empirical Evaluation
      5.1 Experimental environment
      5.2 Datasets and episode mining scenarios
      5.3 Experimental result
        5.3.1 Accuracy evaluation
        5.3.2 Efficiency evaluation
        5.3.3 Scalability evaluation
    Chapter 6 Conclusions and Future Works
      6.1 Conclusions
      6.2 Future Works
    Reference

