簡易檢索 / 詳目顯示

研究生: 黃智榆
Huang, Jhih-Yu
論文名稱: 用於PCRE比對之有效率的cycle延遲NFA及其於TCAM上的實作
An Efficient Delayed NFA for PCRE Matching and its Implementation on TCAM
指導教授: 張燕光
Chang, Yeim-Kuan
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2021
畢業學年度: 109
語文別: 英文
論文頁數: 95
中文關鍵詞: 正則表達式NFADFAFailure LinkActive StateTCAM
外文關鍵詞: Regular expression, NFA, DFA, Failure Link, Active State, TCAM
相關次數: 點閱:67下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 近年來,網路流量和Network Intrusion Detection System(NIDS)快速發展。使得pattern set也是快速增長。傳統使用string-matching的方法逐漸被正則表達式所取代。著名的正則表達式state machine建構方法有兩種。分別為Non-deterministic Finite Automata (NFA)和Deterministic Finite Automata (DFA)。它們兩者互有優缺點。NFA的優點有,建構速度快、memory較小。但是因為它會有state self-loop的特性,則會造成多個active states存在。多個active states是為了確保input string所有match的可能性,但是這會大幅降低match速度。DFA沒有這個困擾,因為它始終只有一個active state。因此它是state machine方法中比對速度最快的一種方式。它的代價是memory大幅增加,因為它需要把在NFA利用多個active state確保match的所有可能性都算出來並且建構在state machine上。如果pattern太大,DFA可能會因為建構不出來而失敗。因此我們偏好使用NFA。
    為了減少NFA的active states,我們提出了一個結合Aho-Corasick(AC) failure link和NFA的REM架構,稱為Delayed Non-Deterministic Finite Automata (DNFA)。
    現今大部分的研究都只有簡單的Regular Expression。例如,simple pattern、 ".*" 、 "[^
    ]* " 。 以往用於compiler是足夠的,但是根據NIDS發展,勢必要提供更多更複雜的Regular Expression。所以我們拓展pattern到支援著名的PCRE Library。以及為了解決在DNFA比對輸入字串時候可能發生的延遲輸入,所以我們提出一個新的TCAM儲存transition table的方式。對DNFA state進行編碼。將原本會延遲一個cycle輸入,改善成不用延遲就可以解決。

    In recent years, network traffic and Network Intrusion Detection System (NIDS) have developed rapidly. The pattern set is also growing rapidly. The traditional method of using string-matching is gradually replaced by regular expressions. There are two well-known regular expression state machine construction methods. They are Non-deterministic Finite Automata (NFA) and Deterministic Finite Automata (DFA). The advantages of NFA include fast construction speed and smaller memory. But because it has the characteristics of state self-loop, it will cause multiple active states to exist. Multiple active states are to ensure the possibility of all matches will not miss. But this will greatly reduce the match speed. DFA does not have this trouble because it always has only one active state. Therefore, it is the fastest method among the state machine methods. Its cost is a substantial increase in memory, because it needs to calculate all the possibilities of using multiple active states to ensure match in NFA and build it on the state machine. If the pattern is too large, DFA may fail because it cannot be constructed. Therefore, we prefer to use NFA.
    In order to decrease the number of active states in NFA and keep the construction time short, we propose a novel implementation of REM, called delayed Non-Deterministic Finite Automata (DNFA), which combines the concept of failure link proposed by Aho-Corasick (AC) with NFA.
    Most of the researches nowadays only have simple regular expressions, such as simple pattern, ".*", and "[^\n ]*", which were sufficient for the compiler in the past. But according to the development of NIDS, it is necessary to provide more complex regular expressions. So, we expanded it to support the format given by the famous PCRE library. Furthermore, in order to solve the delay inputs that may occur when matching the input string in DNFA, we propose a new way to store transition table in TCAM that encodes DNFA states to improve the DNFA without delay inputs.

    摘要 i Abstract ii TABLE OF CONTENTS iii List of Tables iv List of Figures v Chapter 1 Introduction 1 Chapter 2 Related Work 3 2.1 Regular Expression 3 2.2 State machine 4 2.2.1 Parsing Tree 4 2.2.2 Non-Deterministic Finite Automata (NFA) 5 2.2.3 Deterministic Finite Automata (DFA) 7 2.3 Aho-Corasick algorithm 10 2.4 Compression approach 12 2.4.1 Software-based approach 12 2.4.2 Hardware-based approach 17 Chapter 3 Problem statement 23 3.1 PCRE problem 23 3.2 NFA in TCAM 24 Chapter 4 Proposed Scheme 25 4.1 Motivation 26 4.2 Overview 28 4.3 DNFA Construction 29 4.3.1 Construction Flow 30 4.3.2 Convert the self-loop of Regular Expression 44 4.3.3 The complex Regular Expressions base on PCRE library 50 4.3.4 Dynamic state construction 61 4.3.5 New thread transition 62 4.3.6 A trade-off for the complex pattern 65 4.4 Implement DNFA in TCAM 67 4.4.1 New thread transition in TCAM 73 4.5 Optimal storage method in TCAM 75 4.6 Occurrence Limitation Regular Expression in TCAM 77 4.6.1 TCAM OLRE infrastructure 77 4.6.2 TCAM range encoding 80 Chapter 5 Experimental results 82 5.1 Environment 82 5.2 Data set 82 5.3 Evaluation of DNFA result 83 Chapter 6 Conclusion 89 Reference 91 Appendix 95

    [1] ClamAV 0.100.1 (2018) [Online]. Available: https://www.clamav.net/
    [2] S. Wu and U. Manber, “A fast algorithm for multi-pattern searching,” Dept. Comput. Sci., Univ. Arizona, Tucson, AZ, USA, Tech. Rep. TR-94-17, 1994
    [3] R. S. Boyer and J. S. Moore “A fast string searching algorithm” Communications of the ACM, vol. 20, no 10, pp.762-772, Oct. 1977
    [4] Snort v3.0. (2018) [Online]. Available: https://www.snort.org/
    [5] Bro Intrusion Detection System. (2014). [Online]. Available: https://www.bro.org/
    [6] Application Layer Packet Classifier for LINUX. (2018). [Online]. Available: http://l7-filter.sourceforge.net/
    [7] K. Thompson, "Programming techniques: Regular expression search algorithm, " Commun. ACM, vol. 11, no. 6, pp. 419–422, 1968.
    [8] M. O. Rabin, D. Scott. “ Finite automata and their decision problems,” IBM Journal of Research and Development, 3(2): 114-125, 1959.
    [9] F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz, “Fast and memory-efficient regular expression matching for deep packet inspection,” in Proc. ACM/IEEE Symp. Archit. Netw. Commun. Syst., San Jose, CA, USA, 2006, pp. 93–102.
    [10] M. Becchi and P. Crowley, “A hybrid finite automaton for practical deep packet inspection,” in Proc. ACM CoNEXT Conf., New York, NY, USA, 2007, Art. no. 1.
    [11] V. Aho and M. J. Corasick, “Efficient string matching: An aid to bibliographic search,” Commun. ACM, vol. 18, no. 6, pp. 330-340, 1975
    [12] V. M. Glushkov, “The Abstract Theory of Automata,” Russian Mathematical Surveys, Vol. 16, No. 5, 1961, pp. 1-53
    [13] V. C. Valgenti, M. S. Kim, S. I. Oh and I. Lee, "REduce: Removing Redundancy from Regular Expression Matching in Network Security," 2015 24th International Conference on Computer Communication and Networks (ICCCN), Las Vegas, NV, 2015, pp. 1-10.
    [14] M. Becchi and S. Cadambi, “Memory-efficient regular expression search using state merging,” in Proc. 26th IEEE Int. Conf. Comput. Commun. (INFOCOM), Anchorage, AK, USA, 2007, pp. 1064–1072.
    [15] S. Kong, R. Smith, and C. Estan, “Efficient signature matching with multiple alphabet compression tables,” in Proc. 4th Int. Conf. Security Privacy Commun. Netow., Istanbul, Turkey, 2008, Art. no. 1.
    [16] M. Becchi and P. Crowley, “Efficient regular expression evaluation: Theory to practice,” in Proc. 4th ACM/IEEE Symp. Archit. Netw. Commun. Syst., San Jose, CA, USA, 2008, pp. 50–59.
    [17] Alex X. Liu, Eric Norige, Sailesh Kumar "A few bits are enough - ASIC friendly Regular Expression matching for high speed network security systems" (ICNP) 7-10 Oct. 2013
    [18] SMITH, R., ESTAN, C., AND JHA, S. “Xfa: Faster signature matching with extended automata”. In Security and Privacy, 2008. SP 2008. IEEE Symposium on (2008), IEEE, pp. 187–201.
    [19] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, and J. Turner, “Algorithms to accelerate multiple regular expressions matching for deep packet inspection,” ACM SIGCOMM Comput. Commun. Rev., vol. 36, no. 4, pp. 339–350, 2006.
    [20] M. Becchi and P. Crowley, “An improved algorithm to accelerate regular expression evaluation,” in Proc. 3rd ACM/IEEE Symp. Architect. Netw. Commun. Syst., Orlando, FL, USA, 2007, pp. 145–154.
    [21] F. Yu, R. H. Katz, and T. V. Lakshman, “Gigabit rate packet patternmatching using TCAM,” in Proc. 12th IEEE Int. Conf. Netw. Protocols (ICNP), Berlin, Germany, 2004, pp. 174–183.
    [22] Jung-Sik Sung, Seok-Min Kang, Youngseok Lee, Taeck-Geun Kwon and Bong-Tae Kim, "A multi-gigabit rate deep packet inspection algorithm using TCAM," GLOBECOM '05. IEEE Global Telecommunications Conference, 2005., St. Louis, MO, 2005, pp. 5 pp.-.
    [23] M. Alicherry, M. Muthuprasanna, and V. Kumar, “High speed pattern matching for network IDS/IPS,” in Proc. 14th IEEE Int. Conf. Netw. Protocols (ICNP), Santa Barbara, CA, USA, 2006, pp. 187–196.
    [24] S. Yun, “An efficient TCAM-based implementation of multipattern matching using covered state encoding,” IEEE Trans. Comput., vol. 61, no. 2, pp. 213–221, Feb. 2012.
    [25] Kun Huang, Xuelin Chen " A Power-Efficient Approach to TCAM-Based Regular Expression Matching" (ICCCN) 30 July-2 Aug. 2018
    [26] C. R. Meiners, J. Patel, E. Norige, E. Torng, and A. X. Liu, “Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems,” in Proc. 19th USENIX Conf. Security, Berkeley, CA, USA, 2010, pp. 1–16.
    [27] Alex X. Liu, Eric Norige "A De-Compositional Approach to Regular Expression Matching for Network Security", IEEE/ACM Transactions on Networking (Volume: 27, Issue: 6, Dec. 2019)
    [28] PCRE - Perl Compatible Regular Expressions/ https://www.pcre.org/
    [29] V Wüstholz, " Static Detection of DoS Vulnerabilities in Programs that use Regular Expressions" March 2017
    [30] Nicolaas WeidemanEmail, Brink van der Merwe, Martin Berglund, Bruce Watson "Analyzing Matching Time Behavior of Backtracking Regular Expression Matchers by Using Ambiguity of NFA" 06 July 2016
    [31] Yuju Shen, Yanyan Jiang, Chang Xu, Ping Yu, Xiaoxing Ma and Jian Lu, “ReScue: Crafting Regular Expression DoS Attacks” 2018 33rd ACM/IEEE
    [32] James C. Davis, “On the Impact and Defeat of Regular Expression Denial of Service” 2020
    [33] Asiri Rathnayake, Hayo Thielecke, "Static Analysis for Regular Expression Exponential Runtime via Substructural Logics" May 2014
    [34] regxstring [github] https://github.com/daidodo/regxstring
    [35] regexr [Online] https://regexr.com/
    [36] WebGraphviz[Online] http://www.webgraphviz.com/

    下載圖示 校內:2024-09-27公開
    校外:2024-09-27公開
    QR CODE