| 研究生: |
鄭驍敏 Zheng, Xiao-Min |
|---|---|
| 論文名稱: |
Delayed NFA:一個更快速的正則表達式比對方法實作於TCAM Delayed NFA: A Faster Regular Expression Matching Scheme in TCAM |
| 指導教授: |
張燕光
Chang, Yeim-Kuan |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2018 |
| 畢業學年度: | 106 |
| 語文別: | 英文 |
| 論文頁數: | 57 |
| 中文關鍵詞: | 正則表達式 、NFA 、DFA 、Failure Link 、Active State 、TCAM |
| 外文關鍵詞: | Regular expression, NFA, DFA, Failure Link, Active State, TCAM |
| 相關次數: | 點閱:72 下載:2 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
隨著網路流量和主流Network Intrusion Detection System(NIDS) pattern set size的逐年快速增長,正則表達式逐漸取代了傳統的string matching成為主流Deep Packet Inspection(DPI)的核心部件,並且廣泛用於字串比對,基因序列比對以及詞意分析。相較於傳統的字串比對,基於正則表達式對複雜的signature 有更強大的和靈活的描述性,正則表達式可以用更加簡短的表達方式來描述多個的simple pattern。
為我們所熟知的正則表達式的兩種建構方式,分別是Non-deterministic Finite Automata(NFA)和Deterministic Finite Automata(DFA)。正則表達式可以轉換成NFA的形式進行正則表達式比對,雖然NFA可以有比較快的建構速度,但是由於初始state self-loop的特性,NFA在做正則表達式比對的時候會同時有多個active state。反之,DFA作為正則表達式比對最快的一種比對方式,每個state對應的character只會有一個active state。然而由於rule set中“*”的特性,DFA建構起來所消耗的時間會比NFA建構的時間來得多得多,甚至可能好幾十倍。
為了減少NFA的active state,我們提出了一個結合Aho-Corasick(AC) failure link和transformed NFA的REM架構,Delayed Non-Deterministic Finite Automata(DNFA),并將transition table儲存於Ternary Content Addressable Memory(TCAM) 中。DNFA將NFA中self-loop state轉換成non self-loop state,再根據修改之後的failure link construction function建立每個state的failure link。DNFA state進行編碼之後,我們提出一個新的TCAM儲存transition table的方式,將原本NFA在TCAM中需要multi-match的搜尋過程,簡化為只需要一次memory access。最後的實驗運用Snort的rule set,在相同trace file的情況下,比原始NFA平均減少了28%~30%的active state,建構的時間也只比NFA多1%-2%。雖然在TCAM entry數上會多於NFA,但是DNFA的TCAM entry數只佔DFA的1%,只佔D2FA的7%。
With the rapid growth of network traffic and the size of pattern set in Network Intrusion Detection System (NIDS) year by year, regular expressions matching (REM) has gradually replaced traditional string matching and became core component of Deep Packet Inspection (DPI), which is widely used in pattern matching, DNA sequence matching and literal analysis. Compared to traditional string matching, regular expression has more powerful and flexible expressive ability to complex signatures. What’s more, many simple pattern can be formatted in a shorter way by regular expression.
There are two typical implementation methods for regular expression, Non-Deterministic Finite Automata (NFA), Deterministic Finite Automata (DFA). Regular expression can be converted to NFA. NFA has the fastest construction time compared to DFA, but due to the characteristic of self-loop in initial state, NFA may have many active states during look up stage in REM which slow down the search speed. Conversely, DFA is the fastest method for REM where each state has only one active state corresponding to one input character. However, there is a lot Kleene Star "*" or repetitions in regular expression which causes construction time of DFA could be dozen times of NFA.
In order to decrease the number of active state in NFA and keep implementation in a short construction time, we propose a novel implementation of REM, called delayed Non-Deterministic Finite Automata (DNFA), which combines the concept of failure link proposed by Aho-Corasick (AC) with NFA. Furthermore, we store transition table of DNFA into Ternary Content Addressable Memory (TCAM). DNFA transits self-loop state in NFA into non self-loop state, then according to modified failure link construction function to build failure link of each state. After stage encoding by relationship of failure link, we proposed a new transition table storage method in TCAM, which simplifies original NFA look up process from multiple match to only one memory access. We use Snort rule sets in experiment. With same trace file, DNFA decreases 28%~30% active state compared to original NFA, and construction time is only 1%~2% longer than original NFA. Although DNFA needs more entries in TCAM than original NFA, it needs only 1% of TCAM entries in DFA and 7% of TCAM entries in D2FA.
[1] V. Aho and M. J. Corasick, “Efficient string matching: An aid to bibliographic search,” Commun. ACM, vol. 18, no. 6, pp. 330-340, 1975
[2] S. Wu and U. Manber, “A fast algorithm for multi-pattern searching,” Dept. Comput. Sci., Univ. Arizona, Tucson, AZ, USA, Tech. Rep. TR-94-17, 1994
[3] R. S. Boyer and J. S. Moore “A fast string searching algorithm” Communications of the ACM, vol. 20, no 10, pp.762-772, Oct. 1977
[4] Snort v3.0. (2018) [Online]. Available: https://www.snort.org/
[5] Bro Intrusion Detection System. (2014). [Online]. Available: https://www.bro.org/
[6] Application Layer Packet Classifier for LINUX. (2018). [Online]. Available: http://l7-filter.sourceforge.net/
[7] ClamAV 0.100.1 (2018) [Online]. Available: https://www.clamav.net/
[8] M. O. Rabin, D. Scott. “ Finite automata and their decision problems,” IBM Journal of Research and Development, 3(2): 114-125, 1959.
[9] F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz, “Fast and memory-efficient regular expression matching for deep packet inspection,” in Proc. ACM/IEEE Symp. Archit. Netw. Commun. Syst., San Jose, CA, USA, 2006, pp. 93–102.
[10] M. Becchi and P. Crowley, “A hybrid finite automaton for practical deep packet inspection,” in Proc. ACM CoNEXT Conf., New York, NY, USA, 2007, Art. no. 1.
[11] Ken Thompson, “Programming Techniques: Regular expression search algorithm.” Communications of the ACM Volume 11 Issue 6 (1968) pp. 419-422
[12] V. M. Glushkov, “The Abstract Theory of Automata,” Russian Mathematical Surveys, Vol. 16, No. 5, 1961, pp. 1-53
[13] V. C. Valgenti, M. S. Kim, S. I. Oh and I. Lee, "REduce: Removing Redundancy from Regular Expression Matching in Network Security," 2015 24th International Conference on Computer Communication and Networks (ICCCN), Las Vegas, NV, 2015, pp. 1-10.
[14] M. Becchi and S. Cadambi, “Memory-efficient regular expression search using state merging,” in Proc. 26th IEEE Int. Conf. Comput. Commun. (INFOCOM), Anchorage, AK, USA, 2007, pp. 1064–1072.
[15] S. Kong, R. Smith, and C. Estan, “Efficient signature matching with multiple alphabet compression tables,” in Proc. 4th Int. Conf. Security Privacy Commun. Netow., Istanbul, Turkey, 2008, Art. no. 1.
[16] M. Becchi and P. Crowley, “Efficient regular expression evaluation: Theory to practice,” in Proc. 4th ACM/IEEE Symp. Archit. Netw. Commun. Syst., San Jose, CA, USA, 2008, pp. 50–59.
[17] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, and J. Turner, “Algorithms to accelerate multiple regular expressions matching for deep packet inspection,” ACM SIGCOMM Comput. Commun. Rev., vol. 36, no. 4, pp. 339–350, 2006.
[18] M. Becchi and P. Crowley, “An improved algorithm to accelerate regular expression evaluation,” in Proc. 3rd ACM/IEEE Symp. Architect. Netw. Commun. Syst., Orlando, FL, USA, 2007, pp. 145–154.
[19] SMITH, R., ESTAN, C., AND JHA, S. “Xfa: Faster signature matching with extended automata”. In Security and Privacy, 2008. SP 2008. IEEE Symposium on (2008), IEEE, pp. 187–201.
[20] F. Yu, R. H. Katz, and T. V. Lakshman, “Gigabit rate packet patternmatching using TCAM,” in Proc. 12th IEEE Int. Conf. Netw. Protocols (ICNP), Berlin, Germany, 2004, pp. 174–183.
[21] Jung-Sik Sung, Seok-Min Kang, Youngseok Lee, Taeck-Geun Kwon and Bong-Tae Kim, "A multi-gigabit rate deep packet inspection algorithm using TCAM," GLOBECOM '05. IEEE Global Telecommunications Conference, 2005., St. Louis, MO, 2005, pp. 5 pp.-.
[22] M. Alicherry, M. Muthuprasanna, and V. Kumar, “High speed pattern matching for network IDS/IPS,” in Proc. 14th IEEE Int. Conf. Netw. Protocols (ICNP), Santa Barbara, CA, USA, 2006, pp. 187–196.
[23] C. R. Meiners, J. Patel, E. Norige, E. Torng, and A. X. Liu, “Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems,” in Proc. 19th USENIX Conf. Security, Berkeley, CA, USA, 2010, pp. 1–16.
[24] Regular Expression Processor. Accessed on Apr. 5, 2012. [Online]. Available: http://regex.wustl.edu/index.php/Regular_Expression_Processor
[25] S. Yun, “An efficient TCAM-based implementation of multipattern matching using covered state encoding,” IEEE Trans. Comput., vol. 61, no. 2, pp. 213–221, Feb. 2012.
[26] onlineRandomTools [Online] https://onlinerandomtools.com/generate-random-data-from-regexp
校內:2023-08-28公開