| 研究生: |
石靖煊 Shih, Ching-Hsuan |
|---|---|
| 論文名稱: |
有效節省記憶體的正規表示比對方法及其管線設計 A Memory Efficient Pattern Matching Scheme for Regular Expression and Its Pipelined Design |
| 指導教授: |
張燕光
Chang, Yeim-Kuan |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2015 |
| 畢業學年度: | 103 |
| 語文別: | 英文 |
| 論文頁數: | 59 |
| 中文關鍵詞: | 深度封包檢測 、正規表示式 、DFA 、state explosion 、FPGA 、管線化設計 |
| 外文關鍵詞: | Deep Packet Intrusion, regular expression, DFA, state explosion, FPGA, pipeline |
| 相關次數: | 點閱:93 下載:1 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近幾年來,由於網路流量的規模快速成長,大量的惡意包封與攻擊普遍地散布在網際網路之間。用於監控網路活動並且使我們的電腦遠離危險的網路入侵偵測系統變的愈來愈重要。如同我們所知的深度封包檢測,網路入侵偵測系統藉由預先定義的規則來檢查網路封包的內容且找出是否含有病毒碼。因此,使用於深度封包檢測中的正規表示式比對必須非常快速且節省記憶體。DFA是一個用於實現正規表示式比對的經典方法,DFA的特色是擁有出色的搜尋速度,但是由於存在於複雜規則中的Kleene closure,例如:“.*”、“[^a]*”,DFA受限於眾所皆知的state explosion問題且消耗非常多的的記憶體使用量。
在這篇論文中,我們提出了一個有效使用記憶體的正規表示式比對演算法且維持可接受的搜尋速度,我們稱作FSFA。我們使用額外的資料結構以消除複雜規則中的Kleene closure,使state數量大幅降低,並且藉由default state compression技術大量減少transition數量。除此之外,我們將提出的FSFA實作到硬體上並且以管線化的方式運行使之改善搜尋效能。根據我們實作在PC環境下的實驗結果,我們提出的方法需要的state數量在DFA中只佔1%,而在JFA中只佔2% 到22%。在硬體方面,我們實作在FPGA的Xilinx Virtex-7 XC7V2000T,根據實驗結果顯示,搜尋效能可達到2.85 Gbps。
In recent years, since the scale of Internet traffic grows at a pretty rapid speed, many malicious packets and attacks are common and spread over the Internet. Thus, Network Intrusion Detection System (NIDS) which supervises network activities for keeping our computers away from danger becomes more and more important. With predefined rules, NIDS suspects the payloads of network packets which is known as Deep Packet Inspection (DPI) to find out all the virus patterns. So, regular expression matching which is a searching algorithm used in DPI needs to be very fast and consume small memory space. Deterministic Finite Automata (DFA) is a classical method that performs regular expression matching. The feature of DFA is that it has the great performance of searching but suffers from well-known state explosion problem at the expense of large memory usage because of Kleene closures such as “.*” and “[^a]*” appearing in complex rules.
In this thesis, we propose a memory efficient regular expression matching algorithm called Failureless Segmented FA (FSFA) with an acceptable searching speed. In FSFA, We eliminate Kleene closures by using additional data structures in order to reduce a large amount of states. And, we further reduce the transitions by using default state compression technique. Moreover, we map our proposed FSFA onto the hardware architecture in a pipelined manner to improve the throughput. Our performance results implemented on a PC software environment show that our scheme only needs 1% of states needed in the DFA and 2% to 22% of states needed in the JFA. And, in the hardware architecture, our performance results implemented on Xilinx Virtex-7 XC7V2000T FPGA show that the throughput can reach up to 2.85 Gbps.
[1] Snort. [Online]. Available: http://www.snort.org/
[2] Bro. [Online]. Available: http://www.bro.org/
[3] V. M. Glushkov, “The Abstract Theory of Automata,” Russian Mathematical Surveys, Vol. 16, No. 5, 1961, pp. 1-53
[4] M. O. Rabin, D. Scott. “Finite automata and their decision problems,” IBM Journal of Research and Development, 3(2): 114-125, 1959
[5] S. Kumar et al., “Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection,” in ACM SIGCOMM, Sept 2006
[6] M. Becchi and P. Crowley. “An Improved Algorithm to Accelerate Regular Expression Evaluation,” in Proc. of ANCS '07, pages 145-154, 2007
[7] D Ficara, S Giordano, G Procissi, F Vitucci, G Antichi, A Di Pietro, “An Improved DFA for Fast Regular Expression Matching,” ACM SIGCOMM Computer Communication Review 38 (5), 29-40, 2008
[8] R. Smith, C. Estan, and S. Jha, “XFA: Faster Signature Matching with Extended Automata,” in IEEE Symposium on Security and Privacy, 2008, pp. 187-201
[9] R. Smith et al., “Deflating the Big Bang: Fast and Scalable Deep Packet Inspection with Extended Finite Automata,” in Proceedings of the ACM SIGCOMM 2008 conference on Data communication, Seattle, WA, USA, 2008, pp. 207-218
[10] Xiaodong Yu, Bill Lin, Michela Becchi, “Revisiting State Blow-Up: Automatically Building Augmented-FA While Preserving Functional Equivalence,” IEEE Journal on Selected Areas in Communications 32(10): 1822-1833, 2014
[11] Weirong Jiang, Yi-Hua E. Yang, and Viktor K. Prasanna, “Scalable Multi-Pipeline Architecture for High Performance Multipattern String Matching,” Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on, vol., no., pp.1-12, 19-23 April 2010
[12] Y.-H.E. Yang and V.K. Prasanna, ‘‘High-Performance and Compact Architecture for Regular Expression Matching on FPGA,’’ IEEE Trans. Comput., vol. 61, no. 7, pp. 1013-1025, July 2012
[13] N. Yamagaki, R. Sidhu, S. Kamiya, “High-Speed Regular Expression Matching Engine using Multi-Character NFA,” in Proc. Intl. Conf. FPL, Aug. 2008, pp. 697–701
[14] R. Sidhu, and V. K. Prasanna, “Fast Regular Expression Matching using FPGAs,” in Field-Programmable Custom Computing Machines, 2001. FCCM '01. The 9th Annual IEEE Symposium on, 2001, pp. 227-238
[15] A. Mitra, W. Najjar, and L. Bhuyan, “Compiling PCRE to FPGA for Accelerating SNORT IDS,” in Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems, Orlando, Florida, USA, 2007, pp. 127-136
[16] J. Bispo, I. Sourdis, J.M.P. Cardoso, and S. Vassiliadis, “Regular Expression Matching for Reconfigurable Packet Inspection,” Proc. IEEE Int’l Conf. Field Programmable Technology (FPT), pp. 119-126, Dec. 2006
[17] F. Yu, R. H. Katz, and T. V. Lakshman, “Gigabit Rate Packet Pattern-Matching Using TCAM,” in Proceedings of the 12th IEEE International Conference on Network Protocols, 2004, pp. 174-183
[18] C. R. Meiners et al., “Fast Regular Expression Matching using Small TCAMs for Network Intrusion Detection and Prevention Systems,” in Proceedings of the 19th USENIX conference on Security, Washington, DC, 2010, pp. 8-8
[19] K. Peng et al., “Chain-Based DFA Deflation for Fast and Scalable Regular Expression Matching Using TCAM,” in Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems, 2011, pp. 24-35
[20] ClamAV. [Online]. Available: http://www.clamav.net/
[21] Michela Becchi, Regex Tool. [Online]. Available: http://regex.wustl.edu/
[22] 7 Series FPGAs Overview. [Online]. Available: http://www.xilinx.com/support/