| 研究生: |
李袁碩 Li, Yuen-Shuo |
|---|---|
| 論文名稱: |
藉由壓縮技術和字串切割實現節省記憶體的 DFA A Memory Efficient DFA using Compression and Pattern Segmentation |
| 指導教授: |
張燕光
Chang, Yeim-Kuan |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2014 |
| 畢業學年度: | 102 |
| 語文別: | 英文 |
| 論文頁數: | 59 |
| 中文關鍵詞: | 網路入侵偵測 、樣式比對 、正規表達式 、平行環境 、OpenMP |
| 外文關鍵詞: | Network intrusion detection, pattern matching, regular expression, OpenMP |
| 相關次數: | 點閱:79 下載:1 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
隨著網路流量成長越來越快,大量的病毒和惡意攻擊不斷地在網路上傳播,網路安全相關的設備像是入侵偵測系統、防火牆、防毒軟體等的角色也漸漸被人們重視。其中,正規表達式的搜尋是這類系統最繁重的任務之一,直接影響到其本身的效能。因此如何利用低記憶體執行高效能演算法一直是非常熱門的議題。
由於正規表達式本身比一般的字串複雜許多,當多個正規表達式同時比對,所對應的 DFA 可能會相當複雜,必須耗用的大量的記憶體。每新增一個正規表達式,對應 DFA 的都會以可怕的速度增長。而隨著病毒與惡意封包種類的增加,需要比對的字串也日益增長,記憶體用量也越來越驚人。因此,我們希望能找到一個方法能夠盡可能在不拖慢搜尋速度的情況下,減少記憶體的使用量。
在這篇論文中,我們使用壓縮技術和字串切割的方式,設計了一個可以大量減少記憶體使用量的DFA,而且可以藉由平行化來增加其效能,減少因為壓縮和字串切割而拖慢搜尋效能所帶來的影響。我們根據 Parallel Failureless-AC (PFAC) 的做法,提出了 CS-DFA 的方法。其使用的資料結構是由δFA改進而來,δFA是一種儲存DFA 的方式,可以有效的減少紀錄transitions的數量,其主要概念是將每一個 State 的 Transition Table 與其 Parent State 的 Transition Table 共用,這麼一來,每一個 State 只需要紀錄與其 Parent State 的 Transition Table 不同的地方即可,因此記憶體的使用量可以減少許多。我們發現若與Parent State 的 Transition Table 相比,State 自身的Transition Table 內有更多相似之處可以共用。透過類似的想法,我們成功減少更多記憶體使用量,而且還能有更快的搜尋速度。另外,我們還使用字串切割的技術更進一步壓縮,取得更好的結果,並透過 OpenMP 的技術,保持一定的搜尋效能。
As the traffic of Internet is grow quickly, there are more and more malicious attacks and viruses spread over the Internet. The role of network security systems such as network intrusion detection system (NIDS), firewalls, and antivirus software have become more important. These systems usually use regexes to inspect the payload of packet, which is one of the most intensive tasks in the systems. We have to develop a high-throughput algorithm that requires a small amount of memory to find out the hidden virus in packet payload.
Regular expressions is more complicated than simple patterns. When multiple regular expressions are processed together, the corresponding DFA may be so complicated and need a large amount of memory. The numbers of states and transitions grow so fast when a new regular expression is added. As the virus and malicious packets grow quickly, more patterns is needed to be compared. Therefore, we hope to find a scheme to decrease the memory consumption of DFA and keep the search performance acceptable.
In this thesis, we propose a memory efficient parallel compatible DFA that uses the techniques of compression and pattern segmentation. With the idea of PFAC algorithm, we propose a new DFA called compressed segmented DFA (CS-DFA) that needs less number of states and transitions than δFA. Without considering the symbols of “.*” in the beginning of the regular expression, we notice that a DFA state transitions for most symbols to the same next state. As a result, the transition table can be compressed by run-length encoding. The number of transitions in the proposed CS-DFA is about a half of the transitions need in δFA, and has 74% of the memory consumed by δFA.
In addition, we focus on two conditions “Counting Constraints” and “Kleene Star” which cause state blowup frequently in pattern sets. Then the technique of pattern segmentation is applied. The memory consumption is smaller than pure CS-DFA without pattern segmentation by about 19%. The search performance is still acceptable when using OpenMP.
[1] Snort. [Online]. Available : http://www.snort.org
[2] Bro. [Online]. Available : http://www.bro.org/
[3] E. Knuth, J. H. M. Jr., V. R. Pratt, “Fast pattern matching in strings,” SIAM Journal on Computing, vol. 6, no. 2, pp.323–350, 1977.
[4] C.-H. Lin, C.-H. Liu, L.-S. Chien, S.C. Chang, “Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs,” IEEE Transactions on Computers, vol. PP, p.1, 2012.
[5] D Ficara, S Giordano, G Procissi, F Vitucci, G Antichi, A Di Pietro, “An improved DFA for fast regular expression matching, “ACM SIGCOMM Computer Communication Review 38 (5), 29-40, 2008.
[6] OpenMP Application Program Interface Version 3.1 May 2008, openmp.org.
[7] V. Aho, M. J. Corasick, “Efficient string matching: An aid to bibliography search” Communications of the ACM, vol. 18, no.6, pp.333-340, 1975.
[8] R. S. Boyer, J. S. Moore, “A fast string searching algorithm,” Communications of the ACM, vol. 20, no 10, pp.762-772, Oct. 1977.
[9] V.M. Glushkov, “The abstract theory of automata,” Russian Math. Surveys, 16 (1961), pp. 1–53.
[10] Ken Thompson, “Programming Techniques: Regular expression search algorithm,“ Communications of the ACM, v.11 n.6, p.419-422, June 1968.
[11] M. O. Rabin, D. Scott. “Finite automata and their decision problems,” IBM Journal of Research and Development, 3(2):114–125, 1959
[12] R. Smith , C. Estan , S. Jha , S. Kong, “Deflating the big bang: fast and scalable deep packet inspection with extended finite automata,“ SIGCOMM Comput. Commu. Rev., vol. 38, no. 4, pp. 207-218, 2008.
[13] S. Kumar , B. Chandrasekaran , J. Turner , G. Varghese, “Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia,” in Proc. ACM ANCS, 2007, pp. 155-164.
[14] S. Kumar , S. Dharmapurikar , F. Yu , P. Crowley , J. Turner, “Algorithms to accelerate multiple regular expressions matching for deep packet inspection,” Proc. ACM SIGCOMM, 2006, pp. 339-350.
[15] N. Yamagaki, R. Sidhu, S. Kamiya, “High-Speed Regular Expression Matching Engine Using Multi-Character NFA,” in Proc. Intl. Conf. FPL, Aug. 2008, pp. 697–701.
[16] J. E. Hopcroft, R. Motwani, J. D. Ullman, "Introduction to Automata Theory, Languages and Computability, 2nd Edition," Addison-Wesley, Nov. 2000.
[17] Alfred V. Aho, Ravi Sethi, Jeffrey Ullman, “Compilers: Principles, Techniques and Tools”, Addison Wesley, 1986
[18] R. McNaughton, H. Yamada, "Regular Expressions and State Graphs for Automata". IEEE Trans. on Electronic Computers 9 (1): 39–47, Mar 1960.
[19] Regular Expression Processor. [Online]. Available: http://regex.wustl.edu/index.php/Main_Page