| 研究生: |
陳耶至 Chen, Ye-Zhi |
|---|---|
| 論文名稱: |
藉由GPU之大量執行緒以加速AC類型的字串比對演算法 Accelerating AC-based Pattern Matching Algorithm using Numerous Threads of GPU |
| 指導教授: |
張燕光
Chang, Yeim-Kuan |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2013 |
| 畢業學年度: | 101 |
| 語文別: | 英文 |
| 論文頁數: | 58 |
| 中文關鍵詞: | 深度封包檢測 、Aho-Corasick 、決定性有限狀態自動機 、GPU 、字串比對 、病毒掃描 、CUDA |
| 外文關鍵詞: | Deep Packet Inspection, Aho-Corasick, Deterministic Finite Automata, GPU, pattern matching, CUDA |
| 相關次數: | 點閱:126 下載:0 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
字串比對在網路入侵偵測系統中扮演著很重要的角色,網路入侵偵測系統被廣泛運用在保護電腦免於被病毒(virus)、惡意郵件(malware)、木馬(troy)…等惡意攻擊。隨著網路的迅速膨脹,傳統的CPU單核心單一執行緒演算法已經不能負荷需求,因此需要藉由硬體協助來加速字串比對的效能。GPU繪圖處理器是一個常見的低成本, 高彈性選項,隨著NVIDIA研發的Compute Unified Device Architecture (CUDA) 技術,它讓GPU的程式可編輯性大幅提升,讓GPU不再只是用來執行圖形運算,它可以提供Single Instruction Multiple Thread (SIMT) 的大資料平行運算,有效地加速了字串比對的速度。
基於著名的Aho-Corasick (AC) 演算法,以及將AC演算法改良並實作在GPU上的Parallel Failureless-AC (PFAC) 演算法,我們在此論文內分析它們並提出有效利用GPU平行處理的加速方法Multi-Byte Multi-Thread AC (MBMTAC)。(1)基於“每個輸入字元使用一條執行緒去執行字串比對的概念”,我們將它延伸到一次比對多個字元,由於將每個輸入字元都當作一個起始點去執行,因此我們在建立資料結構時不需要考慮字串的位移,解決了在CPU上實作多字元比對的AC所需額外大量的states和transitions的問題。(2)基於將每個輸入字元作為起始點的理念,我們不需要考慮backward transition self-loop以及failure function,因此transition table變的十分的稀疏。針對這項特性,我們使用一個完美雜湊的方法對transition table進行壓縮,我們提出的方法所需的記憶體使用量只需要PFAC演算法的0.19%。(3) 藉由分析CUDA的特性進行程式的優化。我們的效能可以達到33.4 Gb/s。
Pattern matching plays an important role in Network Intrusion Detection System (NIDS) which is widely used in protecting computers from malicious attacks. With the rapid expansion of the network, the traditional single-core single-threaded CPU based algorithms have been unable to load demand, so it is necessary to accelerate pattern matching algorithm through hardware. GPU is a low cost and high scalability choice. With the development of Compute Unified Device Architecture (CUDA) technique by Nvdia, CUDA can promote the programmability of GPU program and let the role of GPU not only in executing graphical computing, but support the Single Instruction Multiple Thread (SIMT) big data computation. It can accelerate the speed of pattern matching effectively.
In this thesis, we propose Multi-Byte Multi-Thread AC (MBMTAC) algorithm which is based on the famous Aho-Corasick (AC) algorithm and the Parallel Failureless-AC (PFAC) algorithm that improves the AC algorithm on GPU. (1) Base on the concept that let each byte of input stream as a start position and assign one thread for a start position to execute pattern matching, we extend it to matching multiple characters at once. The offset of each pattern does not need to be considered when establishing the data structure. Therefore, the problem of large amounts of additional states and transitions on CPU is solved. (2) Based on the concept, we do not need to consider the backward transitions, self-loops and the failure functions. Hence, the transition table becomes very sparse. We use a perfect hash scheme to compress the transition table. The memory usage of MBMTAC is only 0.19% of the PFAC needs. (3) By utilizing the characteristic of CUDA, we optimize the MBMTAC data structure so that the throughput can reach 33.4 Gb/s.
[1] Cell Broadband Engine. [Online]. Available : https://www-01.ibm.com/chips/techlib/techlib.nsf/products/Cell_Broadband_Engine
[2] CUDA. [Online]. Available : http://www.nvidia.com/object/cuda_home_new.html
[3] ClamAV. [Online]. Available : http://www.clamav.net/
[4] Defcon. [Online]. Available : http://www.defcon.org/
[5] Fermi. [Online]. Available : http://www.nvidia.com/object/fermi-architecture.html
[6] Snort. [Online]. Available : http://www.snort.org
[7] Tesla 2050. [Online]. Available :
http://www.nvidia.com.tw/object/product_tesla_C2050_C2070_tw.html
[8] V. Aho and M. J. Corasick, “Efficient string matching: An aid to bibliography search,” Communications of the ACM, vol. 18, no.6, pp.333-340, 1975.
[9] Z. K. Baker and V. K. Prasanna, “Time and Area Efficient Pattern Matching on FPGAs,” Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays, pp.223-232, 2004.
[10] R. S. Boyer and J. S. Moore, “A fast string searching algorithm,” Communications of the ACM, vol. 20, no 10, pp.762-772, 1977.
[11] E. Knuth, J. H. M. Jr., and V. R. Pratt, “Fast pattern matching in strings,” SIAM Journal on Computing, vol. 6, no. 2, pp.323–350, 1977.
[12] C. E. Leiserson and J. B. Saxe, “Retiming Synchronous Circuitry,” Technical Report 13, Digital Systems Research Center, 1986.
[13] C.-H. Lin, C.-H. Liu, L.-S. Chien and S.C. Chang, “Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs,” IEEE Transactions on Computers, vol.PP, p.1, 2012.
[14] D. Luchaup, R. Smith, C. Estan, S. Jha, “Multi-byte regular expression matching with speculation,” Proceedings of the12th International Symposium on Recent Advances in Intrusion Detection, pp.284-303, 2009
[15] C. Radu, C. Leordeanu, V. Cristea and D. Luchaup, “Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation,” International Conference on Complex, Intelligent, and Software Intensive Systems, pp.203-210, 2011.
[16] R. E. Tarjan and A. C.-C. Yao, "Storing a sparse table," Communications of the ACM, voL 22, pp.606-611, 1979.
[17] G. Vasiliadis, S. Antonatos, M. Polychronakis, E. P. Markatos, and S. Ioannidis, “Gnort: High Performance Network Intrusion Detection Using Graphics Processors,” Proceedings of the 11th International Symposium on Recent Advances in Intrusion Detection, pp.116-134, 2008.
[18] Sun Wu, Udi Manber, “A fast algorithm For Multi-Pattern Searching,” Technical Report TR 94-17, University of Arizona at Tuscon, 1994
[19] X. Zha and S. Sahni, “GPU-to-GPU and Host-to-Host Multipattern String Matching On A GPU,” IEEE Transactions on Computers, vol.62, pp.1156-1169, 2013.
校內:2018-08-29公開