| 研究生: |
謝宗霖 Hsieh, Tsung-Lin |
|---|---|
| 論文名稱: |
基於Boyer-Moore演算法的多重字串比對演算法 A Boyer-Moore Based Multiple Pattern Matching Algorithm |
| 指導教授: |
張燕光
Chang, Yeim-Kuan |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2012 |
| 畢業學年度: | 100 |
| 語文別: | 英文 |
| 論文頁數: | 63 |
| 中文關鍵詞: | 網路入侵偵測系統 、樣式比對 、位移-基礎樣式比對演算法 |
| 外文關鍵詞: | Network Intrusion Detection Systems, Pattern Matching, Shift-based Pattern Matching algorithm |
| 相關次數: | 點閱:119 下載:2 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近年來,隨著網路速度越來越快,大量的病毒及惡意攻擊不斷的在網際網路間傳播。網路入侵偵測系統是一套設計來專門監測網路上是否含有惡意攻擊封包及病毒。一個有效率的樣式比對演算法在網路入侵偵測系統中扮演著重要的角色。藉由事先定義好病毒碼及惡意攻擊的內容,網路入侵偵測系統比對封包的內容看其是否包含已定義過的病毒碼。位移-基礎(shift-based)樣式比對演算法,像是KMP及Boyer-Moore演算法,為其中一大類應用在網路入侵偵測系統的方法。其優點在於能節省很多不需要的字元比對來提高樣式比對的效率。但是當事先定義好的病毒碼數量過多時,將造成病毒碼間關聯性較高而降低可以位移的距離,進而掩蓋位移-基礎(shift-based)樣式比對演算法的優點。
在這篇論文中,我們將以Boyer-Moore單一樣式比對演算法為基礎,實作出Boyer-Moore演算法的多重樣式版本。藉由增大最大可位移的距離,避免更多不需要的字元比對。在大部分的情況下,我們提出的方法在同一輸入字元只會有一次字元比對。而在原始Boyer-Moore演算法,同一輸入字元則可能被比對多次。使用的病毒規則表是取自於ClamAV此開放式源碼的軟體。根據我們模擬實作的結果,與Aho-Corasick演算法比較之下,在較小的病毒規則表,我們提出的方法能節省了超過75%不必要的字元比對。而在較大的病毒規則表,搭配分組的方法,在不同的輸入檔案中依然節省了33%到63%不必要的字元比對。藉由壓縮資料結構,我們提出的方法僅使用Aho-Corasick演算法0.91倍的記憶體使用量。
In recent years, with the grown of network speeds, a lot of viruses and malicious attacks have spread continuously over the Internet. Network intrusion detection system (NIDS) is designed especially for monitoring the viruses or malicious attacks by inspecting the contents of packets travelling over the Internet. An efficient pattern matching algorithm plays an important role in the NIDS. Based on the pre-defined virus signatures and malicious attack contents (called patterns), NIDS performs the pattern matching operations to find whether any pre-defined pattern exists in the payload of any packet. Shift-based schemes such as KMP and Boyer-Moore are the main pattern matching algorithms adopted in NID systems. The advantage of shift-based schemes is that they avoid many character comparisons to gain the performance of pattern matching. But when the set of the pre-defined patterns become larger, the relation between patterns would become closer so that the shift distance may be too small to fully take the advantage of the shift-based schemes.
In this thesis, we will use Boyer-Moore single pattern matching algorithm as our foundation and implement a multiple pattern matching version from it. By increasing the biggest shift distance, we can avoid more unnecessary character comparisons. In most cases, the same input character is only matched once in the proposed scheme while some input characters may have to be compared many times in the original Boyer-Moore scheme. The pattern set that we use is from ClamAV, the open source software. According to our simulation results, comparing with Aho-Corasick algorithm, our method can save more than 75% unnecessary character comparisons in small pattern set. In the set with more patterns, our scheme enhanced by a grouping method still saves 33% to 63% unnecessary character comparisons in different input files. By compressing the data structure, it only costs 0.91 times of memory usage comparing to Aho-Corasick algorithm.
[1] V. Aho and M. J. Corasick, “Efficient string matching: An aid to bibliography search,” Communications of the ACM, vol. 18, no.6, pp.333-340, 1975.
[2] B. H. Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors.” Communications of the ACM, vol.13, no.7, pp.422-426, July 1970.
[3] R. S. Boyer and J. S. Moore, “A fast string searching algorithm,” Communications of the ACM, vol. 20, no 10, pp.762-772, Oct. 1977.
[4] Bro: The Bro Network Security Monitor, www.bro-ids.org
[5] Clam Anti-Virus signature database, www.clamav.net.
[6] B. Commentz-Walter “A string matching algorithm fast on the average,” Proc. vol. 6 International Colloquium on Automata, Languages, and Programming, pp.118-132, 1979.
[7] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, “Summary Cache: A scalable Wide-Area Web Cache Sharing Protocol,” IEEE/ACM Transactions on Networking, vol.8, no.3, pp.281-293, June 2000.
[8] R. Nigel Horspool, “Practical fast searching in strings.” Software–Practice & Experience 10–6 (1980) pp.501-506.
[9] Andrew Hume & Daniel Sunday, “Fast string searching.” Software–Practice &Experience 21-11 (1991) pp.1221-1248.
[10] E. Knuth, J. H. M. Jr., and V. R. Pratt, “Fast pattern matching in strings,” SIAM Journal on Computing, vol. 6, no. 2, pp. 323–350, June 1977.
[11] P. C. Lin, Y. D. Lin, and Y. C. Lai, “A Hybrid Algorithm of Backward Hashing and Automaton Tracking for Virus Scanning,” IEEE Transactions on Computers, vol. 60, no. 4, April 2011.
[12] C. R. Meiners, J. Patel, E. Norige, E. Torng and A. X. Liu, “Fast Regular Expression Matching Using Small TCAMs For Network Intrusion Detection and Prevention Systems,” in Proceedings of the 19th USENIX conference on Security, 2010.
[13] Snort: The Open Source Network Intrusion Detection System, www.snort.org
[14] The Shmoo Group: the Capture the Flag Data. http://cctf.shmoo.com/
[15] Sun Wu, Udi Manber, “A fast algorithm For Multi-Pattern Searching,” Technical Report TR 94-17, University of Arizona at Tuscon, May 1994
[16] D. H. Yang, X. Ke and C. Yong, “An Improved Wu-Manber Multiple Patterns Matching Algorithm” IEEE International Performance Computing and Communications Conference, pp. 675-680, 2006.
[17] F. Yu, R. H. Katz, and T. V. Lakshman, “Gigabit Rate Packet Pattern-Matching Using TCAM,” in Proceeding of 12th IEEE International Conference on Network Protocols (ICNP’04), pp. 174-183, 2004.
[18] F. Zane, G. Narlikar and A. Basu, “CoolCAMs: Power-Efficient TCAMs for Forwarding Engines.” IEEE International Conference on Computer Communications, vol.1, pp.42-45, March 2003.