簡易檢索 / 詳目顯示

研究生: 李尚宸
Li, Shang-Chen
論文名稱: 用於巨大流量檢測之快速準確的兩階段 Sketch 框架
A Fast and Accurate Two-Stage Sketch-Based Framework for Heavy Hitter Detection
指導教授: 張燕光
Chang, Yeim-Kuan
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2025
畢業學年度: 113
語文別: 英文
論文頁數: 110
中文關鍵詞: Sketch巨大流量檢測網路測量
外文關鍵詞: Sketch, Heavy Hitter Detection, Network Measurement
相關次數: 點閱:9下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 巨大流量檢測 (heavy hitter detection) 在網路管理中至關重要,用於辨識高頻寬流量,例如來自那些由分散式阻斷服務攻擊 (DDoS attack)、突發流量 (flash crowd) 或網路壅塞 (network congestion) 引起的流量,這些流量需在記憶體受限的高速環境中高效處理。
    我們提出一個 Two-Stage框架以及一種新穎的 Momentum-Sketch,以提升巨大流量檢測的效率與可擴展性。Two-Stage 架構在第一階段 (Stage 1),輸入資料封包被插入至一個 two-row CU Sketch,作為高吞吐量的預過濾器,快速剔除大多數低流量,以最小的記憶體與每封包開銷實現過濾。超過准入閾值 (admission threshold) 的流量被轉發至第二階段 (Stage 2)。在第二階段,我們使用 Momentum-Sketch計數資料結構,透過動量指標 (momentum metric) 結合到達強度 (arrival strength) 與估計頻率,並採用機率計數器衰減策略 (probabilistic counter-decay strategy),優先保留真正的巨大流量,有效降低偏態分佈下的估計誤差。
    實驗結果顯示,Momentum-Sketch 在低記憶體預算下表現出卓越的性能。在 16 KB 記憶體下,其平均 F1 score 比其他方法高出約 10% 至 520%,平均召回率 (recall) 高出約 13% 至 600%。特別的是,由於其機率計數器衰減策略可消除過度估計 (overestimation),Momentum-Sketch 在採用閾值式報告時,無論記憶體限制為何,皆可達到精確率 (precision) 1.0。在 Momentum-Sketch 中,我們進一步利用 SIMD 指令加速插入與查詢操作,在所有資料集上,平均插入吞吐量提升 47.3%,平均查詢吞吐量提升 62.7%。
    另一方面,Two-Stage 框架透過過濾低流量提升性能,相較 Momentum-Sketch 大幅提高插入和查詢吞吐量,同時保持高 F1 score。具體來說,在所有資料集上平均插入吞吐量提升 59.8%,查詢吞吐量提升 25.5%。
    綜上所述,Momentum-Sketch 在資源有限的情況下表現出色,特別是在 16 KB 到 128 KB 的記憶體下提供了卓越的 F1 score。而 Two-Stage 框架在 128 KB 及以上的記憶體下顯著提升吞吐量並維持高 F1 score,使其成為記憶體容量較大系統的選擇。

    Heavy hitter detection is critical in network management for identifying high-bandwidth flows, such as those arising from DDoS attacks, flash crowds, or network congestion, which must be processed efficiently in memory-constrained, high-speed environments.
    We present a Two-Stage framework together with a novel Momentum-Sketch to boost throughput and memory efficiency for heavy hitter detection. In Stage 1, incoming packets are inserted into a two-row CU Sketch, serving as a high-throughput pre-filter that quickly discards most non-heavy flows with minimal memory and per-packet overhead. Flows exceeding an admission threshold are forwarded to Stage 2. In Stage 2, we introduce Momentum-Sketch, a novel counting data structure that combines arrival strength with estimated frequency via a momentum metric and employs a probabilistic counter-decay strategy to preferentially keep true heavy hitters, effectively reducing estimation errors under skewed distributions.
    The experimental results show that Momentum-Sketch excels in accuracy under low memory budgets. At 16 KB, its average F1 score across all datasets surpasses that of other methods by approximately 10% to 520%, and its average recall exceeds these methods by approximately 13% to 600%. Notably, because its probabilistic counter-decay strategy eliminates overestimation, Momentum-Sketch achieves precision 1.0 under any memory constraint when threshold-based reporting is used. In Momentum-Sketch, we further leverage SIMD instructions to enhance insertion and query performance, improving average insert throughput by 47.3% and average query throughput by 62.7% across all datasets.
    On the other hand, the Two-Stage framework enhances performance by filtering non-heavy flows, significantly improving insert and query throughput compared to Momentum-Sketch while maintaining a high F1 score. Specifically, it improves average insert throughput by 59.8% and average query throughput by 25.5% across all datasets.
    In conclusion, Momentum-Sketch is highly effective in resource-constrained scenarios, delivering exceptional F1 scores at memory sizes ranging from 16 KB to 128 KB. The Two-Stage framework significantly improves throughput while maintaining a high F1 score at memory sizes of 128 KB and beyond, making it an excellent choice for systems with greater memory capacity.

    摘要 1 Abstract 3 誌謝 5 TABLE OF CONTENTS 6 LIST OF TABLES 9 LIST OF FIGURES 11 Chapter 1 Introduction 14 1.1 Introduction 14 1.2 Organization of the Thesis 17 Chapter 2 Related Work 18 2.1 Heavy Hitter Detection Definition 18 2.2 Sketch Overview 19 2.3 Basic Sketch Data Structure 20 2.3.1 Count-Min Sketch 20 2.3.2 Count-Min Sketch with Conservative Update 21 2.4 Heavy Hitter Detection Approach 23 2.4.1 Space-Saving 23 2.4.2 Count-Min Heap 23 2.4.3 Elastic Sketch 24 2.4.4 MV-Sketch 25 2.4.5 CocoSketch 27 2.4.6 Stable-Sketch 28 2.4.7 Tight-Sketch 30 2.4.8 Cold Filter Framework 32 2.5 SIMD (Single Instruction, Multiple Data) 35 2.5.1 SSE2 37 2.5.2 AVX2 38 2.5.3 AVX-512 41 Chapter 3 Proposed Scheme 44 3.1 Overview 44 3.2 Comparative Analysis of Flow‐Frequency Distributions Before and After Filtering 45 3.3 Filter Sketch Selection 48 3.4 Distinguishing Heavy Hitters with Momentum Metric 49 3.5 Two-Stage Heavy Hitter Detection Framework 52 3.5.1 Insert 53 3.5.2 Query 53 3.6 Stage 1 Filter Sketch: CU Sketch 54 3.6.1 Insert 54 3.6.2 Query 55 3.7 Stage 2 Main Sketch: Momentum-Sketch 55 3.7.1 Insert 56 3.7.2 Query 57 3.8 Two-Stage Running Examples 58 3.8.1 Insert 58 3.8.2 Query 60 3.9 Optimized Momentum-Sketch with SIMD Instructions 61 3.9.1 Insert 62 3.9.2 Query 63 Chapter 4 Experimental Results 65 4.1 Experimental Environment 65 4.2 Dataset Descriptions 65 4.3 Methodology 66 4.4 Evaluation Metrics 66 4.5 Momentum Decay Factor Setting 67 4.6 Two-Stage Parameters Setting 68 4.7 Performance on Heavy Hitter Detection 71 4.7.1 Results of CAIDA 2016 Dataset 73 4.7.2 Results of CAIDA 2018 Dataset 77 4.7.3 Results of CAIDA 2019 Dataset 81 4.7.4 Results of MAWI 2021 Dataset 85 4.7.5 Results of MAWI 2022 Dataset 89 4.7.6 Results of Campus Dataset 93 4.8 Accelerating Momentum-Sketch with SIMD Instructions 97 4.9 Different Counter-Decay Probability of Momentum-Sketch 99 Chapter 5 Conclusion 102 Chapter 6 Future Work 103 6.1 Dynamic Parameter Tuning for Two-Stage Framework 103 6.2 Practical Deployment Across More Platforms 103 References 104 Appendix 108

    [1] L. Tang, Q. Huang, and P. P. C. Lee, “MV‑Sketch: A Fast and Compact Invertible Sketch for Heavy Flow Detection in Network Data Streams,” in Proceedings of IEEE INFOCOM, Apr. 2019.
    [2] A. Metwally, D. Agrawal, and A. El Abbadi, “Efficient computation of frequent and top‑k elements in data streams,” in Proc. 10th Int. Conf. Database Theory (ICDT), Edinburgh, UK, May 2005, pp. 398–412.
    [3] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proc. 28th Int. Conf. Very Large Data Bases (VLDB), Hong Kong, China, Aug. 2002, pp. 346–357.
    [4] G. Cormode and S. Muthukrishnan, “An improved data stream summary: the count‑min sketch and its applications,” J. Algorithms, vol. 55, no. 1, pp. 58–75, Apr. 2005.
    [5] C. Estan and G. Varghese, “New directions in traffic measurement and accounting,” in Proc. SIGCOMM, San Diego, CA, USA, Aug. 2002.
    [6] T. Yang, J. Jiang, P. Liu, Q. Huang, J. Gong, Y. Zhou, R. Miao, X. Li, and S. Uhlig, “Elastic Sketch: Adaptive and Fast Network‑wide Measurements,” in Proc. ACM SIGCOMM, Budapest, Hungary, Aug. 2018, pp. 561–575.
    [7] Y. Zhang, Z. Liu, R. Wang, T. Yang, J. Li, R. Miao, P. Liu, R. Zhang, and J. Jiang, “CocoSketch: High‑Performance Sketch‑based Measurement over Arbitrary Partial Key Query,” in Proc. ACM SIGCOMM, Virtual Event (Budapest), Aug. 2021, pp. 207–222.
    [8] W. Li and P. Patras, “Stable‑Sketch: A Versatile Sketch for Accurate, Fast, Web‑Scale Data Stream Processing,” in Proc. ACM Web Conf. (WWW), Singapore, May 2024, pp. 4227–4238.
    [9] W. Li and P. Patras, “Tight-Sketch: A High-Performance Sketch for Heavy Item-Oriented Data Stream Mining with Limited Memory Size,” in Proc. 32nd ACM Int. Conf. Inf. Knowl. Manag. (CIKM), Birmingham, UK, Oct. 2023, pp. 1328–1337.
    [10] T. Benson, A. Akella, and D. Maltz, “Network traffic characteristics of data centers in the wild,” in Proc. 10th ACM SIGCOMM Conf. Internet Meas., Melbourne, Australia, Nov. 2010, pp. 267–280.
    [11] “Intel® C++ Compiler 19.1 Developer Guide and Reference.” https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top.html (accessed Jun. 10, 2025).
    [12] S. T. Zargar, J. B. D. Joshi, and D. Tipper, “A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks,” IEEE Communications Surveys & Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.
    [13] J. Jung, B. Krishnamurthy, and M. Rabinovich, “Flash crowds and denial of service attacks: Characterization and implications for CDNs and web sites,” in Proc. 11th Int. Conf. World Wide Web, Honolulu, HI, USA, May 2002, pp. 252–262.
    [14] Y. Li, R. Miao, H. Liu, Y. Zhuang, F. Feng, L. Tang, Z. Cao, M. Zhang, F. Kelly, M. Alizadeh, and M. Yu, “HPCC: High Precision Congestion Control,” in Proc. ACM SIGCOMM Conf. on Data Communication (SIGCOMM), Beijing, China, Aug. 2019, pp. 1–13.
    [15] T. A. Benson, A. Anand, A. Akella, and M. Zhang, “Understanding data center traffic characteristics,” in Proc. 1st ACM Workshop on Research on Enterprise Networking (WREN), Barcelona, Spain, Aug. 2009, pp. 1–12.
    [16] “MV-Sketch Repository,” https://github.com/Grace-TL/MV-Sketch.
    [17] “The CAIDA Anonymized Internet Traces,” http://www.caida.org/data/overview/.
    [18] “MAWI Working Group Traffic Archive,” http://mawi.wide.ad.jp/mawi/.
    [19] M. Singh, M. Singh, and S. Kaur, “10 Days DNS Network Traffic from April-May, 2016,” Mendeley Data, v2, 2019.
    [20] “CocoSketch Repository,” https://github.com/yindazhang/CocoSketch.
    [21] Y. Zhou, T. Yang, J. Jiang, B. Cui, M. Yu, X. Li, and S. Uhlig, “Cold Filter: A meta‑framework for faster and more accurate stream processing,” in Proc. ACM SIGMOD/PODS, Houston, TX, USA, Jun. 10–15, 2018, pp. 1–16, doi: 10.1145/3183713.3183726.d
    [22] C.‑Y. Hsu, P. Indyk, D. Katabi and A. Vakilian, “Learning‑based frequency estimation algorithms,” in Proc. International Conference on Learning Representations (ICLR), New Orleans, LA, USA, May 6–9, 2019.
    [23] J. Huang, W. Zhang, Y. Li, L. Li, Z. Li, J. Ye, and J. Wang, “ChainSketch: An Efficient and Accurate Sketch for Heavy Flow Detection,” IEEE/ACM Transactions on Networking, vol. 31, no. 2, pp. 738–753, Aug. 2022, doi: 10.1109/TNET.2022.3199506.
    [24] R. Yao, Y. Cao, Y. Cui, Y. Chen, C. Guo, and G. Shen, “CPSketch: A ‘Couple’ Sketch-based Heavy Flow Detection Method,” Journal of Network and Computer Applications, vol. 242, p. 104245, Jun. 2025, doi: 10.1016/j.jnca.2025.104245.
    [25] J. Ye, L. Li, W. Zhang, G. Chen, Y. Shan, Y. Li, W. Li, and J. Huang, "UA-Sketch: An Accurate Approach to Detect Heavy Flow based on Uninterrupted Arrival," in Proc. 51st Int. Conf. Parallel Process. (ICPP), Bordeaux, France, Aug. 2022, pp. 57:1–57:11.
    [26] P. Roy, A. Khan, and G. Alonso, “Augmented Sketch: Faster and More Accurate Stream Processing,” in Proc. 2016 ACM SIGMOD Int. Conf. Management of Data, San Francisco, CA, USA, Jun. 26–Jul. 1, 2016, pp. 1449–1463.
    [27] J. Gong, T. Yang, H. Zhang, H. Li, S. Uhlig, S. Chen, L. Uden, and X. Li, “HeavyKeeper: An Accurate Algorithm for Finding Top-k Elephant Flows,” in Proc. 2018 USENIX Annual Technical Conference (USENIX ATC ’18), Boston, MA, USA, Jul. 2018, pp. 909–921.
    [28] X. Liu, X. Zhang, B. Liu, T. Li, T. Yang, and G. Xie, “2FA Sketch: Two-Factor Armor Sketch for Accurate and Efficient Heavy Hitter Detection in Data Streams,” in Network and Parallel Computing: 20th IFIP WG 10.3 International Conference, NPC 2024, Haikou, China, December 7–8, 2024, Proceedings, Part II (Lecture Notes in Computer Science, vol. 15528). Singapore: Springer, 2025, pp. 337–350, doi: 10.1007/978-981-96-2864-3_27.
    [29] J. Yu, Y.-E. Sun, H. Huang, Y. Du, G. Gao, H. Xu, and S. Chen, “HeavyTracker: An Efficient Algorithm for Heavy-Hitter Detection in High-Speed Networks,” in Proceedings of the 2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS), 2023, pp. 362–370, doi: 10.1109/ICPADS56603.2022.00054.
    [30] M. Wu, H. Huang, Y.-E. Sun, Y. Du, S. Chen, and G. Gao, “ActiveKeeper: An Accurate and Efficient Algorithm for Finding Top-k Elephant Flows,” IEEE Communications Letters, vol. 25, no. 8, pp. 2545–2549, Aug. 2021, doi: 10.1109/LCOMM.2021.3077902.
    [31] K. S. Fayaz, Y. Tobioka, V. Sekar, and M. Bailey, “Bohatei: Flexible and elastic DDoS defense,” in Proc. USENIX Secur., 2015, pp. 817–832.
    [32] Liu, Z., Namkung, H., Nikolaidis, G., Lee, J., Kim, C., Jin, X., Braverman, V., Yu, M., Sekar, V.: Jaqen: A High-Performance Switch-Native approach for detecting and mitigating volumetric DDoS attacks with programmable switches. In: 30th USENIX Security Symposium (USENIX Security 21). pp. 3829–3846 (2021).

    無法下載圖示 校內:2029-07-31公開
    校外:2029-07-31公開
    電子論文尚未授權公開,紙本請查館藏目錄
    QR CODE