| 研究生: | 李尚宸 Li, Shang-Chen | 
|---|---|
| 論文名稱: | 用於巨大流量檢測之快速準確的兩階段 Sketch 框架 A Fast and Accurate Two-Stage Sketch-Based Framework for Heavy Hitter Detection | 
| 指導教授: | 張燕光 Chang, Yeim-Kuan | 
| 學位類別: | 碩士 Master | 
| 系所名稱: | 電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering | 
| 論文出版年: | 2025 | 
| 畢業學年度: | 113 | 
| 語文別: | 英文 | 
| 論文頁數: | 110 | 
| 中文關鍵詞: | Sketch 、巨大流量檢測 、網路測量 | 
| 外文關鍵詞: | Sketch, Heavy Hitter Detection, Network Measurement | 
| 相關次數: | 點閱:9 下載:0 | 
| 分享至: | 
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 | 
巨大流量檢測 (heavy hitter detection) 在網路管理中至關重要,用於辨識高頻寬流量,例如來自那些由分散式阻斷服務攻擊 (DDoS attack)、突發流量 (flash crowd) 或網路壅塞 (network congestion) 引起的流量,這些流量需在記憶體受限的高速環境中高效處理。
我們提出一個 Two-Stage框架以及一種新穎的 Momentum-Sketch,以提升巨大流量檢測的效率與可擴展性。Two-Stage 架構在第一階段 (Stage 1),輸入資料封包被插入至一個 two-row CU Sketch,作為高吞吐量的預過濾器,快速剔除大多數低流量,以最小的記憶體與每封包開銷實現過濾。超過准入閾值 (admission threshold) 的流量被轉發至第二階段 (Stage 2)。在第二階段,我們使用 Momentum-Sketch計數資料結構,透過動量指標 (momentum metric) 結合到達強度 (arrival strength) 與估計頻率,並採用機率計數器衰減策略 (probabilistic counter-decay strategy),優先保留真正的巨大流量,有效降低偏態分佈下的估計誤差。
實驗結果顯示,Momentum-Sketch 在低記憶體預算下表現出卓越的性能。在 16 KB 記憶體下,其平均 F1 score 比其他方法高出約 10% 至 520%,平均召回率 (recall) 高出約 13% 至 600%。特別的是,由於其機率計數器衰減策略可消除過度估計 (overestimation),Momentum-Sketch 在採用閾值式報告時,無論記憶體限制為何,皆可達到精確率 (precision) 1.0。在 Momentum-Sketch 中,我們進一步利用 SIMD 指令加速插入與查詢操作,在所有資料集上,平均插入吞吐量提升 47.3%,平均查詢吞吐量提升 62.7%。
另一方面,Two-Stage 框架透過過濾低流量提升性能,相較 Momentum-Sketch 大幅提高插入和查詢吞吐量,同時保持高 F1 score。具體來說,在所有資料集上平均插入吞吐量提升 59.8%,查詢吞吐量提升 25.5%。
綜上所述,Momentum-Sketch 在資源有限的情況下表現出色,特別是在 16 KB 到 128 KB 的記憶體下提供了卓越的 F1 score。而 Two-Stage 框架在 128 KB 及以上的記憶體下顯著提升吞吐量並維持高 F1 score,使其成為記憶體容量較大系統的選擇。
Heavy hitter detection is critical in network management for identifying high-bandwidth flows, such as those arising from DDoS attacks, flash crowds, or network congestion, which must be processed efficiently in memory-constrained, high-speed environments. 
We present a Two-Stage framework together with a novel Momentum-Sketch to boost throughput and memory efficiency for heavy hitter detection. In Stage 1, incoming packets are inserted into a two-row CU Sketch, serving as a high-throughput pre-filter that quickly discards most non-heavy flows with minimal memory and per-packet overhead. Flows exceeding an admission threshold are forwarded to Stage 2. In Stage 2, we introduce Momentum-Sketch, a novel counting data structure that combines arrival strength with estimated frequency via a momentum metric and employs a probabilistic counter-decay strategy to preferentially keep true heavy hitters, effectively reducing estimation errors under skewed distributions. 
The experimental results show that Momentum-Sketch excels in accuracy under low memory budgets. At 16 KB, its average F1 score across all datasets surpasses that of other methods by approximately 10% to 520%, and its average recall exceeds these methods by approximately 13% to 600%. Notably, because its probabilistic counter-decay strategy eliminates overestimation, Momentum-Sketch achieves precision 1.0 under any memory constraint when threshold-based reporting is used. In Momentum-Sketch, we further leverage SIMD instructions to enhance insertion and query performance, improving average insert throughput by 47.3% and average query throughput by 62.7% across all datasets.
On the other hand, the Two-Stage framework enhances performance by filtering non-heavy flows, significantly improving insert and query throughput compared to Momentum-Sketch while maintaining a high F1 score. Specifically, it improves average insert throughput by 59.8% and average query throughput by 25.5% across all datasets.
In conclusion, Momentum-Sketch is highly effective in resource-constrained scenarios, delivering exceptional F1 scores at memory sizes ranging from 16 KB to 128 KB. The Two-Stage framework significantly improves throughput while maintaining a high F1 score at memory sizes of 128 KB and beyond, making it an excellent choice for systems with greater memory capacity.
[1] L. Tang, Q. Huang, and P. P. C. Lee, “MV‑Sketch: A Fast and Compact Invertible Sketch for Heavy Flow Detection in Network Data Streams,” in Proceedings of IEEE INFOCOM, Apr. 2019.
[2] A. Metwally, D. Agrawal, and A. El Abbadi, “Efficient computation of frequent and top‑k elements in data streams,” in Proc. 10th Int. Conf. Database Theory (ICDT), Edinburgh, UK, May 2005, pp. 398–412.
[3] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proc. 28th Int. Conf. Very Large Data Bases (VLDB), Hong Kong, China, Aug. 2002, pp. 346–357.
[4] G. Cormode and S. Muthukrishnan, “An improved data stream summary: the count‑min sketch and its applications,” J. Algorithms, vol. 55, no. 1, pp. 58–75, Apr. 2005.
[5] C. Estan and G. Varghese, “New directions in traffic measurement and accounting,” in Proc. SIGCOMM, San Diego, CA, USA, Aug. 2002. 
[6] T. Yang, J. Jiang, P. Liu, Q. Huang, J. Gong, Y. Zhou, R. Miao, X. Li, and S. Uhlig, “Elastic Sketch: Adaptive and Fast Network‑wide Measurements,” in Proc. ACM SIGCOMM, Budapest, Hungary, Aug. 2018, pp. 561–575.
[7] Y. Zhang, Z. Liu, R. Wang, T. Yang, J. Li, R. Miao, P. Liu, R. Zhang, and J. Jiang, “CocoSketch: High‑Performance Sketch‑based Measurement over Arbitrary Partial Key Query,” in Proc. ACM SIGCOMM, Virtual Event (Budapest), Aug. 2021, pp. 207–222.
[8] W. Li and P. Patras, “Stable‑Sketch: A Versatile Sketch for Accurate, Fast, Web‑Scale Data Stream Processing,” in Proc. ACM Web Conf. (WWW), Singapore, May 2024, pp. 4227–4238.
[9] W. Li and P. Patras, “Tight-Sketch: A High-Performance Sketch for Heavy Item-Oriented Data Stream Mining with Limited Memory Size,” in Proc. 32nd ACM Int. Conf. Inf. Knowl. Manag. (CIKM), Birmingham, UK, Oct. 2023, pp. 1328–1337.
[10] T. Benson, A. Akella, and D. Maltz, “Network traffic characteristics of data centers in the wild,” in Proc. 10th ACM SIGCOMM Conf. Internet Meas., Melbourne, Australia, Nov. 2010, pp. 267–280.
[11] “Intel® C++ Compiler 19.1 Developer Guide and Reference.” https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top.html (accessed Jun. 10, 2025).
[12] S. T. Zargar, J. B. D. Joshi, and D. Tipper, “A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks,” IEEE Communications Surveys & Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.
[13] J. Jung, B. Krishnamurthy, and M. Rabinovich, “Flash crowds and denial of service attacks: Characterization and implications for CDNs and web sites,” in Proc. 11th Int. Conf. World Wide Web, Honolulu, HI, USA, May 2002, pp. 252–262.
[14] Y. Li, R. Miao, H. Liu, Y. Zhuang, F. Feng, L. Tang, Z. Cao, M. Zhang, F. Kelly, M. Alizadeh, and M. Yu, “HPCC: High Precision Congestion Control,” in Proc. ACM SIGCOMM Conf. on Data Communication (SIGCOMM), Beijing, China, Aug. 2019, pp. 1–13.
[15] T. A. Benson, A. Anand, A. Akella, and M. Zhang, “Understanding data center traffic characteristics,” in Proc. 1st ACM Workshop on Research on Enterprise Networking (WREN), Barcelona, Spain, Aug. 2009, pp. 1–12.
[16] “MV-Sketch Repository,” https://github.com/Grace-TL/MV-Sketch.
[17] “The CAIDA Anonymized Internet Traces,” http://www.caida.org/data/overview/.
[18] “MAWI Working Group Traffic Archive,” http://mawi.wide.ad.jp/mawi/.
[19] M. Singh, M. Singh, and S. Kaur, “10 Days DNS Network Traffic from April-May, 2016,” Mendeley Data, v2, 2019.
[20] “CocoSketch Repository,” https://github.com/yindazhang/CocoSketch.
[21] Y. Zhou, T. Yang, J. Jiang, B. Cui, M. Yu, X. Li, and S. Uhlig, “Cold Filter: A meta‑framework for faster and more accurate stream processing,” in Proc. ACM SIGMOD/PODS, Houston, TX, USA, Jun. 10–15, 2018, pp. 1–16, doi: 10.1145/3183713.3183726.d
[22] C.‑Y. Hsu, P. Indyk, D. Katabi and A. Vakilian, “Learning‑based frequency estimation algorithms,” in Proc. International Conference on Learning Representations (ICLR), New Orleans, LA, USA, May 6–9, 2019.
[23] J. Huang, W. Zhang, Y. Li, L. Li, Z. Li, J. Ye, and J. Wang, “ChainSketch: An Efficient and Accurate Sketch for Heavy Flow Detection,” IEEE/ACM Transactions on Networking, vol. 31, no. 2, pp. 738–753, Aug. 2022, doi: 10.1109/TNET.2022.3199506.
[24] R. Yao, Y. Cao, Y. Cui, Y. Chen, C. Guo, and G. Shen, “CPSketch: A ‘Couple’ Sketch-based Heavy Flow Detection Method,” Journal of Network and Computer Applications, vol. 242, p. 104245, Jun. 2025, doi: 10.1016/j.jnca.2025.104245.
[25] J. Ye, L. Li, W. Zhang, G. Chen, Y. Shan, Y. Li, W. Li, and J. Huang, "UA-Sketch: An Accurate Approach to Detect Heavy Flow based on Uninterrupted Arrival," in Proc. 51st Int. Conf. Parallel Process. (ICPP), Bordeaux, France, Aug. 2022, pp. 57:1–57:11.
[26] P. Roy, A. Khan, and G. Alonso, “Augmented Sketch: Faster and More Accurate Stream Processing,” in Proc. 2016 ACM SIGMOD Int. Conf. Management of Data, San Francisco, CA, USA, Jun. 26–Jul. 1, 2016, pp. 1449–1463.
[27] J. Gong, T. Yang, H. Zhang, H. Li, S. Uhlig, S. Chen, L. Uden, and X. Li, “HeavyKeeper: An Accurate Algorithm for Finding Top-k Elephant Flows,” in Proc. 2018 USENIX Annual Technical Conference (USENIX ATC ’18), Boston, MA, USA, Jul. 2018, pp. 909–921.
[28] X. Liu, X. Zhang, B. Liu, T. Li, T. Yang, and G. Xie, “2FA Sketch: Two-Factor Armor Sketch for Accurate and Efficient Heavy Hitter Detection in Data Streams,” in Network and Parallel Computing: 20th IFIP WG 10.3 International Conference, NPC 2024, Haikou, China, December 7–8, 2024, Proceedings, Part II (Lecture Notes in Computer Science, vol. 15528). Singapore: Springer, 2025, pp. 337–350, doi: 10.1007/978-981-96-2864-3_27.
[29] J. Yu, Y.-E. Sun, H. Huang, Y. Du, G. Gao, H. Xu, and S. Chen, “HeavyTracker: An Efficient Algorithm for Heavy-Hitter Detection in High-Speed Networks,” in Proceedings of the 2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS), 2023, pp. 362–370, doi: 10.1109/ICPADS56603.2022.00054.
[30] M. Wu, H. Huang, Y.-E. Sun, Y. Du, S. Chen, and G. Gao, “ActiveKeeper: An Accurate and Efficient Algorithm for Finding Top-k Elephant Flows,” IEEE Communications Letters, vol. 25, no. 8, pp. 2545–2549, Aug. 2021, doi: 10.1109/LCOMM.2021.3077902.
[31] K. S. Fayaz, Y. Tobioka, V. Sekar, and M. Bailey, “Bohatei: Flexible and elastic DDoS defense,” in Proc. USENIX Secur., 2015, pp. 817–832.
[32] Liu, Z., Namkung, H., Nikolaidis, G., Lee, J., Kim, C., Jin, X., Braverman, V., Yu, M., Sekar, V.: Jaqen: A High-Performance Switch-Native approach for detecting and mitigating volumetric DDoS attacks with programmable switches. In: 30th USENIX Security Symposium (USENIX Security 21). pp. 3829–3846 (2021).
 校內:2029-07-31公開
                                        校內:2029-07-31公開