| 研究生: |
劉哲宏 Liu, Che-Hung |
|---|---|
| 論文名稱: |
低延遲四通道排序電路設計與實現 Design and implementation of a low latency quadra path sorter |
| 指導教授: |
陳培殷
Chen, Pei-Yin |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2020 |
| 畢業學年度: | 108 |
| 語文別: | 英文 |
| 論文頁數: | 32 |
| 中文關鍵詞: | 90-nm TSMC 、comparison free 、sorting algorithms 、speed complexity O(N) |
| 外文關鍵詞: | 90-nm TSMC, comparison free, sorting algorithms, speed complexity O(N) |
| 相關次數: | 點閱:52 下載:0 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
排序是人們在日常生活中經常處理的問題,因此如何設計高效能、低面積的排序硬體是一項重要的研究議題。先前的作品可擴展性不佳,也就是當排序的輸入數目大幅上升時,必須要大幅修改硬體架構,其成本與排序時間也將呈現非線性成長。
因此本文提出一種成本與速度為O(N)的非比較型排序演算法,以計數排序法(counting sort)做延伸最佳化,在不增加面積的前提下加速排序,使其具備更高的工作頻率,並且通過四向輸出(quadra path)以降低整體排序所需要的cycle數。
根據SYNOPSYS之Design Compiler及Artisan TSMC 90 nm標準元件庫的電路合成結果,電路排序1024個10-bit數字時工作頻率可以到達1.0GHz,排序時間在1.28μ~2.302μ之間。相較於其他排序硬體,實驗結果表明面積減少了11.3%,單位時間內的產出至少提高了101.6%,最高可以提高到141.9%。
Sorting is a well-known problem which is commonly discussed not only in algorithms, but also in daily life. Therefore, design a faster and smaller sorting hardware has always been an important research topic, but the previous works are not scalable enough. When the number of inputs rises sharply, it will increase the cost of area and the latency will not increase in linear, hence the architecture needs to be modified significantly.
In this paper, we propose a non-compare sorting algorithm with space complexity and time complexity both in O(N). The sorting algorithm (counting sort) is used for extended optimization, and its speed is faster without increasing the area. The proposed architecture substantially increases its sorting frequency and also divides the output into four output ports on the sorting cycle (latency) in order to reduce the overall cycle.
The architecture is tested considering delay components of 90-nm standard cell library of the Taiwan Semiconductor Manufacturing Company (TSMC) and Design Compiler of the SYNOPSYS company, which sorting of N = 1024 elements of size K = 10-bit. The results verify that our proposed hardware requires approximately 1.28μ~2.302μ to sort 1024 elements with a clock cycle time of 1.0 GHz. The experimental results show that the area is reduced by 11.3%, throughput is increased by at least 101.6%, and at most can be increased to 141.9%.
[1] D. E. Knuth, The Art of Computer Programming. Reading, MA, USA: Addison-Wesley, Mar. 2011.
[2] Y. Bang and S. Q. Zheng, “A simple and efficient VLSI sorting architecture,” in Proc. 37th Midwest Symp. Circuits Syst., vol. 1. 1994, pp. 70–73.
[3] T. Leighton, Y. Ma, and C. G. Plaxton, “Breaking the(n log2n) barrier for sorting with faults,” J. Comput. Syst. Sci., vol. 54, no. 2, pp. 265–304, 1997.
[4] Y. Han, “Deterministic sorting in O(n log log n) time and linear space,” J. Algorithms, vol. 50, no. 1, pp. 96–105, 2004.
[5] C. Canaan, M. S. Garai, and M. Daya, “Popular sorting algorithms,” World Appl. Programm., vol. 1, no. 1, pp. 62–71, Apr. 2011.
[6] L. M. Busse, M. H. Chehreghani, and J. M. Buhmann, “The information content in sorting algorithms,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2012, pp. 2746–2750.
[7] R. Zhang, X. Wei, and T. Watanabe, “A sorting-based IO connection assignment for flip-chip designs,” in Proc. IEEE 10th Int. Conf. ASIC (ASICON), Oct. 2013, pp. 1–4.
[8] D. Fuguo, “Several incomplete sort algorithms for getting the median value,” Int. J. Digital Content Technol. Appl., vol. 4, no. 8, pp. 193–198, Nov. 2010.
[9] W. Jianping, Y. Yutang, L. Lin, H. Bingquan, and G. Tao, “Highspeed FPGA-based SOPC application for currency sorting system,” in Proc. 10th Int. Conf. Electron. Meas. Instrum. (ICEMI), Aug. 2011, pp. 85–89.
[10] R. Meolic, “Demonstration of sorting algorithms on mobile platforms,” in Proc. CSEDU, 2013, pp. 136–141.
[11] F.-C. Leu, Y.-T. Tsai, and C. Y. Tang, “An efficient external sorting algorithm,” Inf. Process. Lett., vol. 75, pp. 159–163, Sep. 2000.
[12] J. L. Bentley and R. Sedgewick, “Fast algorithms for sorting and searching strings,” in Proc. 8th Annu. ACM-SIAM Symp. Discrete Algorithms (SODA), Jan. 1997, pp. 360–369.
[13] L. Xiao, X. Zhang, and S. A. Kubricht, “Improving memory performance of sorting algorithms,” J. Experim. Algorithmic, vol. 5, no. 3, pp. 1–20, 2000.
[14] P. Sareen, “Comparison of sorting algorithms (on the basis of average case),” Int. J. Adv. Res. Comput. Sci. Softw. Eng., vol. 3, no. 3, pp. 522–532, Mar. 2013.
[15] H. Inoue, T. Moriyama, H. Komatsu, and T. Nakatani, “AA-SORT: A new parallel sorting algorithm for multi-core SIMD processors,” in Proc. 16th Int. Conf. Parallel Archit. Compil. Techn. (PACT), 2007, pp. 189–198.
[16] V. Kundeti and S. Rajasekaran, “Efficient out-of-core sorting algorithms for the parallel disks model,” J. Parallel Distrib. Comput., vol. 71, no. 11, pp. 1427–1433, 2011.
[17] G. Capannini, F. Silvestri, and R. Baraglia, “Sorting on GPUs for large scale datasets: A thorough comparison,” Int. Process. Manage., vol. 48, no. 5, pp. 903–917, 2012.
[18] D. Cederman and P. Tsigas, “GPU-Quicksort: A practical quicksort algorithm for graphics processors,” ACM J. Experim. Algorithmics (JEA), vol. 14, Dec. 2009, Art. no. 4.
[19] B. Jan, B. Montrucchio, C. Ragusa, F. G. Ghan, and O. Khan, “Fast parallel sorting algorithms on GPUs,” Int. J. Distrib. Parallel Syst., vol. 3, no. 6, pp. 107–118, Nov. 2012.
[20] N. Satish, M. Harris, and M. Garland, “Designing efficient sorting algorithms for manycore GPUs,” in Proc. 23rd IEEE Int. Symp. Parallel Distrib. Process., May 2009, pp. 1–10.
[21] C. Bunse, H. Höpfner, S. Roychoudhury, and E. Mansour, “Choosing the ‘best’ sorting algorithm from optimal energy consumption,” in Proc. ICSOFT, vol. 2. 2009, pp. 199–206.
[22] A. D. Mishra and D. Garg, “Selection of best sorting algorithm,” Int. J. Intell. Inf. Process., vol. 2, no. 2, pp. 363–368, Jul./Dec. 2008.
[23] T.-C. Lin, C.-C. Kuo, Y.-H. Hsieh, and B.-F. Wang, “Efficient algorithms for the inverse sorting problem with bound constraints under the l∞-norm and the Hamming distance,” J. Comput. Syst. Sci., vol. 75, no. 8, pp. 451–464, 2009.
[24] F. Henglein, “What is a sorting function?” J. Logic Algebraic Programm., vol. 78, no. 7, pp. 552–572, Aug./Sep. 2009.
[25] E. Mumolo, G. Capello, and M. Nolich, “VHDL design of a scalable VLSI sorting device based on pipelined computation,” J. Comput. Inf. Technol., vol. 12, no. 1, pp. 1–14, 2004.
[26] E. Herruzo, G. Ruiz, J. I. Benavides, and O. Plata, “A new parallel sorting algorithm based on odd-even mergesort,” in Proc. 15th EUROMICRO Int. Conf. Parallel, Distrib. Netw.-Based Process. (PDP), Feb. 2007, pp. 18–22.
[27] M. Thorup, “Randomized sorting in O(n log log n) time and linear space using addition, shift, and bit-wise Boolean operations,” J. Algorithms, vol. 42, no. 2, pp. 205–230, Feb. 2002.
[28] M. Afghahi, “A 512 16-b bit-serial sorter chip,” IEEE J. Solid-State Circuits, vol. 26, no. 10, pp. 1452–1457, Oct. 1991.
[29] J.-T. Yan, “An improved optimal algorithm for bubble-sorting-based nonManhattan channel routing,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 18, no. 2, pp. 163–171, Feb. 1999.
[30] L. Skliarova, D. Mihhailov, V. Sklyarov, and A. Sudnitson, “Implementation of sorting algorithms in reconfigurable hardware,” in Proc. 16th IEEE Medit. Electrotech. Conf. (MELECON), Mar. 2012, pp. 107–110.
[31] N. Tabrizi and N. Bagherzadeh, “An ASIC design of a novel pipelined and parallel sorting accelerator for a multiprocessor-on-a-chip,” in Proc. IEEE 6th Int. Conf. ASIC (ASICON), Oct. 2005, pp. 46–49.
[32] H. Schröder, “VLSI-sorting evaluated under the linear model,” J. Complex., vol. 4, no. 4, pp. 330–355, Dec. 1988.
[33] H.-S. Yu, J.-Y. Lee, and J.-D. Cho, “A fast VLSI implementation of sorting algorithm for standard median filters,” in Proc. 12th Annu. IEEE Int. ASIC/SOC Conf., Sep. 1999, pp. 387–390.
[34] G. Campobello and M. Russo, “A scalable VLSI speed/area tunable sorting network,” J. Syst. Archit., vol. 52, no. 10, pp. 589–602, Oct. 2006
[35] W. Zhou, Z. Cai, R. Ding, C. Gong, and D. Liu, “Efficient sorting design on a novel embedded parallel computing architecture with unique memory access,” Comput. Elect. Eng., vol. 39, no. 7, pp. 2100–2111, Oct. 2013.
[36] V. Sklyarov, “FPGA-based implementation of recursive algorithms,” Microprocess. Microsyst., vol. 28, nos. 5–6, pp. 197–211, Aug. 2004.
[37] R. Lin and S. Olariu, “Efficient VLSI architectures for Columnsort,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 7, no. 1, pp. 135–138, Mar. 1999.
[38] S. W. Moore and B. T. Graham, “Tagged up/down sorter—A hardware priority queue,” Comput. J., vol. 38, no. 9, pp. 695–703, Sep. 1995.
[39] G. V. Russo and M. Russo, “A novel class of sorting networks,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 43, no. 7, pp. 544–552, Jul. 1996.
[40] S. Dong, X. Wang, and X. Wang, “A novel high-speed parallel scheme for data sorting algorithm based on FPGA,” in Proc. IEEE 2nd Int. Congr. Image Signal Process. (CISP), Oct. 2009, pp. 1–4.
[41] A. Széll and B. Fehér, “Efficient sorting architectures in FPGA,” in Proc. Int. Carpathian Control Conf. (ICCC), May 2006, pp. 1–4.
[42] A. A. Colavita, A. Cicuttin, F. Fratnik, and G. Capello, “SORTCHIP: A VLSI implementation of a hardware algorithm for continuous data sorting,” IEEE J. Solid-State Circuits, vol. 38, no. 6, pp. 1076–1079, Jun. 2003.
[43] T. Demirci, I. Hatirnaz, and Y. Leblebici, “Full-custom CMOS realization of a high-performance binary sorting engine with linear area-time complexity,” in Proc. Int. Symp. Circuits Syst. (ISCAS), vol. 5. May 2003, pp. V453–V456.
[44] K. Ratnayake and A. Amer, “An FPGA architecture of stable-sorting on a large data volume: Application to video signals,” in Proc. 41st Annu. Conf. Inf. Sci. Syst., Mar. 2007, pp. 431–436.
[45] S. Alaparthi, K. Gulati, and S. P. Khatri, “Sorting binary numbers in hardware—A novel algorithm and its implementation,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2009, pp. 2225–2228.
[46] J. F. Hughes et al., Computer Graphics: Principles and Practice, 3rd ed. Reading, MA, USA: Addison-Wesley, 2014.
[47] R. Chen, S. Siriyal, and V. Prasanna, “Energy and memory efficient mapping of bitonic sorting on FPGA,” in Proc. ACM/SIGDA Int. Symp. Field Program. Gate (FPGA), Monterey, CA, USA, Feb. 2015, pp. 240–249
[48] Saleh Abdel-Hafeez,Ann Gordon-Ross.An Efficient O(N) Comparison-Free Sorting Algorithm.IEEE Transactions on Very Large Scale Integration (VLSI) Systems.Pages: 1 - 13 Volume: PP Issue: 99. 2017