成功大學博碩士論文系統

簡易檢索 / 詳目顯示

回結果列表

研究生：	林文景 Lin, Wen-Ching
論文名稱：	高效能公開金鑰密碼系統處理器之積體電路架構設計 High-performance VLSI Architecture Design of Public-Key Cryptography Processor
指導教授：	謝明得 Shieh, Ming-Der
學位類別：	博士 Doctor
系所名稱：	電機資訊學院 - 電機工程學系 Department of Electrical Engineering
論文出版年：	2012
畢業學年度：	100
語文別：	英文
論文頁數：	136
中文關鍵詞：	除法、有限場、模數乘法、公開金鑰密碼、超大型積體電路
外文關鍵詞：	division, finite field, modular multiplication, public-key cryptography, VLSI
相關次數：	點閱：191 下載：1
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

公開金鑰密碼(public-key cryptography, PKC)能提供安全功能像是機密性、認證、資料完整性和不可否認性，因此它在無線通訊系統和網路服務中扮演越來越重要的角色。本論文提出高效能的GF(2m)除法、模數乘法和公開金鑰密碼處理器之超大型積體電路設計。我們提出減輕傳統迭代GF(2m)除法和基於字元(word-based)之蒙哥馬利(Montgomery)模數乘法演算法的資料相依(data dependency)的技術，以達到高效率硬體實現。我們藉由改變預先定義的變數，然後相對應地更新其初始值，來改寫傳統迭代(iterative)的GF(2m)除法演算法。在改寫的演算法裡，用來更新新定義變數的兩個運算，判斷加法是否要執行和減少多項式次數，可以同時執行；因此開發出的除法器在不增加延遲(latency)和面積的成本之下，進一步提高運行速度。
在減輕傳統基於字元之蒙哥馬利模數乘法演算法的資料相依的情況下，使相鄰的處理單元(process element, PE)的延遲時間為剛好一個週期；且在字元須大於一個位元的前提下，無論選擇字元的大小。加上使用所提出的縮減運算元數目方案，我們的可擴充性架構可以運作在非常高的速度；而且不同的應用可以使用各自適合的資料路徑(datapath)。除此之外，這項研究也提出最大程度的減輕傳統基於字元之蒙哥馬利演算法的資料相依，以達到最大限度重複使用變數現在的這個字元。在最大程度的減輕資料相依之下，我們提出一個新的排程方案來減少所開發的可擴充性架構的記憶體存取次數。假設w代表一個字元的大小，分析結果顯示，所提出的可擴充性架構的記憶體存取次數約為傳統的可擴充性架構的(w-1)分之一倍。
最後，本論文提出一個配備可重組化處理單元陣列(reconfigurable process element array, RPEA)的公開金鑰密碼處理器；可重組化處理單元陣列可根據運算元大小，重組成一個、兩個或四個基於所提出的基於字元展開(unrolling)平行化蒙哥馬利演算法的低延遲模數乘法器；使用盡可能多的模數乘法器之排程來執行模指數和橢圓曲線向量乘法的運算。因此，可重組化處理單元陣列可有效率地執行各種精確度的任意質數、不可分解多項式和曲線參數之模指數和雙場(dual-field)橢圓曲線向量乘法。比較的結果顯示所提出的處理器不但能支援多種公開金鑰密碼演算法與廣泛大小的運算元，且與其它文獻內的公開金鑰密碼處理器比較起來是更有效率的。

Public-key cryptography (PKC) has become increasingly important in wireless communication systems and internet services for providing security such as confidentiality, authentication, data integrity, and non-repudiation. This dissertation presents high-performance VLSI designs of GF(2m) division, modular multiplication, and a PKC processor. Techniques for relaxing the data dependency in conventional iterative GF(2m) division and word-based Montgomery modular algorithms are presented to achieve highly efficient hardware implementations. The conventional GF(2m) division algorithm is reformulated by changing the pre-defined variable and then updating its initial value accordingly. In the reformulated division algorithm, determining whether the addition is needed to be performed and the reduction for updating the new-defined variable can be carried out concurrently; thus the developed dividers improve its operating speed without increasing latency or area cost.
With the data dependency relaxed in conventional word-based Montgomery modular multiplication algorithms, a latency of exactly one cycle between neighboring processing elements can be obtained regardless of the chosen word size w (w > 1). With the proposed operand reduction scheme, scalable architectures can operate at high speeds and proper datapaths can be chosen for specific applications. Besides, the data dependency in conventional word-based Montgomery’s algorithms is greatly relaxed to maximize the possibility of reusing the current words of variables. With the greatly relaxed data dependency, a scheduling scheme is proposed to further reduce the number of memory accesses in the developed scalable architecture. Analytical results show that the memory bandwidth requirement of the proposed scalable architecture is almost 1/(w-1) times that of conventional scalable architectures.
Finally, a reconfigurable process element array (RPEA) is developed in the proposed PKC processor. Given an operand size, the RPEA can be configured as one, two, or four low-latency modular multipliers based on the proposed word-based unrolling parallelized Montgomery algorithm. An associated scheduler can then use as many modular multipliers as possible to carry out modular exponentiation and elliptic curve scalar multiplication. Consequently, the RPEA can efficiently perform modular exponentiation and dual-field elliptic curve scalar multiplication for arbitrary prime numbers, irreducible polynomials, and curve parameters with multiple precisions. Comparison results show that the proposed PKC processor is more efficient than existing PKC processors and can support various public-key algorithms with a wide range of operand sizes.

INTRODUCTION	1
1	MOTIVATION	1
2	PREVIOUS WORKS	4
3	PROPOSED DESIGNS OF MODULAR ARITHMETIC AND PUBLIC-KEY CRYPTOGRAPHY PROCESSOR ARCHITECTURES	5
3.1	GF(2m) Dividers	5
3.2	Montgomery Modular Multipliers	6
3.3	PKC Processor	10
4	ORGANIZATION OF THE DISSERTATION	11
PRELIMINARY OF MODULAR ARITHMETIC AND PUBLIC-KEY CRYPTOGRAPHY	12
1	MODULAR ARITHMETIC	12
1.1	Montgomery Modular Multiplication	13
1.2	Word-based Montgomery Modular Multiplication	15
1.3	Modular Inversion/Division	18
2	PUBLIC-KEY CRYPTOGRAPHY	20
2.1	RSA	21
2.2	ECC	23
DESIGN OF HIGH-SPEED ITERATIVE DIVIDERS IN BINARY EXTENSION FIELDS	28
1	PROPOSED DIVISION ALGORITHM	28
2	DEVELOPED ITERATIVE DIVIDERS	33
2.1	Base Cell Design	33
2.2	Semi-Systolic Array Design	36
2.3	Bit-Serial Systolic Array Design	36
3	PERFORMANCE ANALYSIS AND IMPLEMENTATION RESULTS	40
4	FLEXIBLE GF(2M) DIVIDER DESIGN FOR CRYPTOGRAPHIC APPLICATIONS	46
4.1	Proposed SIMD/SISD Divider Architecture	47
4.2	Complexity Analysis and Implementation Results	52
5	SUMMARY	54
WORD-BASED MONTGOMERY MODULAR MULTIPLICATION ALGORITHM FOR LOW-LATENCY SCALABLE ARCHITECTURES	55
1	PROPOSED WORD-BASED MONTGOMERY MODULAR MULTIPLICATION ALGORITHM	56
1.1	Dependency Relaxation	56
1.2	Operand Reduction	57
1.3	New Word-Based Montgomery Modular Multiplication Algorithm	60
1.4	Scheduling and Performance Estimation	61
2	COMPLEXITY ANALYSIS AND EXPERIMENTAL RESULTS	65
2.1	Hardware Design	65
2.2	Complexity Analysis and Comparison	67
2.3	Experimental Results	71
3	FAST SCALABLE RADIX-4 MONTGOMERY MODULAR MULTIPLIER	74
3.1	Proposed Word-based Radix-4 Montgomery Modular Multiplication Algorithm	75
3.2	Scheduling, Hardware Architecture and Implementation Results	78
4	SUMMARY	84
SCALABLE MONTGOMERY MODULAR MULTIPLICATION ARCHITECTURE WITH LOW LATENCY AND LOW MEMORY BANDWIDTH REQUIREMENT	85
1	PROPOSED WORD-BASED MONTGOMERY MODULAR MULTIPLICATION ALGORITHM	86
1.1	Data Dependency Relaxation for Low Memory Bandwidth Requirement	86
1.2	Proposed Word-Based Montgomery Modular Multiplication Algorithm	89
1.3	Scheduling and Performance Estimation	91
2	HARDWARE ARCHITECTURE AND EXPERIMENTAL RESULTS	95
2.1	Hardware Design	95
2.2	Experimental Results and Comparisons	97
3	SUMMARY	102
HIGH-PERFORMANCE PUBLIC-KEY CRYPTOGRAPHY PROCESSOR WITH RECONFIGURABLE PROCESS ELEMENT ARRAY	103
1	PROPOSED WORD-BASED ALGORITHM FOR MONTGOMERY MODULAR MULTIPLICATION	104
1.1	Proposed Word-based Radix-2 Parallelized Montgomery Modular Multiplication Algorithm	104
1.2	Latency of Montgomery Modular Multiplication	108
2	PKC PROCESSOR ARCHITECTURE	110
2.1	Functional Units	111
2.2	Scheduling and Latency for Cryptographic Functions	115
2.3	Local Memory	119
3	IMPLEMENTATION RESULT AND COMPARISON	120
4	SUMMARY	125
CONCLUSION AND FUTURE WORK	126
1	CONCLUSION	126
2	FUTURE WORK	129
BIBLIOGRAPHY	131
                                    

[1] R. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, pp. 120-126, Feb. 1978.
[2] N. Koblitz, “Elliptic curve cryptosystems,” Proc. Math. Computation, vol. 48, pp. 203–209, 1987.
[3] V. S. Miller, “Use of elliptic curve in cryptography,” Proc. Adv. Cryptology (Crypto), ser. LNCS, vol. 218, pp. 417–426, 1986.
[4] W. Diffie and M. E. Hellman, “New directions in cryptography,” IEEE Trans. Inf. Theory, vol. IT-22, no. 6, pp. 644–654, Nov. 1976.
[5] IEEE Standard Specifications for Public-Key Cryptography, IEEE Std 1363–2000, 2000.
[6] Certicom Corporation, The Basics of ECC [online]. Available: http://www.certicom.com/index.php/the-basics-of-ecc/4-content-protection/104-the-basics-of-ecc.
[7] H. Eberle, N. Gura, S. C. Shantz, V. Gupta, L. Rarick, and S. Sundaram, "A public-key cryptographic processor for RSA and ECC," Proc. IEEE Int. Conf. Appl.-Specific Syst., Arch. Processors, Sep. 2004, pp. 98-110.
[8] J. Goodman and A. P. Chandrakasan, “An energy-efficient reconfigurable public-key cryptography processor,” IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1808–1820, Nov. 2001.
[9] M.C. Sun, C.P. Su, C.T. Huang, and C.W. Wu, “Design of a scalable RSA and ECC crypto-processor,” in Proc. IEEE Asia South Pacific Design Autom. Conf., Jan. 2003, pp. 495–498.
[10] X. Zeng, C. Chen, and Q. Zhang, “A reconfigurable public-key cryptography coprocessor,” in Proc. IEEE Asia Pacific Conf. Adv. Syst. Integr. Circuits, Aug. 2004, pp. 172–175.
[11] N. Smyth, M. McLoone, and J. V. McCanny, “An adaptable and scalable asymmetric cryptographic processor,” in Proc. IEEE Int. Conf. Appl.-Specific Syst. Arch. Processors, Sep. 2006, pp. 341–346.
[12] J.H. Chen, M.D. Shieh, and W.C. Lin, “A high-performance unified-field reconfigurable cryptographic processor,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 8, pp. 1145-1158, Aug. 2010.
[13] R. Cheung, N. Telle, W. Luk, and P. Cheung, “Customizable elliptic curve cryptosystems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vo. 13, no. 9, pp. 1048-1059, Sep. 2005.
[14] K. Sakiyama, L. Batina, B. Preneel, and I. Verbauwhede, “Multicore curve-based cryptoprocessor with reconfigurable modular arithmetic logic units over GF(2n),” IEEE Trans. Computers, vol. 56, no. 9, pp. 1269-1282, Sep. 2007.
[15] B. Ansari and M. Hasan, “High-performance architecture for elliptic curve scalar multiplication,” IEEE Trans. Computers, vol. 57, no. 11, pp. 1443-1453, Nov. 2008.
[16] G. Orlando and C. Paar, “A scalable GF(p) elliptic curve processor architecture for programmable hardware,” Proc. Cryptographic Hardware Embedded Syst.. ser. LNCS, vol. 2162, pp.348-363, 2001.
[17] C. J. McIvor, M. McLoone, and J. V. McCanny, “Hardware elliptic curve cryptography processor over GF(p),” IEEE Trans. Circuits Syst. I: Fundam. Theory Appl., vol. 53, no. 9, pp. 1946-1987, Sep. 2006.
[18] G. Chen, G. Bai, and H. Chen, “A high-performance elliptic curve cryptographic processor for general curves over GF(p) based on a systolic arithmetic unit,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 54, no. 5, pp.412-416, May 2007.
[19] A. Satoh and K. Takano, “A scalable dual-field elliptic curve cryptographic processor,” IEEE Trans. Comput., vol. 52, no. 4, pp. 449-460, Apr. 2003.
[20] K. Sakiyama, E. De Mulder, B. Preneel, and I. Verbauwhede, "A parallel processing hardware architecture for elliptic curve cryptosystems," Proc. IEEE Int. Conf. Acoust., Speech Signal Process., May 2006, pp. 904–907.
[21] J.Y. Lai and C.Y. Huang, “Elixir: high-throughput cost-effective dual-field processors and the design framework for elliptic curve cryptography,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 11, pp. 1567-1580, Nov. 2008.
[22] J.Y. Lai and C.T. Hung, “A highly efficient cipher processor for dual-field elliptic curve cryptography,” IEEE Trans. Circuits Syst. II, Expr. Briefs, vol. 56, no. 5, pp. 394-398, May 2009.
[23] C. McIvor, M. McLoone, and J.V. McCanny, “Modified Montgomery modular multiplication and RSA exponentiation techniques,” IEE Proc.-Comp. Digit. Tech., vol. 151, no. 6, pp. 402-408. Nov. 2004.
[24] M.D. Shieh, J.H. Chen, H.S. Wu, and W.C. Lin, “A new modular exponentiation architecture for efficient design of RSA Cryptosystem,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vo. 16, no. 9, pp. 1151-1161, Sep. 2008.
[25] A. Miyamoto, N. Homma, T. Aoki, and A. Satoh, “Systematic design of RSA processors based on high-radix Montgomery multipliers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vo. 19, no. 7, pp. 1136-1146, Jul. 2011.
[26] D. E. Knuth, Seminumerical Algorithms, the Art of Computer Programming. Reading, MA: Addison-Wesley, 1981, vol. 2.
[27] J. Stein, “Computational problems associated with racah algebra,” J. Comput. Phys., vol. 1, pp. 397–405, 1967.
[28] H. Brunner, A. Curiger, and M. Hofstetter, “On computing multiplicative inverses in GF(2m),” IEEE Trans. Computers, vol. 42, pp. 1010-1015, Aug. 1993.
[29] J.H. Guo and C.L. Wang, “Systolic array implementation of Euclid’s algorithm for inversion and division in GF(2m),” IEEE Trans. Computers, vol. 47, pp. 1161-1167, Oct. 1998.
[30] J.H. Guo and C.L. Wang, “Bit-serial systolic array implementation of Euclid’s algorithm for inversion and division in GF(2m),” Proc. Technical Papers, Int. Symp. VLSI Tech., Syst., Appl., Jun. 1997, pp. 113-117.
[31] A.K. Daneshbeh, M.A. Hasan, “A class of unidirectional bit serial systolic architectures for multiplicative inversion and division over GF(2m),” IEEE Trans. Computers, vol. 54, pp. 370-380, Mar. 2005.
[32] Z. Yan and D. V. Sarwate, “Systolic architectures for finite field inversion and division,” Proc. IEEE Int. Symp. Circuits Syst., pp. 789-792, 2002.
[33] Z. Yan and D. V. Sarwate, “New systolic architectures for inversion and division in GF(2m),” IEEE Trans. Computers, vol. 52, pp. 1514-1519, Nov. 2003.
[34] Z. Yan, D. V. Sarwate, and Z. Liu, “Hardware-efficient systolic architectures for inversion in GF(2m),” Proc. IEE Inf. Security, vol. 152, pp. 31-45, Oct. 2005.
[35] Y. Watanabe, N. Takagi, and K. Takagi, “A VLSI Algorithm for division in GF(2m) based on extended binary GCD algorithm,” IEICE Trans. Fund., vol. E85-A, No. 5, May 2002.
[36] C.H. Kim and C.Y. Hong, “High-speed division architecture for GF(2m), ” IEE Electronics Letters, vol. 38, pp. 835-836, Jul. 2002.
[37] C.H. Wu, C.M. Wu, M.D. Shieh, and Y.T. Hwang, “High-speed, low-complexity systolic designs of novel iterative division Algorithms in GF(2m),” IEEE Trans. Computers, vol. 53, pp. 375-380, Mar. 2004.
[38] C. H. Kim, S. Kwon, C. P. Hong and I. G. Nam, “Efficient bit-serial systolic array for dvision over GF(2m),” Proc. IEEE Int. Symp. Circuits Syst., May 2003, pp. 252-255.
[39] P. L. Montgomery, “Modular multiplication without trial division,” Math. Comput., vol. 44, no. 170, pp. 519–521, Apr. 1985.
[40] S.K. Jain, L. Song, and K.K. Parhi, “Efficient semisystolic architectures for finite-field arithmetic,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,vol. 6, pp.101-113, Mar. 1998.
[41] C.L. Wang and J.L. Lin, “Systolic array implementation of multipliers for finite fields GF(2m),” IEEE Trans. Circuits Syst., vol. 38, pp. 796-800, Jul. 1991.
[42] M. Huang, K. Gaj, S. Kwon, and T. El-Ghazawi, “An optimized hardware architecture for Montgomery multiplication algorithm,” Proc. Public Key Cryptography, ser. LNCS, vol. 4939, pp. 214-228, 2008.
[43] A. F. Tenca and C. K. Koc, “A scalable architecture for modular multiplication based on Montgomery’s algorithm,” IEEE Trans. Computers, vol. 52, no. 9, pp. 1215–1221, Sep. 2003.
[44] D. Harris, R. Krishnamurthy, S. Mathew, and S. Hsu, “An improved unified scalable radix-2 Montgomery multiplier,” Proc. IEEE Symp. Comput. Arith., Jun. 2005, pp. 1196–1200.
[45] C. D. Walter, “Montgomery exponentiation needs no final subtractions,” Electronics Letters, vol. 32, no. 21, pp. 1831–1832, Oct. 1999.
[46] E. Savas, A. F. Tenca, and G. Koc, “A scalable and unified multiplier architecture for finite fields GF(p) and GF(2m),” Proc. Cryptographic Hardware Embedded Syst, ser. LNCS, vol. 1965, pp. 277-292, 2000.
[47] H. Orup, “Simplifying quotient determination in high-radix modular multiplication,” Proc. 12th IEEE Symp. Comput. Arith., Jul. 1995, pp. 193-199.
[48] T. Itoh and S. Tsujii, “A fast algorithm for computing multiplicative inverses in GF(2m) using normal basis,” Inf. Comput., vol. 78, pp. 171-177, 1988.
[49] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd ed. New York: Wiley, 1996.
[50] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography. Boca Raton, FL: CRC Press, 1997.
[51] W. Stallings, Cryptography and Network Security: Principles and Practice, 5th ed. Prentice Hall, 2010
[52] U. S. Department of Commerce, Washington, DC, “National Encryption Standard,” 1988.
[53] Federal Information Processing Standards Publication 197, “Advanced Encryption Standard (AES),” 2001.
[54] M. Joye and S.M. Yen, “The Montgomery powering ladder,” Proc. Cryptographic Hardware Embedded Syst., ser. LNCS, vol. 2523, pp.291-302, 2003.
[55] P. Montgomery, "Speeding the Pollard and elliptic curve methods of factorization", Math. Comput., vol 48, pp. 243-264, 1987.
[56] J. Lόpez and R. Dahab, “Improved algorithms for elliptic curve arithmetic in GF(2n),” Proc. Selected Areas in Cryptography, ser. LNCS, vol. 2551, pp. 201-212, 1999.
[57] J. Lόpez and R. Dahab, “Fast multiplication on elliptic curves over GF(2m) without precomputation,” Proc. Cryptographic Hardware and Embedded Syst, ser. LNCS, vol. 1717, pp. 316-327, 1999.
[58] K.K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons Inc., 1999.
[59] S. Kwon, C. H. Kim, and C. P. Hong, “A systolic multiplier with LSB first algorithm over GF(2m) which is as efficient as MSB first algorithm,” Proc. IEEE Int. Sym. Circuits Syst., May 2003, pp. 633-636.
[60] M.D. Shieh, J.H. Chen, W.C. Lin, and C.M. Wu, "An efficient multiplier/divider design for elliptic curve cryptosystem over GF(2m)," J. Inf. Sci. Eng, vol. 25, no. 5., pp. 1555-1573, Sep. 2009.
[61] J. Coron, “Resistance against differential power analysis for elliptic curve cryptosystems,” Proc. Cryptographic Hardware Embedded Syst, ser. LNCS, vol. 1717, pp. 292-302, 1999.
[62] R. Dennard, F. Gaensslen, H.N. Yu, V. Rideout, E. Bassous, and A, Leblanc, “Design of ion-implanted MOSFET’s with very small physical dimensions,” IEEE J. Solid-State Circuits, vol. sc-9, no. 5, pp. 256–268, Oct. 1974.
[63] W. M. Lim and M. Benaissa, "Design space exploration of a hardware-software co-designed GF(2m) Galois field processor for forward error correction and cryptography," Proc. 1st IEEE/ACM/IFIP Int. conf. Hardware/Software Codesign Syst. Synthesis, Oct. 2003, pp. 53-58.
[64] J.H. Guo and C.L. Wang, “Novel digit-serial systolic array implementation of Euclid’s algorithm for division in GF(2m),” Proc. IEEE Int. Symp. Circuits Syst., May 1998, pp. 478-481.
[65] A. D. Booth, “A signed binary multiplication technique,” Q. J. Mech. Appl. Math., vol. 4, no.2, pp.236-240, 1951.
[66] A. Tenca and L. Tawalbeh, “An efficient and scalable radix-4 modular multiplier design using recoding techniques,” Proc. IEEE Asilomar Conf. Signals, Syst., and Computers, Nov. 2003, pp.1445-1450.
[67] N. Pinckney, P. Amberg, and D. Harris, “Parallelized booth-encoded radix-4 Montgomery multipliers,” Proc. 16th IFIP/IEEE Int. Conf. Very Large Scale Integr. (VLSI), Oct. 2008, pp. 1-6.
[68] M. Huang, K. Gaj, S. Kwon, and T. El-Ghazawi, “New hardware architectures for Montgomery modular multiplication algorithm,” IEEE Trans. Computers, vol. 60, no. 7, pp.923-935, Jul. 2011.
[69] N. Pinckney and D. Harris, “Parallelized radix-4 scalable Montgomery multipliers,” J. Integrated Circuits Syst., vol. 3, no. 1, pp. 39-45, Mar. 2008.

2017-08-31公開

簡易檢索 / 詳目顯示

相關論文