| 研究生: |
陳俊宏 Chen, Jun-Hong |
|---|---|
| 論文名稱: |
高效能公開金鑰之積體電路架構研究 Exploration of High-Performance VLSI Architectures for Public-Key Cryptosystems |
| 指導教授: |
謝明得
Shieh, Ming-Der |
| 學位類別: |
博士 Doctor |
| 系所名稱: |
電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| 論文出版年: | 2008 |
| 畢業學年度: | 96 |
| 語文別: | 英文 |
| 論文頁數: | 140 |
| 中文關鍵詞: | 公開金鑰 、可重組化密碼處理器 、橢圓曲線密碼系統 、積體電路架構 、有限場 、模數乘法 、模數指數 |
| 外文關鍵詞: | reconfigurable cryptographic processor, VLSI architecture, public-key cryptography, modular multiplication, elliptic curve cryptosystem, finite field, modular exponentiation |
| 相關次數: | 點閱:141 下載:4 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
由於近年來網際網路的盛行,使得如何確保傳送資料之安全性已成為重要的課題,尤其在行動數位化的今天,個人資料即時保密更為重要。本論文主要針對此一問題,提出四項技術以提高密碼演算法或密碼電路的運算效能,並且基於這些技術,我們提出高效能且具有演算法彈性的密碼處理器。
首先,觀察指數餘數運算的特性並佐以數學上的推導,我們提出了一個統合乘法與平方的模組,這個方法能夠減少進位儲存加法器(carry-save adder: CSA)樹中運算元的個數。我們所提出的架構具有下列幾項優點:(i)在每一次的模數乘法之間不需要將進位儲存型式的運算元轉換為二進位型式,因此,除了最後在模數指數運算中需要做一次轉換以得到結果,其餘的情況下皆可省去因轉換所需耗費的時間;(ii)以一種非常有效率的方式減少CSA樹中運算元個數;(iii)可用一種節省硬體且不會對最長延遲路徑造成太大影響的方式來設計乘法與平方的元件; (iv)藉由所提出的模數簡化方法,可以在不增加太多硬體成本的前提下,有效地提升運算速度。實驗結果證明本論文所提出的模數指數設計不只擁有最低的硬體複雜度,同時,從面積-時間複雜度的角度來看,也擁有最傑出的表現。
其次,我們研究如何減化傳統模數乘法中的乘法、商數預估、模數化簡三者間存在的資料相依問題,因此我們提出一個快速的模數乘法演算法與電路設計,此設計之所以有速度上的提升,主要是因為電路能同時的做乘法與模數化簡的運算,而這兩運算之間並不存在立即的相依性,因此最長延遲時間從4-to-2 CSA降成了3-to-2 CSA,所以電路在計算模數乘法的時間複雜度分析,能夠具有顯著的進步。實驗結果證實,在速度上的比較,我們所提出的高速模數乘法電路能夠較目前的研究具有速度上的優勢,應用到模數指數運算上,以時間複雜度與面積-時間複雜度的角度來看,也擁有最傑出的表現。
第三種技術是提出高速與低複雜度的有限場Montgomery倒數演算法與電路,我們改良演算法中不易實現硬體的問題加以改進,改良後不僅易於硬體實現,且硬體結構也相當簡單,因此具有低複雜度的特性。實驗結果證明,我們所提出的設計,以時間面積複雜度與時間複雜度的角度來看,皆有較好的表現。
第四個技術是改良演算法的方式,將有有限場乘法器演算法成功的實現在 Stein的倒數演算法中,如此一來只要在初始時將運算元做適當的選擇,此單一電路即可做乘法或除法的運算,如此加入少許的選擇運算元電路,乘法器能夠有效率且巧妙的嵌入除法電路之中。將乘法除法電路應用在仿射座標系統的橢圓曲線密碼系統上,能在無速度上的損失的前提下,與一個乘法一個除法的電路做比較,面積能夠減少約13.8%。實驗結果證實,與目前低複雜度的橢圓曲線密碼系統研究做比較,面積減少了約12.7%,因此能提供面積-時間複雜度的優勢。
最後,我們基於之前的研究,並使用微指令架構來設計,發展出能操作於GF(p)與GF(2m)的可重組化密碼處理器,此一可重組化運算電路是使用前幾章節提出的技術來設計出高效能與低複雜度的架構,並使用微指令架構設計,將最佳化過的密碼演算法以撰寫程式的方法撰寫入密碼處理器當中,如此一來即能夠不需要修改硬體設計之下,增加或修改使用的密碼演算法。實驗結果證明我們所實現的密碼處理器不僅針對密碼演算法有著彈性的設計、高硬體使用率、也有著優異的效能。
With the explosion of electronic data communication and computer network, how to ensure the security of transmitted data has become an important topic in current research. This dissertation proposes four techniques to enhance the performance of cryptographic algorithms/architectures as well as their underlying arithmetic circuits for public-key cryptosystems. With the proposed techniques, a high-performance flexible cryptographic processor is also developed to demonstrate the effectiveness of our work.
First, we present a new modular exponentiation architecture with a unified modular multiplication/square (UMS) module and show how to reduce the number of input operands for the CSA tree by mathematical manipulation. The developed architecture has the following advantages. (i) There is no need to convert the carry-save form of an operand into its binary representation at the end of each modular multiplication. In this way, except the final step to get the result of modular exponentiation, the time-consuming carry propagation can then be eliminated. (ii) The number of input operands for the CSA tree is reduced in a very efficient way. (iii) The hardware saving is achieved with very limited impact on the original critical path delay when designed with two distinct modular multiplication and square components. Experimental results show that our modular exponentiation design obtains the least hardware complexity compared with the existing work and outperforms them in terms of area-time (AT) complexity as well.
Second, we first explore how to relax the data dependency existing among the multiplication, quotient determination, and modular reduction in the conventional Montgomery modular multiplication algorithm. Then we propose a new modular multiplication algorithm for high-speed hardware design. The speed improvement is achieved by reducing the critical path delay from the 4-to-2 to 3-to-2 carry-save addition, and the resulting time complexity of our development is also decreased by simultaneously performing the multiplication and modular reduction processes. Experimental results show that the developed modular multiplication can operate in higher speed than the related work. Moreover, when applying the proposed modular multiplication to the modular exponentiation, we also obtain both time and area-time (AT) advantages compared with the existing work.
Third, we consider a fundamental change at the algorithmic level and eliminate the potential problems in hardware implementation which makes the resulting modified Montgomery inverse algorithm over GF(2m) very suitable for hardware realization. Due to its structural simplicity, the modified algorithm can be easily mapped onto a high-speed and possibly low-complexity circuit. Experimental results show that our development can achieve both the area and speed advantages over the previous work
Fourth, we show that a field multiplication over GF(2m) can be implemented by extended Stein’s algorithm, one of the algorithms used to accomplish division. In this way, a field multiplier can be efficiently embedded into a divider with very little hardware overhead for operand selection based on a fundamental change at the algorithmic level. When applying the developed combined multiplication and division (CMD) algorithm to Elliptic Curve Cryptography (ECC) using affine coordinates, we achieve about 13.8% reduction on the area requirement with almost no performance degradation compared to the one implemented with two distinct components. Experimental results also demonstrate that not only our CMD circuit has the area advantage (up to 12.7%) in comparison with other low-cost design but also the resulting area-efficient design of ECC system exhibits considerable improvement on the area-time (AT) complexity of previous work.
Flexible field arithmetic processors are emerging as viable design alternatives for implementing a wide range of cryptography applications. Finally, we propose a microcode-based architecture and a novel reconfigurable datapath which can perform either prime field GF(p) operations or binary extension field GF(2m) operations for arbitrary prime numbers, irreducible polynomials, and precision. Using these field arithmetic, users are capable of programming cryptographic algorithms in microcode sequences for full compliance with majority of public-key cryptographic algorithms such as RSA and elliptic curve cryptosystems. An algorithm optimization or refinement can be made at a higher level based on the reconfigurable datapath. Experimental results show that the developed processor has full cryptography algorithm flexibility, high hardware utilization, and excellent performance.
[1] W. Diffie and M.E. Hellman, “New directions in cryptography,” IEEE Trans. Inf. Theory, vol. IT-22, no. 6, pp. 644-654, Nov. 1976.
[2] IEEE Standard Specifications for Public-Key Cryptography, IEEE Std 1363-2000, Jan. 2000.
[3] R. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, no. 2, pp. 120-126, Feb. 1978.
[4] T. ElGamal, "A public key cryptosystem and a signature scheme based on discrete logarithms," IEEE Trans. Inf. Theory, vol. 31, no. 4, pp. 469-472, 1985.
[5] Proposed Federal Information Processing Standard for Digital Signature Standard (DSS), Federal Register, vol. 56, pp. 42980-42982, Aug. 1991.
[6] Recommended Elliptic Curves for Government Use, NIST, Available: http://csrc.nist.gov/CryptoToolkit/dss/ecdsa/NISTReCur.pdf.
[7] SEC 2: Recommended Elliptic Curve Domain Parameters, SECG, Available: www.secg.org/collateral/sec2_final.pdf
[8] Information Technology – Security Techniques – Digital Signatures with Appendix – Part 3: Certificate Based-Mechanisms, ISO/IEC 14888-3, 1998.
[9] Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), ANSI X9.38, 1999.
[10] Information Technology – Security Techniques – Cryptographic Techniques Based on Elliptic Curves – Part 4: Digital signatures giving message recovery, ISO/IEC 15946-4, 2004.
[11] Public Key Cryptography for the Financial Services Industry: Key Agreement and Key Transport Using Elliptic Curve Cryptography. Working Draft, ANSIX9.63-2001, May, 2001.
[12] Certicom Corporation, The Basics of ECC 2006 [Online], Available: http://www.certicom.com/index.php?action=res,ecc_faq.
[13] Digital Signature Standard, NIST, FIPS Publication 186-2, Feb. 2000.
[14] P.L. Montgomery, “Modular multiplication without trial division,” Math. Comput., vol. 44, no. 170, pp. 519-521, Apr. 1985.
[15] R.C.C. Cheung, N.J. Telle, W. Luk, and P.Y.K. Cheung, “Customizable elliptic curve cryptosystems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 9, pp.1048-1059, Sep. 2005.
[16] Y. Eslami, A. Sheikholeslami, P.G. Gulak, S. Masui, and K. Mukaida, “An area-efficient universal cryptography processor for smart cards,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 1, pp. 43-56, Jan. 2006.
[17] P.H.W. Leong and I.K.H Leung, “A microcoded elliptic curve processor using FPGA technology,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 5, pp. 550-559, Oct. 2002.
[18] M. Bednara, M. Daldrup, J. Gathen, J. Shokrollahi, and J. Teich, "Reconfigurable implementation of elliptic curve crypto algorithms," Proc. IEEE Int. Parallel Distributed Processing Symp., Apr. 2002, pp. 157-164.
[19] N.A. Saqib, F. Rodriguez-Henriquez, and A. Diaz-Perez, "A parallel architecture for fast computation of elliptic curve scalar multiplication over GF (2m)," Proc. IEEE Int. Parallel Distributed Processing Symp., Apr. 2004, pp. 144-151.
[20] M. Morales-Sandoval and C. Feregrino-Uribe, "On the hardware design of an elliptic curve cryptosystem," Proc. 5th IEEE Mexican Int. Conf., 2004, pp. 64-70.
[21] S.B. Wicker, Error Control Systems for Digital Communication and Storage, Englewood Cliffs, NJ: Prentice-Hall, 1995.
[22] E. Savas and C.K. Koc, “The Montgomery modular inverse-revisited”, IEEE Trans. Comput., vol. 49, no. 7, pp. 763-766, July 2000.
[23] E. Savas and C.K. Koc, “Architectures for unified field inversion with applications in elliptic curve cryptography”, Proc. IEEE Conf. Electron., Circuits Syst., Sep. 2002, vol. 3, pp. 1155-1158.
[24] C.D. Walter, “Systolic modular multiplication,” IEEE Trans. Comput., vol.42, no. 3, pp. 376-378, Mar. 1993.
[25] T.W. Kwon, C.S. You, W.S. Heo, Y.K. Kang, and J.R. Choi, “Two implementation methods of a 1024-bit RSA cryptoprocessor based on modified Montgomery algorithm,” Proc. IEEE Int. Symp. Circuits Syst., May 2001, vol. 4, pp. 650-653.
[26] A. Cilardo, A. Mazzeo, L. Romano, and G.P. Saggese, “Carry-save Montgomery modular exponentiation on reconfigurable hardware,” Proc. Design, Autom. Test Eur, Feb. 2004, vol. 3, pp. 206-211.
[27] C. McIvor, M. McLoone, and J.V. McCanny, “Fast Montgomery modular multiplication and RSA cryptographic processor architectures,” Proc. 37th Asilomar Conf. Signals, Syst. Comput., Nov. 2003, vol. 1, pp. 379-384.
[28] K. Manochehri and S. Pourmozafari, “Fast Montgomery modular multiplication by pipelined CSA architecture,” Proc. IEEE Int. Conf. Microw., Dec. 2004, pp. 144-147.
[29] C. McIvor, M. McLoone, and J.V. McCanny, “Modified Montgomery modular multiplication and RSA exponentiation techniques,” IEE Proc.-Comp. Digit. Tech., vol. 151, no. 6, pp. 402-408. Nov. 2004.
[30] C.C. Yang, T.S. Chang, and C.W. Jen, “A new RSA cryptosystem hardware design based on Montgomery’s algorithm,” IEEE Trans. Circuits Syst. II , Exp. Briefs, vol. 45, no. 7, pp. 908-913, July 1998.
[31] A. Daly and W. Marnane, “Efficient architectures for implementing Montgomery modular multiplication and RSA modular exponentiation on reconfigurable logic,” Proc. ACM/SIGDA 10th Int. Symp. FPGAs, Feb. 2002, pp. 40-49.
[32] Q. Liu, F. Ma, D. Tong, and X. Cheng, “A regular parallel RSA processor,” Proc. IEEE Int. Midwest Symp. Circuits Syst., July 2004, vol. 3, pp. iii-467-470.
[33] A.P. Fournaris and O. Koufopavlou, “A new RSA encryption architecture and hardware implementation based on optimized Montgomery multiplication,” Proc. IEEE Int. Symp. Circuits Syst., May 2005, vol. 5, pp. 4645-4648.
[34] J.H. Hong, and C.W. Wu, “Cellular-array modular Multiplier for fast RSA public-key cryptosystem based on modified booth’s algorithm,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 3, pp. 474-484, June 2003.
[35] D.E. Knuth, Seminumerical Algorithms, the Art of Computer Programming, vol. 2, Reading, MA: Addison-Wesley, 1981.
[36] N. Takagi, J. Yoshiki, and K. Takagi, “A fast algorithm for multiplicative inversion in GF(2m) using normal basis”, IEEE Trans. Comput., vol. 42, no. 9, pp. 1141-1146, Sep. 1993.
[37] R.P. Brent and H.T. Kung, “Systolic VLSI arrays for polynomial GCD computation”, IEEE Trans. Comput., vol. 33, no. 8, pp. 731-736, Aug. 1984.
[38] J. Stein, “Computational problems associated with Racah algebra,” J. Comput. Phys., vol. 1, pp. 397-405, 1967.
[39] K. Araki, I. Fujita, and M. Morisue, “Fast inverter over finite field based on Euclid’s algorithm”, IEICE Trans Fundam., vol. E72, pp. 1230-1234, Nov. 1989.
[40] H. Brunner, A. Curiger, and M. Hofstetter, “On computing multiplicative inverses in GF(2m),” IEEE Trans. Comput., vol. 42, no. 8, pp. 1010-1015, Aug. 1993.
[41] J.H. Guo and C.L. Wang, “Hardware-efficient systolic architecture for inversion and division in GF(2m)”, IEE Proc.-Comp. Digit. Tech., vol. 145, no. 4, pp. 272-278, July 1998.
[42] Y. Watanabe, N. Takagi, and K. Takagi, “A VLSI algorithm for division in GF(2m) based on extended binary GCD algorithm”, IEICE Trans Fundam, vol. E85-A, pp. 994-999, May 2002.
[43] C.H. Wu, C.M. Wu, M.D. Shieh, and Y.T. Hwang, “High-speed, low-complexity systolic designs of novel iterative division algorithms in GF(2m),” IEEE Trans. Comput., vol. 53, no. 3, pp. 375-380, Mar. 2004.
[44] Artisan Components, “TSMC 0.13 μm (CL013G) Process 1.2-Volt SAGE-XTM Standard Cell Library Databook,” Jan. 2004.
[45] N. Nedjah and L.M. Mourelle, “Three hardware architectures for the binary modular exponentiation: sequential, parallel, and systolic,” IEEE Trans. Circuits Syst. I , Reg. Papers, vol. 53, no. 3, pp. 627–633, Mar. 2006.
[46] P.S. Chen, S.A. Hwang, and C.W. Wu, “A systolic RSA public key cryptosystem,” Proc. IEEE Int. Symp. Circuits Syst., May 1996, pp. 408-411.
[47] A.A.A. Gutub and A.F. Tenca, “Efficient scalable hardware architecture for Montgomery inverse computation in GF(p)”, IEEE Workshop Signal Process. Syst., Aug. 2003, pp. 93-98.
[48] D.I. Moldovan and J.A.B. Fortes, “Partitioning and mapping algorithms into fixed size systolic arrays,” IEEE Trans. Comput., vol. 35, no. 1, pp. 1-12, Jan. 1986.
[49] S.E. Eldridge and C.D. Walter, “Hardware implementation of Montgomery’s modular multiplication algorithm,” IEEE Trans. Comput., vol. 42, no. 6, pp. 693-699, June 1993.
[50] T. Itoh and S. Tsujii, "A fast algorithm for computing multiplicative inverses in GF(2m) using normal basis," Inform. Comput. vol. 78, pp.171-177, 1988.
[51] C.L. Wang and J.L. Lin, “Systolic array implementation of multipliers for finite fields GF(2m),” IEEE Trans. Circuits Syst. II , Exp. Briefs, vol. 38, no. 7, pp. 796-800, July 1991.
[52] Artisan Components, “TSMC 0.18 μm Process 1.8-Volt SAGE-XTM Standard Cell Library Databook,” Sep. 2003.
[53] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, New York: Oxford Univ. Press, 2000.
[54] D.R. Stinson, Cryptography: Theory and Practice, CRC Press, Inc., 1995.
[55] E. Al-Daoud, R. Mahmod, M. Rushdan, and A. Kilicman, “A new addition formula for elliptic curves over GF(2n)”, IEEE Trans. Comput., vol. 51, no. 8, Aug. 2002.
[56] C.K. Koc and T. Acar, “Montgomery multiplication in GF(2k)”, Designs, Codes and Cryptography, vol. 14, no. 1, pp. 57-69, 1998.
[57] W.W. Peterson and E.J. Weldon, Error-Correcting Codes, Cambridge, MA: MIT Press, 1972.
[58] S.K. Jain, L. Song, and K.K. Parhi, “Efficient semisystolic architectures for finite-field arithmetic,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 1, pp.101-113, Mar. 1998.
[59] H. Wu, “Bit-parallel finite field multiplier and squarer using polynomial basis,” IEEE Trans. Comput., vol. 51, no. 7, pp. 750-758, July 2002.
[60] J.H. Kim and D.H. Lee, “A compact finite field processor over GF(2m) for elliptic curve cryptography,” Proc. IEEE Int. Symp. Circuits Syst., May 2002, pp. II-340-II-343.
[61] J. Park, J.T. Hwang, and Y.C. Kim, “FPGA and ASIC implementation of ECC processor for security on medical embedded system,” Proc. IEEE 3th Int. Conf. Inf. Technol. Appl., July 2005, pp. 547-551.
[62] J. Goodman and A.P. Chandrakasan, “An energy-efficient reconfigurable public-key cryptography processor,” IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1808-1820, Nov. 2001.
[63] A. Satoh and K. Takano, "A scalable dual-field elliptic curve cryptographic processor," IEEE Trans. Comput., vol. 52, no. 4, pp. 449-460, Apr. 2003.
[64] C. O’Rourke and B. Sunar, “Achieving NTRU with Montgomery multiplication,” IEEE Trans. Comput., vol. 52, no. 4, Apr. 2003.
[65] M.C. Sun, C.P. Su, C.T. Huang, and C.W. Wu, “Design of a scalable RSA and ECC crypto-processor,” Proc. IEEE Asia South Pacific Design Autom. Conf., Jan. 2003, pp. 495-498.
[66] K. Sakiyama, L. Batina, B. Preneel, and I. Verbauwhede, “Multicore curve-based cryptoprocessor with reconfigurable modular arithmetic logic units over GF(2n),” IEEE Trans. Comput., vol. 56, no. 9, Sep. 2007.
[67] Y. Wang, D.L. Maskell, J. Leiwo, and T. Srikanthan, “Unified signed-digit number adder for RSA and ECC public-key cryptosystems,” Proc. IEEE Asia Pacific Conf. Circuits Syst., Dec. 2006, pp. 1655-1658.
[68] X. Zeng, C. Chen, and Q. Zhang, “A reconfigurable public-key cryptography coprocessor,” Proc. IEEE Asia Pacific Conf. Advanced Syst. Integr. Circuits, Aug. 2004, pp. 172-175.
[69] Y. Wang, J. Leiwo, and T. Srikanthan, “A unified architecture for crypto-processing in embedded systems,” Proc. IEEE Conf. Embedded Softw. and Syst., Dec. 2005, pp. 1-4.
[70] C.H. Kim and C.Y. Hong, “High-speed division architecture for GF(2m),”IEE Electron. Lett., vol. 38, pp. 835-836, July 2002