| 研究生: |
葉正濠 Ye, Jheng-Hao |
|---|---|
| 論文名稱: |
應用於安全雲端運算系統之硬體加速器積體電路架構與演算法設計 Development of Efficient VLSI Hardware Accelerators and Algorithm for Secure Cloud Computing |
| 指導教授: |
謝明得
Shieh, Ming-Der |
| 學位類別: |
博士 Doctor |
| 系所名稱: |
電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| 論文出版年: | 2018 |
| 畢業學年度: | 106 |
| 語文別: | 英文 |
| 論文頁數: | 148 |
| 中文關鍵詞: | 雲端運算 、全同態加密 、大整數乘法 、數論轉換 、公開金鑰密碼 、模數乘法 、互斥和表示式 |
| 外文關鍵詞: | Cloud computing, fully homomorphic encryption (FHE), large integer multiplication, number theoretic transform (NTT), public-key cryptography, modular multiplication, Exclusive-or Sum-of-Products expression |
| 相關次數: | 點閱:132 下載:5 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
資訊安全隨著雲端運算(cloud computing)的快速成長變得越來越重要,全同態加密(fully homomorphic encryption)可在第三方伺服器中直接對密文進行運算以確保原始資料的隱密性,因而受到越來越多的關注。如何有效克服利用同態函數進行密文運算時所包含的各式高複雜度運算需求,如大整數乘法和降低雜訊等運算,是目前相當熱門的研究主題之一,本論文針對此問題,從三個面向提出不同的解決方案。首先,為了降低在密文運算所需的重加密(recryption)次數,本論文藉由明文邏輯運算與密文同態函數的對應關係,提出一個基於新的優化條件之有效率積之互斥和(exclusive-or sum-of-products)邏輯最佳化演算法來減少明文進行積運算的最高輸入數,進而降低密文同態乘法運算的階層數。其次,為了加速全同態加密的運算和減少所需的運算資源,本論文提出三種低複雜度且快速的大整數乘法器硬體架構設計,包含:(1)於記憶體式(memory-based)數論轉換架構中,透過減少等效運算元數量來降低高基底(high radix)蝴蝶運算單元(butterfly unit)的面積;(2)採用管線化-記憶體式(pipelined shared-memory)數論轉換架構;(3)針對區塊式(block-based)整數乘法運算提出有效率減少區塊乘積(block product)個數之做法,其可支援於不同全同態加密應用中所需之不同運算位元長度。以上架構皆採用單埠且區塊合併(single-port merged-bank)記憶體結構來設計數論轉換(number theoretic transform)和反數論轉換以進一步減少所需的記憶體面積,並配合所提出之一致性記憶體定址方法,可同時用於數論轉換、反數論轉換與進位處理運算中,達到降低位址產生器總面積之效益。實驗結果顯示,相較於其它文獻的大整數乘法器實現方式,採用本論文所提出之機制在於三種不同架構上皆可大幅降低所需的面積。最後,為了確保雲端伺服器和使用者之間的資料傳輸安全性,公開金鑰密碼(public-key cryptography)在無線通訊系統和網路服務中亦扮演越來越重要的角色。為克服傳統基於字元運算之蒙哥馬利(Montgomery)模數乘法器之高功率消耗缺點,本論文提出可舒緩資料相依性之技術,並將資料相依圖(dependency graph)切割成多個區塊來處理不同運算元的計算,之後配合有效率的映射至硬體架構方式,於一個區塊迭代(iteration)中讓每個處理單元可重複使用當下的字元變數,進而減少核心(kernel)管線化暫存器的切換邏輯閘次數與外部記憶體存取次數。實驗結果顯示,相較於其它文獻結果,本論文所提出的可擴充性模數乘法器具有較低的能源耗損優勢。
With the rapid increase in cloud computing applications, the security issue has become an important research topic nowadays. Fully homomorphic encryption (FHE) allows computations to be carried out directly on encrypted data for ensuring data privacy on untrusted servers. However, FHE demands extremely high computational complexity since large integer multiplication and noise reduction operation are needed for homomorphic evaluation on ciphertexts. In this dissertation, an efficient exclusive-or sum-of-products (ESOP) minimization algorithm based on a novel cost function is presented to simplify the evaluation function on plaintexts, which can relax the need of performing the recryption operation on ciphertexts accordingly. To speedup FHE operations and reduce resource requirements, three low-complexity VLSI architectures of large integer multiplication are presented. First, a novel and efficient operand reduction scheme is proposed to reduce the area requirement of high-radix butterfly units. We also extend the single-port, merged-bank memory structure to the design of the number theoretic transform (NTT) and the inverse NTT (INTT) for further minimizing area costs in both the memory-based and pipelined shared-memory NTT architectures. In addition, a unified memory addressing scheme is developed to support both NTT/INTT and resolving carries computations. Furthermore, an efficient block product reduction scheme for a block-based integer multiplication algorithm is presented. The developed block-based integer multiplier can support various precision of input operands for different FHE applications. Experimental results reveal that significant area reductions can be achieved for all the three architectures of large integer multiplication designed using the proposed schemes compared to related works. To ensure the security of data transmition between cloud servers and users, public-key cryptography (PKC) has gained increasing attention in wireless communication systems and Internet services for providing security features, such as confidentiality, authentication, data integrity, and non-repudiation. In this dissertation, a data dependency relaxation scheme is also proposed to reduce the power consumption of the conventional word-based Montgoemry modular multiplier. The dependency graph is first divided into a set of blocks for handling any precision of operands. Together with the developed architecture mapping scheme, the current words of variables for each processing element in a block interation can be efficiently reused so that the switching activity of pipelined registers and the number of memory access of the kernel in the mapped architecture are greatly reduced. Experimental results show that the proposed scalable architecture achieves significant energy reduction in comparison with related works.
[1] C. Gentry, “Fully homomorphic encryption using ideal lattices,” in Proc. 41st Annu. ACM Symp. Theory Comput., 2009, pp. 169-178.
[2] Federal Information Processing Standards Publication 186-3, “Digital Signature Standard,” 2009.
[3] M.V. Dijk, C. Gentry, S. Halevl, and V. Vaikuntanathan, “Fully homomorphic encryption over the integers,” in Proc. Annu. Int. Conf. Theory Appl. Cryptograph. Techn., 2010, pp. 24-43.
[4] Z. Brakerski, C. Gentry, and V. Vaikuntanathan, “(Leveled) fully homomorphic encryption without bootstrapping,” in Proc. 3rd Innovations in Theoretical Computer Science, 2012, pp. 309-325.
[5] C. Gentry and S. Halevl, “Implementating Gentry’s fully-homomorphic encryption scheme,” in Proc. Annu. Int. Conf. Theory Appl. Cryptograph. Techn., 2011, pp. 129-148.
[6] J.S Coron, D. Naccache, and M. Tibouchi, “Public key compression and modulus switching for fully homomorphic encryption over the integers,” in Proc. Annu. Int. Conf. Theory Appl. Cryptograph. Techn., 2012, pp. 446-464.
[7] A. Schönhage and V. Strassen, “Schnelle multiplication großer zahlen,” Computing, vol. 7, nos. 3-4, pp. 281-292, 1971.
[8] A. Karatsuba and Y. Ofman, “Multiplication of multidigit numbers on automata,” Soviet Physics Doklady, vol. 7, no. 7, pp. 595-596, Jan. 1963.
[9] A.L. Toom, “The complexity of a scheme of functional elements realizing the multiplication of integers,” Soviet Mathematics Doklady, vol. 3, no. 4, pp. 714-716, 1963.
[10] S.A. Cook, “On the minimum computation time of functions,” Ph.D. dissertation, Dept. Math., Harvard Univ., Cambridge, MA, USA, 1966.
[11] J. M. Pollard, “The fast Fourier transform in a finite field,” Mathematics of computation, vol. 25, no. 144, pp. 365-374, 1971.
[12] J. Solinas, “Generalized mersenne numbers,” Blekinge College Technol., Karlskrona, Sweden, Tech, Rep. 06/MI/006, 1999.
[13] W. Wang, Y. Hu, L. Chen, X. Huang, and B. Sunar, “Exploring the feasibility of fully homomorphic encryption,” IEEE Trans. Comput., vol. 64, no. 3, pp. 698-706, Mar. 2015.
[14] X. Cao, C. Moore, M. O’Neil, E. O’Sullivan, and N. Hanley, “Optimised multiplication architectures for accelerating fully homomorphic encryption,” IEEE Trans. Comput., vol. 65, no. 9, pp. 2794-2806, Sep. 2016.
[15] Y. Doroz, E. Ozturk, and B. Sunar, “Acceleration fully homomorphic encryption in hardware,” IEEE Trans. Comput., vol. 64, no. 6, pp. 1509-1521, Jun. 2015.
[16] Y. Doroz, E. Erdinc, and B. Sunar, “A million-bit multiplier architecture for fully homomorphic encryption,” ELSEVIER Journal of Microprocessors and Microsystems, vol. 38, no. 8, pp. 766-775, Nov. 2014.
[17] W. Wang, X. Huang, N. Emmart, and C. Weems, “VLSI design for a large-number multiplier for fully homomorphic encryption,” IEEE Trans. VLSI Syst., vol. 22, no. 9, pp. 1879-1887, Sep. 2014.
[18] X. Feng and S. Li, “Design for an area-efficient million-bit integer multiplier using double modulus NTT,” IEEE Trans. VLSI Syst., vol. 25, no. 9, pp. 2658-2662, Sep. 2017.
[19] H.F Luo, Y.J. Liu, and M.D. Shieh, “Efficient memory addressing algorithms for FFT processor design,” IEEE Trans. VLSI Syst., vol. 23, no. 10, pp. 2162-2172, Oct. 2015.
[20] P.Y. Tsai and C.Y. Lin, “A generalized conflict-free memory addressing scheme for continuous-flow parallel-processing FFT processors with rescheduling,” IEEE Trans. VLSI Syst., vol. 19, no. 12, pp. 2290-2302, Dec. 2011.
[21] C. Moore, N. Hanley, J. McAllister, M. O’Neill, E. O’Sullivan, and X. Cao, “Targeting FPGA DSP slices for a large integer multiplier for integer based FHE,” in Proc. Int. Conf. Financial Cryptography Data Secur., 2013, pp. 226-237.
[22] P. G. Comba, “Exponentiation cryptosystems on the IBM PC,” IBM Syst. J., vol. 29, no. 4, pp. 526-538, 1990.
[23] H.F Luo and M.D. Shieh, “Efficient memory management scheme for pipelined shared-memory FFT processors,”in Proc. IEEE International Conference on Comsumer Electronics-Taiwan, pp. 178-179, 2015.
[24] X. Cao and C. Moore, “New integer-FFT multiplication architectures and implementations for accelerating fully homomorphic encryption,” IACR Cryptology ePrint Archive, 2013.
[25] P. L. Montgomery, “Modular multiplication without trial division,” Math. Computation, vol. 44, pp. 519–521, Apr. 1985.
[26] R. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, pp. 120-126, Feb. 1978.
[27] N. Koblitz, “Elliptic curve cryptosystems,” Proc. Math. Computation, pp. 203–209, 1987.
[28] V. S. Miller, “Use of elliptic curve in cryptography,” Proc. Adv. Cryptology (Crypto), pp. 417–426, 1986.
[29] C.C. Yang, T.S. Chang, and C.W. Jen, “A new RSA cryptosystem hardware design based on Montgomery’s algorithm,” IEEE Trans. Circuits Sys. Part II: Analog and Digital Signal Processing, vol. 45, no. 7, pp. 908-913, Jul. 1998.
[30] C. McIvor, M. McLoone, and J.V. McCanny, “Modified Montgomery modular multiplication and RSA exponentiation techniques,” IEE Proc.-Comp. Digit. Tech., vol. 151, no. 6, pp. 402-408. Nov. 2004.
[31] M.D. Shieh, J.H. Chen, H.S. Wu, and W.C. Lin, “A new modular exponentiation architecture for efficient design of RSA Cryptosystem,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vo. 16, no. 9, pp. 1151-1161, Sept. 2008.
[32] F. Tenca and K. Koc, “A scalable architecture for modular multiplication based on Montgomery’s algorithm,” IEEE Trans. Comput., vol. 52, no. 9, pp. 1215–1221, Sep. 2003.
[33] D. Harris, R. Krishnamurthy, S. Mathew, and S. Hsu, “An improved unified scalable radix-2 Montgomery multiplier,” in Proc. IEEE Symp. Comput. Arith., 2005, pp. 1196–1200.
[34] M.D. Shieh and W.C. Lin, “Word-based Montgomery modular multiplication algorithm for low-latency scalable architectures,” IEEE Trans. Comput., vol. 59, no. 8, pp. 1145-1151, Aug. 2010.
[35] M. Huang, K. Gaj, and T. El-Ghazawi, “New hardware architectures for Montgomery modular multiplication algorithm,” IEEE Trans. Comput., vol. 60, no. 7, pp. 923-935, Jul. 2011.
[36] W.C. Lin, J.H. Ye, and M.D. Shieh, “Scalable Montgomery modular multiplication architecture with low-latency and low-memory bandwidth requirement,” IEEE Trans. Comput., vol. 63, no. 2, pp. 475-483, Feb. 2014.
[37] T. Kozlowski, E.L. Dagless, and J.M. Saul, “An enhanced algorithm for the minimization of exclusive-OR sum-of-products for incompletely specified functions,” in Proc. IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD), 1995, pp. 244-249.
[38] T. Sasao, “EXMIN2: A simplified algorithm for exclusive-OR-sum-of-products expressions for multiple-valued-input two-valued-output functions,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 5, pp. 621-632, May. 1993.
[39] N. Song and M. Perkowski, “Minimization of exclusive sum-of-products expressions for multiple-valued input, incompletely specified functions,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 4, pp. 385-395, Apr. 1996.
[40] A. Mishchenko and M. Perkowski, “Fast heuristic minimization of exclusive-sums-of-products,” in Proc. Reed-Muller Workshop, 2001, pp. 242-250.
[41] D.V. Popel and A. Dani, “Sierpinski gaskets for logic functions representation,” in Proc. IEEE Int. Symp. on Multiple-Valued Logic (ISMVL), 2002, pp. 39-45.
[42] R. Rivest, L. Adleman, and M.L. Dertouzos, “On the banks and privacy homomorphisms,” in Foundations of secure computation, 1978, pp. 169-180.
[43] P. Paillier, “Public-key cryptosystems based on composite degree residuisity classes,” in Proc. Annu. Int. Conf. Theory Appl. Cryptograph. Techn., 1999, pp. 223-238.
[44] D. Boneh, E.J. Goh, and K. Nissim, “Evaluating 2-DNF formulas on ciphertexts,” in Proc. Annu. Theory of Cryptograph, 2005, pp. 325-341.
[45] P. Barrett, “Implementing the Rivest, Shamir and Adleman public key encryption algorithm on a standard digital signal processor,” in Proc. Annu. Int. Conf. Theory Appl. Cryptograph. Techn., 1986, pp. 311-323.
[46] B. Schneier, Applied Cryptograpgy: Protocols, Algorithms, and Source Code in C, 2nd ed. New York: Wiley, 1996.
[47] A.J. Menezes, P.C. van Oorschot, and S.A. Vanstone, Handbook of Applied Cryptography. Boca Raton, FL: CRC Press, 1997.
[48] W. Stallings, Cryptography and Network Secutity: Principles and Practice, 5th ed. Prentice Hall, 2010.
[49] U. S. Department of Commerce, Washington, DC, “National Encryption Standard”, 1988.
[50] Federal Information Processing Standards Publication 197, “Advanced Encryption Standard (AES),” 2001.
[51] W. Diffie and M.E. Hellman, “New directions in cryptography,” IEEE Trans. Inf. Theory, vol. IT-22, no. 6, pp. 644-654, Nov. 1976.
[52] IEEE Standard Specifications for Public-Key Cryptography, IEEE Std 1363-2000, 2000.
[53] D.E. Knuth, Seminumerical Algorithms, the Art of Computer Programming. Reading, MA: Addison-Wesley, 1981, vol. 2.
[54] M. Joye and S.M. Yen, “The Montgomery powering ladder,” in Proc. Annu. Int. Conf. Cryptographic Hardware and Embedded Systems, 2002, pp. 291-302.
[55] Federal Information Processing Standards Publication 180-2, “Secure Hash Standard,” 2002.
[56] C.D. Walter, “Montgomery exponentiation needs no final subtractions,” Electronics Letters, vol. 32, no. 21, pp. 1831–1832, Oct. 1999.
[57] S.Y. Lee, C.C. Chen, C.C. Lee, and C.J. Chen, “A low-power VLSI architecture for shared-memory FFT processor with a mixed-radix algorithm and a simple memory control scheme,” in Proc. IEEE Int. Symp. Circuits Syst., 2006, pp. 157-160.