| 研究生: | 鍾勝民 Chung, Shen-Ming | 
|---|---|
| 論文名稱: | 可確保雲端資料庫使用者隱私之雲原生且一致化可搜索加密演算法 Cloud-Native and Unified Searchable Encryption for Privacy-Preserving within Cloud Repository | 
| 指導教授: | 謝明得 Shieh, Ming-Der | 
| 學位類別: | 博士 Doctor | 
| 系所名稱: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering | 
| 論文出版年: | 2020 | 
| 畢業學年度: | 108 | 
| 語文別: | 英文 | 
| 論文頁數: | 103 | 
| 中文關鍵詞: | 可搜索加密演算法 、資料庫系統 、隱私確保 、雲原生 | 
| 外文關鍵詞: | searchable encryption, DBMS, privacy preserving, cloud-native | 
| 相關次數: | 點閱:104 下載:13 | 
| 分享至: | 
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 | 
由於能對存放於雲端這類半信任(semi-trusted)環境之資料賦予機密性(confidentiality)及可搜索性(searchability),可搜索加密演算法(Searchable Encryption;SE)被視為雲端時代的一項重要技術。然而,此技術往往須修改伺服器端的軟體架構,無法「原生地(natively)」與雲端服務整合運行,因此導致可搜索加密演算法即使在二十年的發展後仍罕見於雲端。
為了要讓可搜索加密演算法能廣泛適用於雲端以確保使用者的隱私,本論文提出一個名為FETCH(Frequency-Eliminated Trapdoor-Character-Hopping)的「雲原生(cloud-native)」可搜索加密演算法。雲原生的目的,是要讓FETCH能在不修改既有資料庫軟體(database management system; DBMS)的原則下,支援萬用字元(wildcard)之加密下文字樣型(pattern)搜索。基於創新的common-conditioned-subsequence-preserving (CCSP)技巧,FETCH將加密下搜索的問題轉化為資料庫系統常能支援之子序列比對(subsequence matching)問題,因此可通用於雲端服務。雖然我們在安全分析中以學理證明CCSP將使SE索引(SE indexes)無法達到理論上的不可分辨性(indistinguishability),但在廣泛被運用的三層需端架構(3-tier cloud structure)下,FETCH已可提供足夠的機密性,並比現有的萬用字元可搜索加密演算法(wildcard SE schemes)有更優異的檢索效能、更小的索引存儲需求及更簡單的佈署方式。FETCH尤其能處理現有萬用字元可搜索加密演算法所無法負荷之巨量而持續成長的(雲端)資料。
相當有趣地,我們發現FETCH可被進一步擴充作加密下數值檢索,我們稱此擴充後之演算法為「一致化(unified)」可搜索加密演算法,並命名為uFETCH。其一致化的特性讓uFETCH除了能在加密下搜索文字資料,也能為數值資料提供加密下之快速搜索。由於既有的加密下搜索(encrypted-search)相關技術皆只能適用文字或數值之同質性(homogeneous)資料型態,uFETCH打破了一種認為「必須整合多種加密下搜索技術才能使資料庫系統提供隱私確保」的信念;事實上,此信念已造就許多如CryptDB等受歡迎但高複雜度之設計。有別於這樣的設計,uFETCH可為文字及數值之資料型態建立一致性的SE索引;即使將文字及數值所建立之SE索引被混雜在一起,它仍能提供快速之加密下搜索。由於uFETCH一樣將加密搜索問題轉化為子序列比對問題,它讓雲端資料庫能在次線性(sub-linear)的時間下完成加密文字與數值的搜索,並在受歡迎的三層需端架構下提供足夠的機密性保全。
除了演算法推導及相關學理分析外,在此論文中我們也分別對兩大雲端使用例(use cases)完成實做及量化效能評估-即雲端儲存系統(cloud storage)及雲端資料庫系統(cloud DBMS)。因雲端儲存系統常用於儲存非結構化(unstructured)檔案資料,我們運用FETCH來提供全文資料檢索(full-content search),並以真實世界之Enron郵件資料集進行效能分析。至於結構化資料,我們以uFETCH展示一低複雜度整合下之SQL資安代理器(security agent),相對於傳統做法須套用多種異質可搜索加密技術,此SQL資安代理器只使用uFETCH便可使MySQL資料庫系統在不被修改的原則下提供文字及數值資料之隱私確保。
Searchable Encryption (SE) is considered important in the era of cloud as it provides both confidentiality and searchability for the data stored in semi-trusted environments such as cloud. Unfortunately, even after two-decade long development, it is still rarely deployed in cloud because most SE schemes cannot natively work with cloud services, as modifications to the underlying software infrastructure is inevitable.
To advocate SE deployment in cloud for protecting user privacy, a cloud-native SE scheme called FETCH (Frequency-Eliminated Trapdoor-Character-Hopping) is presented in this dissertation. Based on novel common-conditioned-subsequence-preserving (CCSP) techniques, FETCH is able to natively work with off-the-shelf databases to support wildcard-based pattern search on encrypted data. With the CCSP techniques, the problem of wildcard SE searching can be transformed into a problem of subsequence matching, that is well-supported in most databases and thus fits well with cloud services in general. Though in our security analysis CCSP removes the possibility of obtaining theoretical indistinguishability between indexed items, we show that FETCH nevertheless provide adequate confidentiality protection in the commonly-adopted 3-tier cloud structure and fares much better than other existing wildcard SE schemes in terms of query performance, storage overhead and deployment complexity. In particular, FETCH is able to efficiently handle datasets whose size is multiple orders of magnitude larger than those that existing schemes can comfortably support.
Interestingly, we found FETCH able to be extended to cover even encrypted numerical data in addition to textual data. We call the extended one ``unified' FETCH, or uFETCH in short. As encrypted-search techniques such as SE schemes were devised for homogeneous data type, i.e. textual or numerical, uFETCH breaks a nature presumption that multiple techniques have to be intertwined to make database management system (DBMS) privacy-preserving. Such a presumption actually has led to popular but more complex designs such as CryptDB, putting efforts on heterogeneous integration. Different from such designs, uFETCH is able to build unified SE indexes for both the types while enabling fast encrypted search even if the SE indexes built for texts and numbers are mingled. Since uFETCH also transforms the problem of encrypted search into a simple problem of subsequence matching for cloud-native, it requires only sub-linear search time w.r.t. the volume of indexed items and is secure in the widely-adopted 3-tier cloud structure to help cloud service providers ease regulation compliance with out-sourced repository.
Besides algorithmic derivation and theoretical analysis, two major cloud use cases, i.e. cloud storage and cloud DBMS, are also studied and evaluated, with cloud storage studied for pure textual data and cloud DBMS for data mixed with texts and numbers. More specifically, as cloud storage is often used to store unstructured data in files, FETCH is utilized to address the challenge of full-content textual search. As for structured data mixed with textual and numerical data, uFETCH is set up to bring forth simpler design of a security agent that demonstrates how cloud DBMS can be easily augmented to preserve privacy.
[1] “General data protection regulation.” https://gdpr-info.eu. Accessed: 20200325.
[2] J. Mei, K. Li, A. Ouyang, and K. Li, “A profit maximization scheme with guaranteed quality of service in cloud computing,” IEEE Transactions on Computers, vol. 64, pp. 1–1, 11 2015.
[3] O. Goldreich, The Foundations of Cryptography Volume 2, Basic Applications, vol. 2. Cambridge University Press, 2004.
[4] R. BaezaYates and B. RibeiroNeto, Modern information retrieval. ACM press New York, 1999.
[5] C. Gentry, “Fully homomorphic encryption using ideal lattices,” ACM Symposium on Theory of Computing, pp. 169–178, 2009.
[6] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searches on encrypted data,” IEEE Symposium on Security and Privacy, pp. 44–55, 2000.
[7] E.J. Goh, “Secure indexes,” IACR Cryptology ePrint Archive, 2003.
[8] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: Improved definitions and efficient constructions,” CCS ACM, pp. 79–88, 2006.
[9] J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, and W. Lou, “Fuzzy keyword search over encrypted data in cloud computing,” IEEE INFOCOM, 2010.
[10] A. Awad, A. Matthews, Y. Qiao, and B. Lee, “Chaotic searchable encryption for mobile cloud storage,” IEEE Transactions on Cloud Computing, 2015.
[11] C. Bösch, R. Brinkman, P. Hartel, and W. Jonker, “Conjunctive wildcard search over encrypted data,” SDM, pp. 114–117, 2011.
[12] T. Suga, T. Nishide, and K. Sakurai, “Secure keyword search using bloom filter with specified character positions,” International Conference on Provable Security, pp. 235–252, 2012.
[13] S. Sedghi, P. V. Liesdonk, S. Nikova, P. H. Hartel, and W. Jonker, “Searching keywords with wildcards on encrypted data,” SCN (LNCS) 6280, pp. 138–153, 2010.
[14] D. Wang, X. Jia, C. Wang, K. Yang, S. Fu, and M. Xu, “Generalized pattern matching string search on encrypted data in cloud systems,” IEEE INFOCOM, pp. 2101–2109, 2015.
[15] C. Hu and L. Han, “Efficient wildcard search over encrypted data. international journal of information security,” International Journal of Information Security, pp. 539–547, 2016.
[16] Y. Yang, X. Liu, R. H. Deng, and J. Weng, “Flexible wildcard searchable encryption system,” IEEE Transactions on Services Computing, 2017.
[17] Y. Yang, X. Liu, R. H. Deng, and J. Weng, “Flexible wildcard searchable encryption system,” IEEE Transactions on Services Computing, vol. 13, pp. 464–477, jul 2020.
[18] P. Golle, J. Staddon, and B. Waters, “Secure conjunctive keyword search over encrypted data,” Applied Cryptography and Network Security LNCS 3089, pp. 31–45, Springer, 2004.
[19] L. Ballard, S. Kamara, and F. Monrose, “Achieving efficient conjunctive keyword searches over encrypted data,” ICICS (LNCS) 3783, pp. 414–426, Springer, 2005.
[20] Y. Yang and M. Ma, “Conjunctive keyword search with designated tester and timing enabled proxy re-encryption function for e-health clouds,” IEEE Transactions on Information Forensics and Security, 2016.
[21] Y. Tang, D. Gu, N. Ding, and H. Lu, “Phrase search over encrypted data with symmetric encryption scheme,” in 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 471–480, 2012.
[22] Z. A. Kissel and J. Wang, “Verifiable phrase search over encrypted data secure against a semi-honest-but-curious adversary,” in 2013 IEEE 33rd International Conference on Distributed Computing Systems Workshops, pp. 126–131, 2013.
[23] S. Zittrower and C. C. Zou, “Encrypted phrase searching in the cloud,” in 2012 IEEE Global Communications Conference (GLOBECOM), pp. 764–770, 2012.
[24] J. Li, Z. Liu, X. Chen, F. Xhafa, X. Tan, and D. S. Wong, “L-encdb: A lightweight framework for privacy-preserving data queries in cloud computing,” Knowledge-Based Systems, vol. 79, pp. 18 – 26, 2015.
[25] C. Guo, X. Chen, Y. Jie, F. Zhangjie, M. Li, and B. Feng, “Dynamic multi-phrase ranked search over encrypted data with symmetric searchable encryption,” IEEE Transactions on Services Computing, pp. 1–1, 2017.
[26] H. Hacigümüş, B. Iyer, C. Li, and S. Mehrotra, “Executing sql over encrypted data in the database-service-provider model,” ACM SIGMOD international conference on Management of data, 2002.
[27] P. Chen, W. Zeng, Y. Zhu, and Y. Gu, “Secure range query based on privacy-preserving function in two-tiered sensor,” International Conference on Security of Smart Cities, Industrial Control System and Communications (SSIC), 2016.
[28] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” The VLDB Endowment, pp. 1953–1964, 2014.
[29] S. Bajaj and R. Sion, “Trusteddb: A trusted hardware-based
database with privacy and data confidentiality,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, 2014.
[30] J. Bater, E. G, C. Eggen, S. Goel, A. Kho, and J. Rogers, “Smcql: secure querying for federated databases,” The VLDB Endowment, pp. 673–684, 2017.
[31] J. Li and E. Omiecinski, “Efficiency and security tradeoff in supporting range queries on encrypted databases,” DBSec 2005. Working Conference on Data and Applications Security, 2005.
[32] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, “Order-preserving encryption for numeric data,” SIGMOD, pp. 563–574, 2004.
[33] R. A. Popa, C. M. S. Redfield, N. Zeldovich, and H. Balakrishnan, “Cryptdb: protecting confidentiality with encrypted query processing,” ACM Symposium on Operating Systems Principles, 2011.
[34] P. Paillier, “Publickey cryptosystems based on composite degree residuosity classes,” EUROCRYPT, 1999.
[35] S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich, “Processing analytical queries over encrypted data,” in Proceedings of the VLDB Endowment, vol. 6, pp. 289–300, 2013.
[36] A. Papadimitriou, R. Bhagwan, N. Chandran, R. Ramjee, A. Haberlen, H. Singh, A. Modi, and S. Badrinarayanan, “Big data analytics over encrypted datasets with seabed,” 10 2016.
[37] “Microsoft azure blob.” https://azure.microsoft.com/zh-tw/services/storage/blobs/. Accessed: 2020-06-17.
[38] “Amazon aws s3.” https://aws.amazon.com/tw/s3/. Accessed: 2020-06-17.
[39] “Google cloud storage.” https://cloud.google.com/storage. Accessed: 2020-06-17.
[40] O. Goldreich and R. Ostrovsky, “Software protection and simulation on oblivious rams,” Journal of the ACM, vol. 43(3), pp. 431–473, 1996.
[41] L. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A break in the clouds: towards a cloud definitions,” ACM SIGCOMM computer communications review, 2009.
[42] C. Bösch, P. Hartel, W. Jonker, and A. Peter, “A survey of provably secure searchable encryption,” ACM computing surveys, vol. 47(2), pp. 1–18, 2014.
[43] B. Bloom, “Space/time tradeoffs in hash coding with allowable errors,” Communications of the ACM, 1970.
[44] W. K. Wong, D. W. Cheung, B. Kao, and N. Mamoulis, “Secure knn computation on encrypted databases,” SIGMOD, pp. 139–152, 2009.
[45] A. Arasu, S. Blanas, K. Eguro, R. Kaushik, D. Kossmann, R. Ramamurthy, and R. Venkatesan, “Orthogonal security with cipherbase.,” in CIDR, 2013.
[46] D. Boneh, C. Gentry, S. Halevi, F. Wang, and D. J. Wu, “Private database queries using somewhat homomorphic encryption,” in Applied Cryptography and Network Security (M. Jacobson, M. Locasto, P. Mohassel, and R. SafaviNaini, eds.), (Berlin, Heidelberg), pp. 102–118, Springer Berlin Heidelberg, 2013.
[47] B. K. Samanthula, W. Jiang, and E. Bertino, “Privacy-preserving complex query evaluation over semantically secure encrypted data,” in Computer Security ESORICS 2014 (M. Kutyłowski and J. Vaidya, eds.), (Cham), pp. 400–418, Springer International Publishing, 2014.
[48] M. Kantarcioǧlu and C. Clifton, “Security issues in querying encrypted data,” Data and Applications Security XIX, pp. 924–924, 2005.
[49] P. Lin and K. Candan, “Hiding tree structured data and queries from untrusted data stores,” Information Systems Security, vol. 14, no. 4, pp. 10–26, 2005.
[50] V. Pappas, F. Krell, B. Vo, V. Kolesnikov, T. Malkin, S. G. Choi, W. George, A. Keromytis, and S. Bellovin, “Blind seer: A scalable private dbms,” in 2014 IEEE Symposium on Security and Privacy, pp. 359–374, 2014.
[51] D. Boneh and V. Shoup, “A graduate course in applied cryptography.” https://crypto.stanford.edu/~dabo/cryptobook/. Accessed: 2020-04-17.
[52] D. S. Hirschberg, “Algorithms for the longest common subsequence problem,” J. ACM, vol. 24(4), pp. 664–675, 1977.
[53] B. Hore, S. Mehrotra, and G. Tsudik, “A privacy-preserving index for range queries,” in Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, pp. 720–731, 2004.
[54] Chuanyi Liu, Guofeng Wang, Peiyi Han, Hezhong Pan, and Binxing Fang, “A cloud access security broker based approach for encrypted data search and sharing,” in 2017 International Conference on Computing, Networking and Communications (ICNC), pp. 422–426, 2017.
[55] M. Vrable, S. Savage, and G. Voelker, “Bluesky: A cloud-backed file system for the enterprise,” pp. 19–19, 02 2012.
[56] R. Zhao, C. Yue, B. Tak, and C. Tang, “Safesky: A secure cloud storage middleware for end-user applications,” 09 2015.
[57] “Searchstax.” https://www.searchstax.com/. Accessed: 2020-04-17.
[58] P. S. Keila and D. B. Skillicorn, “Structure in the enron email dataset,” Computational and Mathematical Organization Theory, vol. 11(3), pp. 183–199, 2005.
[59] “Popular baby names, national data.” https://www.ssa.gov/OACT/babynames/names.zip. Accessed: 2020-03-25.