
Graduate student: 黃振哲 (Huang, Chen-Che)
Thesis title: Homomorphic Encryption with Arctangent Functions for Privacy Preserving Data Mining (以反正切函數進行同態加密之隱私保護資料探勘)
Advisor: 翁慈宗 (Wong, Tzu-Tsung)
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2022
Graduation academic year: 110 (ROC calendar)
Language: Chinese
Number of pages: 43
Keywords: privacy preserving data mining, arctangent function, homomorphic encryption, logistic regression
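  • With the continuing development of information technology, applications that generate data keep appearing, but such data usually contain sensitive personal information, which has made privacy preserving data mining (PPDM) increasingly important. Researchers have proposed many PPDM methods to protect privacy, yet no single PPDM method has so far proved suitable for every scenario. The literature review shows that one PPDM approach meets the privacy requirement through homomorphic encryption, usually applied to logistic regression. Most of these studies adopt the Sigmoid function, however, and its efficiency and effectiveness were found to be unsatisfactory. Taking this as its starting point, this study uses the arctangent function in a homomorphically encrypted logistic regression method. It proposes a PPDM process in which the data owner (Party A) protects the original data through homomorphic encryption and then passes the protected data to the data receiver (Party B), who analyzes it with logistic regression; finally, the resulting model is provided to Party B for predicting new data. After the model is built, it is compared with methods proposed in earlier literature using three evaluation measures: the difference in classification accuracy, the proportion of inconsistent predictions, and the training time. The experiments show that, in most cases, the proposed model performs better on all three measures.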
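The record does not say which homomorphic scheme the thesis uses, so the following is only a minimal sketch of the general idea, assuming a textbook Paillier cryptosystem with tiny fixed primes (insecure, for illustration only). All function names are hypothetical. It shows how a data receiver could compute the linear score of a logistic regression on encrypted features without ever seeing the plaintext.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier keys from two small primes (assumption: demo only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                          # standard simple generator
    mu = pow(lam, -1, n)                               # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt a non-negative integer m < n with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover m from ciphertext c using L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def mul_plain(pub, c, k):
    """Raising a ciphertext to a plaintext power k multiplies the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)

def encrypted_dot(pub, enc_x, weights):
    """Encrypted sum of w_i * x_i for plaintext non-negative integer weights."""
    acc = encrypt(pub, 0)
    for c, w in zip(enc_x, weights):
        acc = add_encrypted(pub, acc, mul_plain(pub, c, w))
    return acc

if __name__ == "__main__":
    pub, priv = keygen()
    enc_x = [encrypt(pub, v) for v in (3, 5, 7)]   # data owner encrypts features
    enc_score = encrypted_dot(pub, enc_x, (2, 1, 4))  # receiver works on ciphertexts
    print(decrypt(pub, priv, enc_score))           # only the key holder can read it
```

Paillier supports only addition and plaintext scaling on ciphertexts, which is one reason a nonlinear link function generally has to be approximated or handled separately in schemes of this kind.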

    As information technology advances, the data collected by applications often contain sensitive personal information, which makes Privacy Preserving Data Mining (PPDM) increasingly important. Scholars have proposed many PPDM methods to protect privacy, but no single PPDM method can be applied in all domains. One PPDM approach uses homomorphic encryption to meet the privacy protection requirement. Logistic regression with the Sigmoid function is usually adopted when applying homomorphic encryption, but its computational cost is high and prediction inconsistency is not infrequent. To overcome these deficiencies, this study introduces the arctangent function into logistic regression so that the data owner can protect the original data from the receiver by applying homomorphic encryption. The experimental results on 20 data sets show that the proposed method is not only more efficient but also reduces prediction inconsistency on average.
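The abstract does not give the exact arctangent form used in the thesis. As a hedged illustration, one common way to turn the arctangent into a sigmoid-like link is to shift and scale it so that it maps the real line to (0, 1). The sketch below assumes that form; `arctan_link` and `predict_label` are hypothetical names.

```python
import math

def sigmoid(z: float) -> float:
    """Standard logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def arctan_link(z: float) -> float:
    """Assumed arctangent-based link: 0.5 + arctan(z) / pi, with range (0, 1)."""
    return 0.5 + math.atan(z) / math.pi

def predict_label(weights, bias, x, link):
    """Classify x as 1 when the link of the linear score is at least 0.5."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if link(z) >= 0.5 else 0

if __name__ == "__main__":
    for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f}  arctan_link={arctan_link(z):.4f}")
```

Both functions are increasing and equal 0.5 at z = 0, so for the same weights they give the same label at a 0.5 threshold. They differ in their tails: the arctangent form approaches 0 and 1 more slowly, which changes the gradients seen during training.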

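    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives
        1.3 Research Process
    Chapter 2 Literature Review
        2.1 Privacy Preserving Data Mining
            2.1.1 Techniques of Privacy Preserving Data Mining
            2.1.2 Applications of Privacy Preserving Data Mining
            2.1.3 Limitations and Challenges of Privacy Preserving Data Mining
        2.2 Logistic Regression
        2.3 Homomorphic Encryption
        2.4 Approximate Transformation of Activation Functions
        2.5 PPDM Evaluation Metrics
            2.5.1 Privacy Metrics
            2.5.2 Utility Metrics
        2.6 Summary
    Chapter 3 Research Method
        3.1 PPDM Workflow
        3.2 Data Encryption
        3.3 Model Training
        3.4 Prediction of New Data
        3.5 Evaluation Methods
    Chapter 4 Empirical Study
        4.1 Data Sets
        4.2 Parameter Settings
        4.3 Model Evaluation
            4.3.1 Difference in Classification Accuracy
            4.3.2 Proportion of Inconsistent Predictions
            4.3.3 Comparison of Training Time
        4.4 Summary
    Chapter 5 Conclusions and Future Work
        5.1 Conclusions
        5.2 Future Work
    References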

