
Graduate student: 蘇佳琳 (Su, Jia-Lin)
Thesis title: 單調性資料轉換在隱私保護資料探勘中對可用性及安全性的影響
The Impact on the Utility and Security of Privacy Preserving Data Mining by Using Monotonic Transformation
Advisor: 翁慈宗 (Wong, Tzu-Tsung)
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 67
Keywords (Chinese): 分類, 單調函數, 多段式轉換, 隱私保護資料探勘
Keywords (English): classification, monotonic function, multi-interval transformation, privacy preserving data mining
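  • In this era of big data, many enterprises and data scientists solve problems by analyzing data. However, data owners often hide part of their data, or even refuse to provide it, in order to protect privacy or internal secrets, which makes data mining results inaccurate. In response, Privacy Preserving Data Mining (PPDM) was proposed in 2000. Scholars have since proposed various transformation methods for data preprocessing, or have used encryption during transmission to protect the data. These approaches, however, cannot restore the data or the learned model, so too much information is lost and no balance is struck between security and utility.
    This study therefore proposes a new PPDM model that ensures no one other than the data owner (Party A) can obtain the original data. A data scientist (Party B) who wishes to analyze the data can obtain only a transformed data set from Party A. After applying a data mining algorithm to obtain a learning model, Party B hands the model to Party A for restoration, and Party A returns the restored model to Party B. In other words, Party B obtains a correct model without ever receiving the original data and can apply that model to future new data, so both the security and the utility of the data are preserved.
    To preserve the monotonicity of the data before and after transformation, this study uses linear functions and the Box-Cox function to perform multi-interval monotonic transformation on numeric attributes. Two algorithms, decision trees and rule-based classification, are then used to generate learning models, and the inverse of the monotonic function is finally used to restore each model. Evaluating the results with accuracy and a self-devised Secure measure shows that a multi-interval transformation that satisfies monotonicity fully preserves data utility, while security rises markedly as the number of intervals increases. This confirms that the proposed multi-interval monotonic transformation can address both the utility and the security of the data.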
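    The workflow described above can be illustrated with a minimal sketch. It is not the thesis implementation: the continuous piecewise-linear form, the helper names (`make_piecewise_monotone`, `best_threshold`), the breakpoints, the slopes, and the toy data are all assumptions made for illustration. The owner transforms a numeric attribute with a strictly increasing multi-interval map, the analyst learns a one-split rule on the transformed values, and the owner restores the split point with the inverse map.

```python
from bisect import bisect_right


def make_piecewise_monotone(breaks, slopes, start=0.0):
    """Build a strictly increasing piecewise-linear map and its inverse.

    breaks: increasing breakpoints b_0 < ... < b_k covering the attribute range.
    slopes: k positive slopes, one per interval (hypothetical owner-side secret).
    """
    assert len(slopes) == len(breaks) - 1 and all(s > 0 for s in slopes)
    # Image of every breakpoint; positive slopes keep the images increasing.
    images = [start]
    for i, s in enumerate(slopes):
        images.append(images[-1] + s * (breaks[i + 1] - breaks[i]))

    def segment(knots, v):
        # Index of the interval containing v, clamped to the valid range.
        return min(max(bisect_right(knots, v) - 1, 0), len(slopes) - 1)

    def forward(x):
        i = segment(breaks, x)
        return images[i] + slopes[i] * (x - breaks[i])

    def inverse(y):
        i = segment(images, y)
        return breaks[i] + (y - images[i]) / slopes[i]

    return forward, inverse


def best_threshold(values, labels):
    """One-split rule 'value <= t -> class 0': return the midpoint t with fewest errors."""
    pairs = sorted(zip(values, labels))
    best_t, best_err = None, len(pairs) + 1
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        t = (a + b) / 2
        err = sum((x <= t) != (y == 0) for x, y in pairs)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


# Owner (Party A): original attribute values and class labels (toy data).
ages = [22, 25, 31, 38, 45, 52, 60]
labels = [0, 0, 0, 1, 1, 1, 1]
forward, inverse = make_piecewise_monotone([20, 35, 50, 65], [2.0, 0.5, 3.0])
transformed = [forward(x) for x in ages]

# Analyst (Party B): sees only the transformed values and learns a split on them.
t_transformed = best_threshold(transformed, labels)

# Owner (Party A): restores the split point with the inverse map.
t_restored = inverse(t_transformed)

pred_transformed = [int(v > t_transformed) for v in transformed]
pred_restored = [int(x > t_restored) for x in ages]
```

    Because the map is strictly increasing, the restored rule classifies the original values exactly as the transformed rule classifies the transformed values, which is the sense in which utility is preserved in this sketch.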

    In the era of big data, enterprises and data analysts solve problems by analyzing data. However, data owners who want to protect the privacy or confidentiality of their data may refuse to provide it, which leads to inaccurate data mining results. Privacy Preserving Data Mining (PPDM) was proposed in 2000 to resolve this problem. After two decades of development, scholars are still unable to restore the models induced from data transformed by PPDM techniques so that those models can predict new data. In other words, existing PPDM techniques have so far failed to strike a balance between the utility and the security of data. This research proposes a new PPDM method under which the original data can be accessed only by the data owners. Data analysts process the transformed data provided by the owners to develop a transformed model, and then send that model back to the owners for restoration. Finally, the owners send the restored model to the analysts for classifying new data. Since data analysts can apply the restored model to new instances, this method enhances both the security and the utility of data. To maintain the monotonicity of continuous attributes, linear and Box-Cox functions are chosen to perform multi-interval transformation on those attributes. The inverses of those functions are used to obtain the restored models for decision tree and rule-based classification algorithms. The experimental results on ten data sets demonstrate that the proposed transformation method can enhance both the security and the utility of the data for mining. The level of security increases as the number of intervals used for transformation increases.
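    For reference, the standard Box-Cox transformation and its inverse for a positive attribute value x can be written as below. The record does not state which values of the parameter λ the thesis uses, and the threshold symbol t' is introduced here only for illustration.

```latex
f_\lambda(x) =
\begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
  \ln x, & \lambda = 0
\end{cases}
\qquad
f_\lambda^{-1}(y) =
\begin{cases}
  (\lambda y + 1)^{1/\lambda}, & \lambda \neq 0 \\[6pt]
  e^{y}, & \lambda = 0
\end{cases}
\qquad
\frac{d f_\lambda}{dx} = x^{\lambda - 1} > 0 \quad (x > 0)
```

    Since the derivative is positive for every x > 0, the function is strictly increasing, so a split x' ≤ t' learned on a transformed attribute is equivalent to the split x ≤ f_λ^{-1}(t') on the original attribute.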

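    Abstract I
    Table of Contents VII
    List of Tables IX
    List of Figures X
    Chapter 1 Introduction 1
      1.1 Research Background and Motivation 1
      1.2 Research Objectives 2
      1.3 Research Scope and Limitations 3
      1.4 Research Process 3
    Chapter 2 Literature Review 5
      2.1 Privacy Preserving Data Mining 5
        2.1.1 Techniques and Classification of Privacy Preserving Data Mining 5
        2.1.2 Applications and Limitations of Privacy Preserving Data Mining 8
        2.1.3 Reversible Privacy Preserving Data Mining 9
      2.2 Data Transformation 10
        2.2.1 Purposes of Data Transformation 11
        2.2.2 Methods of Data Transformation 11
        2.2.3 Monotonic Data Transformation and Preprocessing 15
      2.3 Evaluation Methods 16
        2.3.1 Data Quality 17
        2.3.2 Privacy Level 19
      2.4 Summary 21
    Chapter 3 Research Method 22
      3.1 Problem Definition and Workflow 22
      3.2 Data Transformation 23
        3.2.1 Linear Transformation and Box-Cox Transformation 24
        3.2.2 Single-Interval and Multi-Interval Monotonic Transformation 26
      3.3 Generation of Classification Models 30
        3.3.1 Decision Tree 30
        3.3.2 Rule-Based Classification 32
      3.4 Restoration of Classification Models 33
        3.4.1 Model Restoration for Linear Transformation 33
        3.4.2 Model Restoration for Box-Cox Transformation 35
        3.4.3 Model Restoration for Multi-Interval Monotonic Transformation 36
      3.5 Evaluation Measures 37
        3.5.1 Evaluation of Utility 37
        3.5.2 Evaluation of Security 38
      3.6 Summary 39
    Chapter 4 Empirical Study 40
      4.1 Data Sets 40
      4.2 Data Transformation 41
      4.3 Model Generation and Restoration 43
        4.3.1 Model Generation 43
        4.3.2 Model Restoration 44
      4.4 Utility and Security Evaluation 47
        4.4.1 Utility Evaluation 47
        4.4.2 Security Evaluation 51
      4.5 Complete Worked Example 55
      4.6 Summary 59
    Chapter 5 Conclusions and Future Work 60
      5.1 Conclusions 60
      5.2 Future Work 62
    References 63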


    Full text availability: on campus, open immediately; off campus, open immediately.