簡易檢索 / 詳目顯示

研究生: 曾子芸
Zeng, Zi-Yun
論文名稱: 考量隱私之稀疏屬性社群網路特徵表示學習
Privacy-aware Feature Representation Learning in Attributed Social Networks
指導教授: 李政德
Li, Cheng-Te
學位類別: 碩士
Master
系所名稱: 管理學院 - 統計學系
Department of Statistics
論文出版年: 2019
畢業學年度: 107
語文別: 中文
論文頁數: 71
中文關鍵詞: 隱私保護稀疏屬性社群網路特徵表示學習
外文關鍵詞: Privacy Protection, Sparse Attribute Community Network, Feature Representation Learning
相關次數: 點閱:78下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 近年來,保護隱私的話題風起雲湧,隱私保護成為現代人越來越注重的議題;然而社群軟體需要大量的資訊,資訊越多越能準確的分析使用者,但越來越多使用者基於隱私考量,不願意透露更多自己的屬性,或是交友圈等,造成資料的不完整。而目前現有的作法,都是假設資料是完整的情況去進行分析,並沒有考慮隱私保護下資料不完整,甚至稀疏的情況;且在社群網路嵌入特徵向量學習方法 (network embedding)中,多只考慮結構資訊,或是考慮屬性資訊時還需額外資訊,綜合以上問題,本研究提出一將結構與屬性融合之方式(Structure-Attributed Enhanced Matrix, SAEM),克服資料稀疏性問題,並提出四種改良的嵌入學習,其一為在嵌入學習時同時加入屬性與結構資訊做學習(cc-n2v),只加入結構資訊(com-n2v)與只加入屬性資訊(clu-n2v),將com-n2v與clu-n2v結合成cat-n2v。
    對於移除50%的隨機移除隱私設定下,本研究提出的SAEM結合cat-n2v平均而言相比node2vec基於網路結構,在社群偵測、節點分類與連結預測分別獲得80%、15%與27%的顯著效果提升,在不同隱私設定下結果也大同小異,尤其在使用者愈注重隱私、網路結構愈稀疏的狀況下,能做出更準確的預測,足以顯示本研究的方法得以在隱私顧慮下提供有效的社群網路應用服務。使用者能夠被允許提供不完整資訊,呼應歐盟於2018年5月頒布的一般資料保護規範 (General Data Protection Regulation, GDPR),只需少量的使用者個資,在有效的特徵表示學習下,依然能提供穩定準確效能的網路推薦系統服務。
    其中針對社群偵測,提出Egocentric Partitioning (EgoPart)演算法,適用於偵測重疊社群(overlapped communities),亦可用在偵測非重疊社群(non-overlapped communities),先從各節點角度去偵測鄰近結構分布(local clusters),局部觀察後再結合成整體的社群(global clusters),在偵測重疊社區與非重疊社區時,與現今最最廣為被採用的社群偵測演算法Label Propagation相比,分別獲得32%、42%的顯著效果提升,尤其對於偵測非重疊社區,EgoPart表現優於其他常見方法,顯示本研究提供有效之方法在社群偵測上。

    In recent years, privacy protection has become an issue that more people pay attention to. However, community software requires lots of information. More and more users are not willing to disclosure their own attributes, circle of friends, etc., resulting in incomplete information. The current practice assume that the data is complete to analyze, not consider the privacy of the data is incomplete or even sparse happening.
    In network embedding feature vector learning method, more only consider structural information, or need additional information. To summarize the above problems, this study proposes a way to integrate structure and attributes (SAEM). To overcome data sparsity problem, and propose four improved embedded learning. Add attribute and structure information to learn while embedding learning (cc-n2v), only add structural information (com-n2v) and only attributes information (clu-n2v), combining com-n2v and clu-n2v into cat-n2v.
    For removing 50% of the random privacy settings, the SAEM combined with cat-n2v compared to node2vec based on network structure proposed in this study is 80%, 15%, and 27% in community detection, node classification, and link prediction. Respectively, when users pay more attention to privacy and the sparse network structure, more accurate predictions can be made. It shows that the research method can provide effective social network application services under privacy concerns.
    For community detection, Egocentric Partitioning algorithm is proposed to detect overlapping or non-overlapping communities. Observe the structure distribution from each node, and then analyze the whole. EgoPart outperforms other common methods show that it provides an effective method for community detection.

    摘要i 英文延伸摘要ii 誌謝vii Table of Contents / 目錄viii List of Tables / 表格x List of Figures / 圖片xi 第1 章. 緒論1 1.1. 背景. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2. 動機. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3. 研究問題. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.4. 研究挑戰. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5. 方法概述. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.6. 潛在應用. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.7. 論文貢獻. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 第2 章. 相關研究9 2.1. 特徵表示向量學習. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1.1. node2vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1.2. DeepWalk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.1.3. Large-scale Information Network Embedding(LINE) . . . . . . . . 13 2.1.4. Attributed Social Network Embedding(SNE) . . . . . . . . . . . . 14 2.2. 社群偵測. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3. 節點分類. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.4. 連結預測. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.4.1. Local Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.4.2. Global Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.4.3. Quasi-Local Approaches . . . . . . . . . . . . . . . . . . . . . . . 20 2.4.4. 總結. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 第3 章. 研究問題敘述21 3.1. 符號定義. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2. 考量隱私之稀疏屬性社群網路特徵表示學習. . . . . . . . . . . . . . . 22 3.3. 應用驗證. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.3.1. Task 1: 社群偵測(Community Detection) . . . . . . . . . . . . . . 23 3.3.2. Task 2: 節點分類(Node Classification) . . . . . . . . . . . . . . . 25 3.3.3. Task 3: 連結預測(Link Prediction) . . . . . . . . . . . . . . . . . 25 第4 章. 研究方法27 4.1. 研究架構與方法流程. . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.2. Structure-Attribute Enhanced Matrix (SAEM) . . . . . . . . . . . . . . . . 27 4.3. ComClu-informed Embedding Learning . . . . . . . . . . . . . . . . . . . 30 4.3.1. com-n2v . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.3.2. clu-n2v . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.3.3. cat-n2v . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.3.4. cc-n2v . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.4. Social Network Analysis Tasks . . . . . . . . . . . . . . . . . . . . . . . . 37 4.4.1. Task 1: Node clustering . . . . . . . . . . . . . . . . . . . . . . . . 37 4.4.2. Task 2: Node classification . . . . . . . . . . . . . . . . . . . . . . 43 4.4.3. Task 3: Link Prediction . . . . . . . . . . . . . . . . . . . . . . . . 43 第5 章. 實驗評估44 5.1. 實驗目的. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.2. 資料集與摘要統計量. . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.3. 隱私設定. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.3.1. 隨機移除. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.3.2. 部分介意. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.3.3. 完全隱藏. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 5.4. Task 1: 應用驗證 社群偵測(Community Detection) . . . . . . . . . . . . 50 5.4.1. 實驗設定. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.4.2. 評估指標. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.4.3. 實驗結果. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.5. Task 2: 應用驗證 節點分類(Node Classification) . . . . . . . . . . . . . 56 5.5.1. 實驗設定. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 5.5.2. 評估指標. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 5.5.3. 實驗結果. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 5.6. Task 3: 應用驗證 連結預測(Link Prediction) . . . . . . . . . . . . . . . . 62 5.6.1. 實驗設定. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.6.2. 評估指標. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.6.3. 實驗結果. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 第6 章. 討論66 6.1. SAEM 資料增強. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 6.2. 連結部分延伸討論. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 6.3. 屬性部分延伸討論. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 第7 章. 結論68 References 69

    [1] Lada A Adamic and Eytan Adar. Friends and neighbors on the web. Social networks,
    25(3):211–230, 2003.
    [2] AlbertLászló
    Barabási and Réka Albert. Emergence of scaling in random networks.
    science, 286(5439):509–512, 1999.
    [3] Smriti Bhagat, Graham Cormode, and S Muthukrishnan. Node classification in social
    networks. In Social network data analytics, pages 115–148. Springer, 2011.
    [4] Smriti Bhagat, Graham Cormode, and Irina Rozenbaum. Applying linkbased
    classification
    to label blogs. In International Workshop on Social Network Mining and
    Analysis, pages 97–117. Springer, 2007.
    [5] Ricardo JGB Campello, Davoud Moulavi, and Jörg Sander. Densitybased
    clustering
    based on hierarchical density estimates. In PacificAsia
    conference on knowledge discovery
    and data mining, pages 160–172. Springer, 2013.
    [6] Aaron Clauset, Mark EJ Newman, and Cristopher Moore. Finding community structure
    in very large networks. Physical review E, 70(6):066111, 2004.
    [7] Brendan J Frey and Delbert Dueck. Clustering by passing messages between data
    points. science, 315(5814):972–976, 2007.
    [8] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks.
    In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
    Discovery and Data Mining, 2016.
    [9] John A Hartigan and Manchek A Wong. Algorithm as 136: A kmeans
    clustering algorithm.
    Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–
    108, 1979.
    [10] P. Jaccard. Etude comparative de la distribution florale dans une portion des Alpes et
    du Jura. Bulletin de la Société vaudoise des sciences naturelles. Impr. Corbaz, 1901.
    [11] Leo Katz. A new status index derived from sociometric analysis. Psychometrika,
    18(1):39–43, 1953.
    [12] Elizabeth A Leicht, Petter Holme, and Mark EJ Newman. Vertex similarity in networks.
    Physical Review E, 73(2):026120, 2006.
    [13] Lizi Liao, Xiangnan He, Hanwang Zhang, and TatSeng
    Chua. Attributed social network
    embedding. arXiv preprint arXiv:1705.04969, 2017.
    [14] David LibenNowell.
    An algorithmic approach to social networks. PhD thesis, Massachusetts
    Institute of Technology, 2005.
    [15] David LibenNowell
    and Jon Kleinberg. The linkprediction
    problem for social networks.
    Journal of the American society for information science and technology,
    58(7):1019–1031, 2007.
    [16] Weiping Liu and Linyuan Lü. Link prediction based on local random walk. EPL (Europhysics
    Letters), 89(5):58007, 2010.
    [17] Linyuan Lü, CiHang
    Jin, and Tao Zhou. Similarity index based on local paths for link
    prediction of complex networks. Physical Review E, 80(4):046122, 2009.
    [18] Qing Lu and Lise Getoor. Linkbased
    classification. In Proceedings of the 20th International
    Conference on Machine Learning (ICML03),
    pages 496–503, 2003.
    [19] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using tsne.
    Journal of
    machine learning research, 9(Nov):2579–2605, 2008.
    [20] Sofus A Macskassy and Foster Provost. A simple relational classifier. Technical report,
    NEW YORK UNIV NY STERN SCHOOL OF BUSINESS, 2003.
    [21] Aaron F McDaid, Derek Greene, and Neil Hurley. Normalized mutual information to
    evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515,
    2011.
    [22] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of
    word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
    [23] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed
    representations of words and phrases and their compositionality. In Advances in neural
    information processing systems, pages 3111–3119, 2013.
    [24] Michael Mitzenmacher. A brief history of generative models for power law and lognormal
    distributions. Internet mathematics, 1(2):226–251, 2004.
    [25] Jennifer Neville and David Jensen. Iterative classification in relational data. In Proc.
    AAAI2000
    Workshop on Learning Statistical Models from Relational Data, pages 13–
    20, 2000.
    [26] Mark EJ Newman. Clustering and preferential attachment in growing networks. Physical
    review E, 64(2):025102, 2001.
    [27] Mark EJ Newman. Finding community structure in networks using the eigenvectors of
    matrices. Physical review E, 74(3):036104, 2006.
    [28] Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysis and
    an algorithm. In Advances in neural information processing systems, pages 849–856,
    2002.
    [29] Ming Ouyang, William J Welsh, and Panos Georgopoulos. Gaussian mixture clustering
    and imputation of microarray data. Bioinformatics, 20(6):917–923, 2004.
    [30] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank
    citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
    [31] Karl Pearson. The problem of the random walk. Nature, 72(1867):342, 1905.
    [32] Bryan Perozzi, Rami AlRfou,
    and Steven Skiena. Deepwalk: Online learning of social
    representations. In Proceedings of the 20th ACM SIGKDD International Conference
    on Knowledge Discovery and Data Mining, KDD ’14, pages 701–710, 2014.
    [33] Pascal Pons and Matthieu Latapy. Computing communities in large networks using
    random walks. In International symposium on computer and information sciences,
    pages 284–293. Springer, 2005.
    [34] Usha Nandini Raghavan, Réka Albert, and Soundar Kumara. Near linear time algorithm
    to detect community structures in largescale
    networks. Physical review E,
    76(3):036106, 2007.
    [35] Gerard Salton and Michael J McGill. Introduction to modern information retrieval.
    1986.
    [36] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE:
    largescale
    information network embedding. CoRR, 2015.
    [37] Hanghang Tong, Christos Faloutsos, and JiaYu
    Pan. Fast random walk with restart and
    its applications. In Sixth International Conference on Data Mining (ICDM’06), pages
    613–622. IEEE, 2006.
    [38] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing,
    17(4):395–416, 2007.
    [39] Tao Zhou, Linyuan Lü, and YiCheng
    Zhang. Predicting missing links via local information.
    The European Physical Journal B, 71(4):623–630, 2009.
    [40] Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with
    label propagation. Technical report, Citeseer, 2002.

    下載圖示 校內:2024-08-26公開
    校外:2024-08-26公開
    QR CODE