| 研究生: |
曾子芸 Zeng, Zi-Yun |
|---|---|
| 論文名稱: |
考量隱私之稀疏屬性社群網路特徵表示學習 Privacy-aware Feature Representation Learning in Attributed Social Networks |
| 指導教授: |
李政德
Li, Cheng-Te |
| 學位類別: |
碩士 Master |
| 系所名稱: |
管理學院 - 統計學系 Department of Statistics |
| 論文出版年: | 2019 |
| 畢業學年度: | 107 |
| 語文別: | 中文 |
| 論文頁數: | 71 |
| 中文關鍵詞: | 隱私保護 、稀疏屬性社群網路 、特徵表示學習 |
| 外文關鍵詞: | Privacy Protection, Sparse Attribute Community Network, Feature Representation Learning |
| 相關次數: | 點閱:78 下載:2 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近年來,保護隱私的話題風起雲湧,隱私保護成為現代人越來越注重的議題;然而社群軟體需要大量的資訊,資訊越多越能準確的分析使用者,但越來越多使用者基於隱私考量,不願意透露更多自己的屬性,或是交友圈等,造成資料的不完整。而目前現有的作法,都是假設資料是完整的情況去進行分析,並沒有考慮隱私保護下資料不完整,甚至稀疏的情況;且在社群網路嵌入特徵向量學習方法 (network embedding)中,多只考慮結構資訊,或是考慮屬性資訊時還需額外資訊,綜合以上問題,本研究提出一將結構與屬性融合之方式(Structure-Attributed Enhanced Matrix, SAEM),克服資料稀疏性問題,並提出四種改良的嵌入學習,其一為在嵌入學習時同時加入屬性與結構資訊做學習(cc-n2v),只加入結構資訊(com-n2v)與只加入屬性資訊(clu-n2v),將com-n2v與clu-n2v結合成cat-n2v。
對於移除50%的隨機移除隱私設定下,本研究提出的SAEM結合cat-n2v平均而言相比node2vec基於網路結構,在社群偵測、節點分類與連結預測分別獲得80%、15%與27%的顯著效果提升,在不同隱私設定下結果也大同小異,尤其在使用者愈注重隱私、網路結構愈稀疏的狀況下,能做出更準確的預測,足以顯示本研究的方法得以在隱私顧慮下提供有效的社群網路應用服務。使用者能夠被允許提供不完整資訊,呼應歐盟於2018年5月頒布的一般資料保護規範 (General Data Protection Regulation, GDPR),只需少量的使用者個資,在有效的特徵表示學習下,依然能提供穩定準確效能的網路推薦系統服務。
其中針對社群偵測,提出Egocentric Partitioning (EgoPart)演算法,適用於偵測重疊社群(overlapped communities),亦可用在偵測非重疊社群(non-overlapped communities),先從各節點角度去偵測鄰近結構分布(local clusters),局部觀察後再結合成整體的社群(global clusters),在偵測重疊社區與非重疊社區時,與現今最最廣為被採用的社群偵測演算法Label Propagation相比,分別獲得32%、42%的顯著效果提升,尤其對於偵測非重疊社區,EgoPart表現優於其他常見方法,顯示本研究提供有效之方法在社群偵測上。
In recent years, privacy protection has become an issue that more people pay attention to. However, community software requires lots of information. More and more users are not willing to disclosure their own attributes, circle of friends, etc., resulting in incomplete information. The current practice assume that the data is complete to analyze, not consider the privacy of the data is incomplete or even sparse happening.
In network embedding feature vector learning method, more only consider structural information, or need additional information. To summarize the above problems, this study proposes a way to integrate structure and attributes (SAEM). To overcome data sparsity problem, and propose four improved embedded learning. Add attribute and structure information to learn while embedding learning (cc-n2v), only add structural information (com-n2v) and only attributes information (clu-n2v), combining com-n2v and clu-n2v into cat-n2v.
For removing 50% of the random privacy settings, the SAEM combined with cat-n2v compared to node2vec based on network structure proposed in this study is 80%, 15%, and 27% in community detection, node classification, and link prediction. Respectively, when users pay more attention to privacy and the sparse network structure, more accurate predictions can be made. It shows that the research method can provide effective social network application services under privacy concerns.
For community detection, Egocentric Partitioning algorithm is proposed to detect overlapping or non-overlapping communities. Observe the structure distribution from each node, and then analyze the whole. EgoPart outperforms other common methods show that it provides an effective method for community detection.
[1] Lada A Adamic and Eytan Adar. Friends and neighbors on the web. Social networks,
25(3):211–230, 2003.
[2] AlbertLászló
Barabási and Réka Albert. Emergence of scaling in random networks.
science, 286(5439):509–512, 1999.
[3] Smriti Bhagat, Graham Cormode, and S Muthukrishnan. Node classification in social
networks. In Social network data analytics, pages 115–148. Springer, 2011.
[4] Smriti Bhagat, Graham Cormode, and Irina Rozenbaum. Applying linkbased
classification
to label blogs. In International Workshop on Social Network Mining and
Analysis, pages 97–117. Springer, 2007.
[5] Ricardo JGB Campello, Davoud Moulavi, and Jörg Sander. Densitybased
clustering
based on hierarchical density estimates. In PacificAsia
conference on knowledge discovery
and data mining, pages 160–172. Springer, 2013.
[6] Aaron Clauset, Mark EJ Newman, and Cristopher Moore. Finding community structure
in very large networks. Physical review E, 70(6):066111, 2004.
[7] Brendan J Frey and Delbert Dueck. Clustering by passing messages between data
points. science, 315(5814):972–976, 2007.
[8] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks.
In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 2016.
[9] John A Hartigan and Manchek A Wong. Algorithm as 136: A kmeans
clustering algorithm.
Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–
108, 1979.
[10] P. Jaccard. Etude comparative de la distribution florale dans une portion des Alpes et
du Jura. Bulletin de la Société vaudoise des sciences naturelles. Impr. Corbaz, 1901.
[11] Leo Katz. A new status index derived from sociometric analysis. Psychometrika,
18(1):39–43, 1953.
[12] Elizabeth A Leicht, Petter Holme, and Mark EJ Newman. Vertex similarity in networks.
Physical Review E, 73(2):026120, 2006.
[13] Lizi Liao, Xiangnan He, Hanwang Zhang, and TatSeng
Chua. Attributed social network
embedding. arXiv preprint arXiv:1705.04969, 2017.
[14] David LibenNowell.
An algorithmic approach to social networks. PhD thesis, Massachusetts
Institute of Technology, 2005.
[15] David LibenNowell
and Jon Kleinberg. The linkprediction
problem for social networks.
Journal of the American society for information science and technology,
58(7):1019–1031, 2007.
[16] Weiping Liu and Linyuan Lü. Link prediction based on local random walk. EPL (Europhysics
Letters), 89(5):58007, 2010.
[17] Linyuan Lü, CiHang
Jin, and Tao Zhou. Similarity index based on local paths for link
prediction of complex networks. Physical Review E, 80(4):046122, 2009.
[18] Qing Lu and Lise Getoor. Linkbased
classification. In Proceedings of the 20th International
Conference on Machine Learning (ICML03),
pages 496–503, 2003.
[19] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using tsne.
Journal of
machine learning research, 9(Nov):2579–2605, 2008.
[20] Sofus A Macskassy and Foster Provost. A simple relational classifier. Technical report,
NEW YORK UNIV NY STERN SCHOOL OF BUSINESS, 2003.
[21] Aaron F McDaid, Derek Greene, and Neil Hurley. Normalized mutual information to
evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515,
2011.
[22] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of
word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[23] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed
representations of words and phrases and their compositionality. In Advances in neural
information processing systems, pages 3111–3119, 2013.
[24] Michael Mitzenmacher. A brief history of generative models for power law and lognormal
distributions. Internet mathematics, 1(2):226–251, 2004.
[25] Jennifer Neville and David Jensen. Iterative classification in relational data. In Proc.
AAAI2000
Workshop on Learning Statistical Models from Relational Data, pages 13–
20, 2000.
[26] Mark EJ Newman. Clustering and preferential attachment in growing networks. Physical
review E, 64(2):025102, 2001.
[27] Mark EJ Newman. Finding community structure in networks using the eigenvectors of
matrices. Physical review E, 74(3):036104, 2006.
[28] Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysis and
an algorithm. In Advances in neural information processing systems, pages 849–856,
2002.
[29] Ming Ouyang, William J Welsh, and Panos Georgopoulos. Gaussian mixture clustering
and imputation of microarray data. Bioinformatics, 20(6):917–923, 2004.
[30] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank
citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
[31] Karl Pearson. The problem of the random walk. Nature, 72(1867):342, 1905.
[32] Bryan Perozzi, Rami AlRfou,
and Steven Skiena. Deepwalk: Online learning of social
representations. In Proceedings of the 20th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, KDD ’14, pages 701–710, 2014.
[33] Pascal Pons and Matthieu Latapy. Computing communities in large networks using
random walks. In International symposium on computer and information sciences,
pages 284–293. Springer, 2005.
[34] Usha Nandini Raghavan, Réka Albert, and Soundar Kumara. Near linear time algorithm
to detect community structures in largescale
networks. Physical review E,
76(3):036106, 2007.
[35] Gerard Salton and Michael J McGill. Introduction to modern information retrieval.
1986.
[36] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE:
largescale
information network embedding. CoRR, 2015.
[37] Hanghang Tong, Christos Faloutsos, and JiaYu
Pan. Fast random walk with restart and
its applications. In Sixth International Conference on Data Mining (ICDM’06), pages
613–622. IEEE, 2006.
[38] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing,
17(4):395–416, 2007.
[39] Tao Zhou, Linyuan Lü, and YiCheng
Zhang. Predicting missing links via local information.
The European Physical Journal B, 71(4):623–630, 2009.
[40] Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with
label propagation. Technical report, Citeseer, 2002.