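Graduate student: Chen, Hsin-Yu (陳欣瑜)
Thesis title: Learning Feature Representation to Forecast the Rise and Fall of Heterogeneous Items in Social Media (基於特徵表示學習之異質性社群媒體項目興衰預測)
Advisor: Li, Cheng-Te (李政德)
Degree: Master
Department: College of Management, Department of Statistics
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Number of pages: 49
Keywords (translated from Chinese): social network analysis, feature representation learning, shutdown risk assessment, self-exciting point process
Keywords (English): social network, feature learning, shutdown risk assessment, self-exciting point process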
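  • As mobile positioning technology has matured and social media has flourished, location-based online social networking services have become increasingly diverse and popular, and many of them let users share life experiences and location information. The volume of data recorded by these services is enormous, and it can yield considerable benefit when used well, for example by recommending venues a user may be interested in based on historical check-in records, or by ranking content on a social media site according to the attention past posts received. This study proposes a methodology for predicting the shutdown risk of venues and the popularity of posts. Previous studies on related problems mostly extract features tied to social media events, such as user behavior features and geographic features. This requires expensive and time-consuming feature labeling and engineering, and those methods need features of a certain quality to predict well. In contrast, this study constructs graphs from the time series of social media events, learns feature representation vectors from those graphs, and combines them with features that are easy to extract. This approach avoids the time and monetary cost of feature extraction and spreads the risk of uneven quality among the extracted features. In addition, through topic analysis of posts, we extend this work into a personalized self-exciting point process model for time series forecasting, which can be effectively combined with the learned feature representation vectors to predict future time series. We conduct a systematic experimental evaluation on about 19 million Instagram check-ins, about 52,000 labeled Foursquare venues, and about 166,000 Twitter post-sharing records. The results show that, compared with traditional feature extraction, the feature vectors produced by feature representation learning substantially improve both venue shutdown risk assessment and post popularity prediction: across settings, accuracy rises by at least 10% and error falls by at least 10%. This study makes three concrete contributions: (1) the proposed feature representation learning method converts time series into feature vectors very effectively and can be widely applied to classification problems on time series data; (2) our prediction method can accurately identify venues with a higher risk of shutting down in the future, which businesses can use when evaluating marketing plans and government agencies can use to monitor economic consumption in each region; (3) our prediction method can accurately predict the future trend of post popularity on social media and can be applied to viral marketing on social networks.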
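
    The graph-construction idea described in the abstract can be illustrated with a minimal sketch. The record does not state which similarity measure, threshold, or walk strategy the thesis uses, so everything below is an assumption made for illustration: Pearson correlation as the similarity, a fixed cutoff of 0.8, and uniform random walks standing in for the biased walks of a graph embedding method. The function names and the toy venue names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def build_similarity_graph(series, threshold=0.8):
    """Link two items when their event-count series correlate at or above the threshold.

    The similarity measure and threshold are illustrative assumptions.
    """
    graph = {name: [] for name in series}
    names = sorted(series)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(series[a], series[b]) >= threshold:
                graph[a].append(b)
                graph[b].append(a)
    return graph


def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks; a walk stops early at an isolated node."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks


# Toy weekly check-in counts for three hypothetical venues.
checkins = {
    "cafe_a": [5, 6, 7, 8, 9, 10],
    "cafe_b": [10, 12, 14, 16, 18, 20],
    "bar_c": [9, 7, 6, 4, 2, 1],
}
similarity_graph = build_similarity_graph(checkins)
walks = random_walks(similarity_graph)
```

    In a full pipeline, walks like these would be passed to a skip-gram model to obtain one embedding vector per item, and those vectors would be concatenated with the easily extracted features before training a classifier or regressor.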

    With the development and maturity of mobile sensing and social media, location-based social services have become popular and diverse. Users can share their life experiences together with GPS-based location information, so geo-social data (e.g., check-ins and geo-tagged photos) accumulate rapidly. Applications based on geo-social data include recommending points of interest and ranking popular social media items such as posts and videos. In this work, we propose a novel general-purpose predictive analysis methodology to predict not only the shutdown risk of given venues but also the popularity of posts. While past studies relied on expensive and time-consuming feature engineering, we propose to automatically learn feature representations (i.e., embedding vectors) for items. The basic idea is to exploit the time series of check-in and retweet events to construct graphs that represent similarity-based relationships between items, so that a graph embedding technique such as node2vec can be applied to derive the learned features. By combining the extracted and learned features, and by using large-scale Instagram, Twitter, and Foursquare datasets for experiments, the prediction performance for venue shutdown and post popularity can be significantly boosted with lower-cost feature engineering. In addition, by analyzing the topics of posts, we devise a novel personalized self-exciting point process model (PSEISMIC), which can be effectively combined with the feature representation vectors to further boost the accuracy of predicting future time series. In short, this work makes three contributions. First, the proposed feature learning method effectively transforms training time series into embedding vectors and can be widely applied to various prediction tasks on time series data. Second, our method can accurately predict high-risk venues, which not only provides insights for their marketing planning but also reveals the economic consumption patterns of each geographical region for better urban management by government agencies. Third, our method can accurately forecast the popularity of online posts so that social viral marketing can become more precise.
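
    For readers unfamiliar with self-exciting point processes, the sketch below shows the general idea with a textbook Hawkes process using an exponential kernel, where the event rate is a baseline plus a decaying boost from every past event. This is not the PSEISMIC model, whose formulation is not given in this record; the kernel, the parameter names `mu`, `alpha`, and `beta`, and both function names are assumptions chosen for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity: baseline mu plus an exponentially decaying
    contribution alpha * exp(-beta * (t - s)) from each event s before t."""
    return mu + sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )


def expected_events_without_offspring(t_now, horizon, event_times,
                                      mu=0.1, alpha=0.5, beta=1.0):
    """Integral of the intensity over [t_now, t_now + horizon], counting only
    the influence of events observed up to t_now.

    It ignores further events triggered inside the horizon, so it is a
    simplification rather than a full forecast.
    """
    remaining = sum(
        (alpha / beta) * math.exp(-beta * (t_now - s))
        for s in event_times if s <= t_now
    )
    return mu * horizon + remaining * (1.0 - math.exp(-beta * horizon))


# Hypothetical retweet times (in hours) for one post.
retweets = [0.2, 0.5, 0.6, 1.4]
rate_now = hawkes_intensity(2.0, retweets)
next_six_hours = expected_events_without_offspring(2.0, 6.0, retweets)
```

    A burst of recent retweets raises `rate_now`, and the rate relaxes toward `mu` as the burst ages, which is the behavior that makes this family of models a natural fit for popularity time series.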

    Abstract (Chinese)
    Extended Abstract (English)
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
      1.1. Background
      1.2. Motivation
      1.3. Research Problem
      1.4. Potential Applications
      1.5. Research Challenges
      1.6. Contributions
    Chapter 2. Related Work
    Chapter 3. Methodology
      3.1. Methodology Workflow and Rationale
      3.2. Graph-Based Feature Representation Learning for Feature Engineering
      3.3. Task 1: Venue Shutdown Risk Prediction
        3.3.1. Collecting Venue Check-in Time Series
        3.3.2. Graph Construction
        3.3.3. Feature Engineering
        3.3.4. Prediction Model
      3.4. Task 2: Post Popularity Prediction
        3.4.1. Collecting Post Retweet Time Series
        3.4.2. Graph Construction
        3.4.3. Feature Engineering
        3.4.4. Prediction Model
    Chapter 4. Experimental Evaluation
      4.1. Experiment Objectives
      4.2. Datasets and Summary Statistics
        4.2.1. Instagram Data
        4.2.2. Foursquare Data
        4.2.3. Tweet Data
      4.3. Task 1: Venue Shutdown Risk Prediction
        4.3.1. Experimental Settings
        4.3.2. Evaluation Metrics
        4.3.3. Experimental Results
      4.4. Task 2: Post Popularity Prediction
        4.4.1. Experimental Settings
        4.4.2. Evaluation Metrics
        4.4.3. Experimental Results
    Chapter 5. Conclusion
    References

    Full text available on campus from 2023-07-01
    Full text available off campus from 2023-07-01