
Author: Chang, Tsu-Yao (張祖耀)
Thesis title: Information Filtering and Retrieval of Social Media Messages (社群媒體訊息之資訊過濾與檢索)
Advisor: Teng, Wei-Guang (鄧維光)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Number of pages: 33
Keywords (Chinese): 資訊過濾, 資訊檢索, 社群媒體, 短文分類
Keywords (English): information filtering, information retrieval, social media, short-text classification
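  • Abstract (translated from the Chinese version): As the Web 2.0 concept spreads among more and more users, the influence of social media grows ever more important, and the volume of user-written messages keeps increasing at a geometric rate. Like forum posts, reviews, mobile text messages, and e-mails, messages on social media are usually short, often only a few dozen characters long. These short texts are simple in structure yet timely, and they are highly valuable in many application scenarios, so improving the performance of information filtering and retrieval has become the focus of much research. Compared with longer text, feature selection on short-text data faces technical challenges: the feature space is sparse, which makes it hard to exploit correlations among text features, and the classification results differ widely depending on which features are used. In this study, we propose and implement a complete system pipeline that appropriately combines feature selection, data classification, and clustering techniques to carry out information filtering and retrieval of online media messages.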

    With the spread of the Web 2.0 concept, social media is becoming increasingly influential, and the number of messages generated by its many users grows at a geometric rate. Like forum articles, user reviews, short messages, and e-mails, these messages usually contain only tens of words. Such short-text data have a simple structure but may carry real-time information, which gives them potential value in many practical applications. Consequently, how to improve the performance of the corresponding information filtering and retrieval tasks has become the focus of many research works. Feature selection is more challenging on short-text data than on long-text data. The feature space is sparse, which makes it difficult to exploit the correlation between features, and the classification results vary widely depending on which features are selected. In this work, we therefore propose and implement a general scheme that incorporates feature selection, data classification, and clustering techniques. We verify that, with this scheme, information filtering and information retrieval tasks can be accomplished successfully.
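The sparsity problem and the feature selection step mentioned in the abstract can be illustrated with a minimal sketch. This record does not say which selection metric the thesis uses, so the chi-square statistic below is an assumption chosen for illustration. The toy messages, the labels (1 = relevant, 0 = irrelevant), and the function names are all hypothetical.

```python
from collections import Counter

# Toy labeled messages (hypothetical data, not taken from the thesis).
MESSAGES = [
    ("flood water rising near bridge", 1),
    ("bridge closed due to flood", 1),
    ("need rescue flood downtown", 1),
    ("great coffee this morning", 0),
    ("watching movie tonight", 0),
    ("coffee and movie later", 0),
]


def chi_square_scores(messages):
    """Score each term by the chi-square statistic of its 2x2 term/class table."""
    n = len(messages)
    n_pos = sum(label for _, label in messages)
    doc_freq = Counter()  # number of messages containing the term
    pos_freq = Counter()  # number of relevant messages containing the term
    for text, label in messages:
        for term in set(text.split()):
            doc_freq[term] += 1
            if label == 1:
                pos_freq[term] += 1
    scores = {}
    for term, df in doc_freq.items():
        a = pos_freq[term]   # term present, relevant
        b = df - a           # term present, irrelevant
        c = n_pos - a        # term absent, relevant
        d = n - n_pos - b    # term absent, irrelevant
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def sparsity(messages):
    """Fraction of zero entries in the binary document-term matrix."""
    vocab = {term for text, _ in messages for term in text.split()}
    nonzero = sum(len(set(text.split())) for text, _ in messages)
    return 1 - nonzero / (len(messages) * len(vocab))


scores = chi_square_scores(MESSAGES)
# Keep the three highest-scoring terms; ties are broken alphabetically.
top_terms = sorted(scores, key=lambda t: (-scores[t], t))[:3]
print(top_terms)
print(round(sparsity(MESSAGES), 2))
```

Because the chi-square statistic is symmetric, terms that signal the irrelevant class score as high as terms that signal the relevant class. Even on six messages, most entries of the document-term matrix are zero, which is the sparsity the abstract refers to.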

    Chapter 1 Introduction
      1.1 Motivation and Overview
      1.2 Contributions of This Work
    Chapter 2 Preliminaries
      2.1 Extracting Messages from Social Media
        2.1.1 Basics of Social Media
        2.1.2 Characteristics of Short-text from Social Media
        2.1.3 Challenges of Handling Short-text
      2.2 General Flows of Short-text Classification
        2.2.1 Preprocessing of Short-text Documents
        2.2.2 Short-text Feature Construction and Selection
        2.2.3 Short-text Classification
    Chapter 3 Proposed Approach for Short-text Filtering and Retrieval
      3.1 Data Sources and Preprocessing
      3.2 Short-text Feature Selection
      3.3 Proposed Scheme of Information Filtering and Retrieval
        3.3.1 Training Data Selection and Process Flow
        3.3.2 Usage of Latent Dirichlet Allocation
        3.3.3 Usage of Support Vector Machines
    Chapter 4 Empirical Studies
      4.1 Experimental Environment
      4.2 Experimental Process
      4.3 Experimental Results
        4.3.1 Datasets
        4.3.2 Classification Using SVM
        4.3.3 Clustering Using LDA
    Chapter 5 Conclusions and Future Works
    Bibliography

