
Author: Lee, Yu-Ping (李玉瓶)
Thesis title: Facilitating Social Media Retrieval with a NoSQL Database (以非關聯式資料庫協助社群媒體之資訊檢索)
Advisor: Teng, Wei-Guang (鄧維光)
Degree: Master
Department: College of Engineering - Department of Engineering Science (on-the-job master's program)
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar, i.e., 2015-2016)
Language: Chinese
Number of pages: 48
Chinese keywords: 非關聯式資料庫, 資訊檢索, 文字探勘, 事件偵測
English keywords: NoSQL, information retrieval, text mining, event detection
    The rapid growth of social media and websites has made it easy to spread and share content of many types, including text messages, photos, and video clips. These large volumes of real-time data open new research opportunities in many fields. However, because social media data exhibit the 3Vs of big data (volume, velocity, and variety), it is challenging to extract key, useful data from many different sources and to store, retrieve, and analyze them efficiently. Because of its inherently strict schema, a traditional relational database no longer holds the dominant position in big data processing and analysis that it enjoyed over the past decades. Schema-free, non-relational (NoSQL) databases have emerged in response as a new generation of data storage. Meanwhile, disasters occur frequently around the world, causing heavy losses of life and property and seriously affecting social and economic development. Governments of many developed countries have been actively promoting disaster prevention and management policies and strengthening education and training to raise public awareness. The application envisioned in this work uses the messages that social media users volunteer to detect disaster events and to provide an early warning mechanism, assisting government and private relief organizations in disaster rescue management. For experimental evaluation, we designed and built a prototype system whose back end uses both the relational database MySQL and the NoSQL database MongoDB. In our experiments, data storage and retrieval were both far more efficient with the NoSQL back end than with the relational one.
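    The abstract does not describe how disaster events are detected, so the following is only a minimal sketch of one common approach: counting keyword mentions per time window and flagging bursts. The keyword list, the `detect_bursts` function, the post fields (`created_at`, `text`), the window size, and the threshold are all hypothetical and are not taken from the thesis. Posts are modeled as dictionaries with varying fields, loosely mirroring schema-free documents. Real Chinese posts would need word segmentation first, which this sketch replaces with a simple regular-expression split on English text.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical keyword list; the thesis's actual vocabulary is not given here.
DISASTER_KEYWORDS = {"earthquake", "flood", "typhoon", "landslide"}

EPOCH = datetime(1970, 1, 1)


def detect_bursts(posts, window=timedelta(minutes=10), threshold=3):
    """Return sorted (window_start, keyword, count) tuples for every time
    window in which a keyword appears in at least `threshold` posts.

    Each post is a dict with a naive `created_at` datetime and an optional
    `text` field; any other fields (photo URLs, locations, ...) are ignored.
    """
    counts = Counter()
    for post in posts:
        # Align the timestamp to the start of its fixed-size window.
        index = (post["created_at"] - EPOCH) // window
        window_start = EPOCH + index * window
        # Crude tokenization: lowercase alphabetic runs only (an assumption).
        words = set(re.findall(r"[a-z]+", post.get("text", "").lower()))
        # Count each keyword at most once per post.
        for keyword in DISASTER_KEYWORDS & words:
            counts[(window_start, keyword)] += 1
    return sorted(
        (start, keyword, count)
        for (start, keyword), count in counts.items()
        if count >= threshold
    )
```

    A fixed window with a fixed threshold is the simplest possible design. A practical system would more likely compare each window against a baseline rate for that keyword, so that routinely frequent terms do not trigger warnings.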

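    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives and Contributions
    Chapter 2 Literature Review
      2.1 Introduction to NoSQL Databases
        2.1.1 The Rise and Characteristics of NoSQL Databases
        2.1.2 Common Categories of NoSQL Databases
        2.1.3 Differences Between NoSQL and SQL Databases
        2.1.4 Architecture and Features of the MongoDB Database
      2.2 Text Mining on Social Media Data
        2.2.1 Architecture and Applications of Text Mining
        2.2.2 Functions and Purposes of Full-Text Search
        2.2.3 Search Features of the MongoDB Database
      2.3 Topic Detection and Tracking on Social Media Data
        2.3.1 Traditional Event Detection and Tracking
        2.3.2 Architecture and Tasks of Topic Detection and Tracking
    Chapter 3 Research Methods and System Design
      3.1 System Architecture and Workflow
      3.2 Data Acquisition and Storage Model
      3.3 Chinese Word Segmentation and Keyword Identification Model
      3.4 Disaster Event Detection and Tracking Model
    Chapter 4 Experiments and Results
      4.1 Experimental Environment
      4.2 Database Storage Design
      4.3 Performance Evaluation of Data Storage
      4.4 Performance Evaluation of Data Search
      4.5 Discussion of Experimental Results
    Chapter 5 Conclusions and Future Work
    References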


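    Full-text availability: on campus, open from 2021-12-31; off campus, not open.
    The electronic thesis has not been authorized for public release; for the print copy, consult the library catalog.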