
Graduate student: Wen, Wei-Chia (温偉佳)
Thesis title: A No-code Development Platform for Classifying Social Media Data (適用於社群媒體資料分類之無程式碼平台)
Advisor: Teng, Wei-Guang (鄧維光)
Degree: Master
Department: College of Engineering - Department of Engineering Science (on the job class)
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Number of pages: 36
Keywords: no-code development platform (無程式碼平台), AutoML (自動化機器學習), active learning (主動學習), social media (社群媒體), short-text classification (短文分類)
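  • With the spread of the Internet and smart devices, people's lives have become ever more tied to social media. Social media data on everyday issues is increasingly immediate and diverse, spanning everything from social welfare to commercial interests, so people spend more time each day looking for topics that interest them. The ambiguity in how social media data can be interpreted, however, means it cannot easily be classified or delimited. Developing an application for a specific topic usually requires substantial domain knowledge and a great deal of time spent writing programs for models and systems tailored to that topic; extending to another topic means paying that development cost again. To address this problem, this study designs a no-code development platform that combines an operation interface with an application programming interface to turn the model training process into a pipeline. With this architecture and active learning as the optimization method, users need only focus on the topic they care about: after labeling some example messages through a web interface, they obtain a data classification model that performs reasonably well, which can then automatically find more social media messages of interest. Moreover, applying the architecture to other topics requires no code changes; users can build another suitable classification model entirely through the interface. The results show that an SVM-based model classifies better than keyword-based classification. In addition, our experiments with active learning on real-world data show that, by concentrating on which samples to select, active learning approaches the model's saturation curve quickly with fewer samples, which saves labeling cost.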

    With the spread of the Internet and smart devices, people's lives have become even more inseparable from social media. Social media information on various life-related issues has become more immediate and diverse, covering everything from social welfare to commercial benefits, and people spend more time every day looking for topics of interest. However, the ambiguous interpretation of social media content makes it difficult to classify. Developing an application for a specific topic often requires considerable domain knowledge and time to write programs for models tailored to that topic. Extending it to other topics requires a similar development effort all over again.
    In this work, a no-code development platform is designed to address this problem. An operation interface and an application programming interface are combined to streamline the model training process into a pipeline. With this general framework and active learning as the optimization method, users only need to focus on their topics of interest, and extending to other topics requires no re-development. Our experimental results show that an SVM-based model outperforms keyword matching in classifying social messages. Moreover, our experiments with active learning on real-world data show that focusing on sample selection lets one reach comparable performance with fewer samples, which is a significant saving in data labeling cost.
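
    The record does not include the platform's code, so the following is only a minimal sketch of the loop the abstract describes: train a linear SVM, pick the unlabeled message the model is least sure about, have it labeled, and retrain. It assumes uncertainty sampling as the selection strategy (the abstract does not say which strategy the thesis uses) and a toy subgradient-descent SVM in place of whatever SVM implementation the thesis relies on. The function names, the sample messages, the vocabulary, and the `true_labels` dictionary that stands in for a human annotator are all hypothetical.

```python
from collections import Counter

def featurize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary (whitespace tokens)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]

def score(x, w, b):
    """Decision value: positive means class +1, negative means class -1."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=100, eta=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss; labels are +1/-1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * score(xi, w, b) < 1      # inside the margin or misclassified
            w = [(1 - eta * lam) * wj for wj in w]   # shrink step from the L2 penalty
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def most_uncertain(pool_X, w, b):
    """Index of the pool item closest to the decision boundary."""
    return min(range(len(pool_X)), key=lambda i: abs(score(pool_X[i], w, b)))

def active_learning(seed, pool, oracle, vocab, rounds):
    """Alternate between training and asking the oracle to label one message."""
    labeled, pool = list(seed), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        w, b = train_linear_svm([featurize(t, vocab) for t, _ in labeled],
                                [label for _, label in labeled])
        i = most_uncertain([featurize(t, vocab) for t in pool], w, b)
        text = pool.pop(i)
        labeled.append((text, oracle(text)))  # stands in for a user labeling in a web UI
    # Final model trained on everything labeled so far.
    w, b = train_linear_svm([featurize(t, vocab) for t, _ in labeled],
                            [label for _, label in labeled])
    return w, b, labeled

# Hypothetical toy data: +1 = disaster-related, -1 = not.
vocab = ["flood", "rescue", "sale", "discount"]
seed = [("flood rescue needed", 1), ("big sale discount", -1)]
pool = ["flood warning downtown", "discount coupons today", "rescue team sale"]
true_labels = {"flood warning downtown": 1,
               "discount coupons today": -1,
               "rescue team sale": 1}

w, b, labeled = active_learning(seed, pool, true_labels.get, vocab, rounds=2)
for text, label in labeled[len(seed):]:
    print("queried:", text, "->", label)
```

    In this sketch the message that mixes words from both classes sits closest to the decision boundary, so it is the first one sent for labeling. Spending the labeling budget on such borderline samples is the cost-saving idea the abstract attributes to active learning.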

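    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation and Objectives
      1.3 Research Contributions
    Chapter 2 Literature Review
      2.1 Automated Machine Learning
      2.2 Optimization Methods
        2.2.1 Active Learning
        2.2.2 Adaptive Boosting
      2.3 Data Storage
        2.3.1 Social Media Data
        2.3.2 Data Labels
    Chapter 3 Research Method and System Planning
      3.1 Potential and Dilemmas of Social Media Data Applications
      3.2 No-code Development Platform
      3.3 System Flow and Architecture Design
        3.3.1 Roles and Tasks
        3.3.2 System Flow
        3.3.3 System Architecture
    Chapter 4 Experimental Results and Discussion
      4.1 Experimental Environment
      4.2 Experiment Planning
      4.3 Experimental Results and Discussion
        4.3.1 Comparing the Classification Performance of Keywords and SVM
        4.3.2 Comparing Sample Selection Methods
      4.4 Effect of Active Learning in a Real-world Environment
    Chapter 5 Conclusions and Future Work
    References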


    Full text release: on campus, public from 2025-01-28
    off campus, public from 2025-01-28