| Graduate student: | 洪煒倫 Hong, Wei-Lun |
|---|---|
| Thesis title: | 使用者可自訂興趣主題的無程式碼資料分類平台 A No-code Data Classification Platform for User-Specified Topics |
| Advisor: | 鄧維光 Teng, Wei-Guang |
| Degree: | 碩士 Master |
| Department: | 工學院 - 工程科學系 Department of Engineering Science |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | English |
| Number of pages: | 48 |
| Keywords (Chinese): | 資料分類、無程式碼開發平台、主動學習、社群媒體 |
| Keywords (English): | data classification, no-code development platform, active learning, social media |
隨著科技進步,社群媒體變成人們分享生活與獲得感興趣訊息的重要媒介,而通常人們接收到的訊息蘊含各式各樣的議題,這些大量的訊息通常是雜亂無章的,人們很難從中直接獲得有價值的資訊,若要最大程度地挖掘這些資訊的價值,便會尋求機器的協助,此時機器學習即是其中一種常見手段,其被廣泛應用於資料分類任務中。在一般機器學習的開發流程中,需自行標記資料及撰寫程式,往往曠時費力,為求降低標記成本且非專業人士也能夠建立模型以進行分類任務,我們設計一無程式碼開發平台,以社群媒體中的訊息作為資料來源,減少使用者收集與過濾資料的功夫,結合主動學習方法,降低標記資料的成本,並採取自動化的流程,幫助使用者在不需撰寫程式的情況下完成模型訓練,最終讓使用者能夠根據自己的需求快速地進行分類任務。因此在實務上的挑戰即包括如何讓使用者自訂興趣主題並快速篩選出感興趣的社群訊息及如何讓使用者快速調整目標或做不同應用。在平台完成後,我們進行實驗以驗證主動學習方法的效果,並透過不同的分類主題任務以測試平台的實用性。
As technology advances, social media has become an important medium for people to share their daily lives and obtain information of interest. The messages people receive cover a wide variety of topics, and in such volume they are typically disorganized, so it is difficult to extract valuable information from them directly. To make the most of this information, people turn to machines for assistance, and machine learning (ML) is a standard approach that is widely used in data classification tasks. However, a typical machine learning development process requires labeling data and writing programs, which is often time-consuming and laborious. To help non-professionals build classification models with little labeling effort, we design a no-code development platform (NCDP). The platform uses messages crawled from social media as its data source to reduce the effort users spend collecting and filtering data, incorporates active learning to lower the cost of creating labeled datasets, and adopts an automated process that lets users complete model training without writing programs. As a result, users can quickly carry out classification tasks that fit their own needs. The practical challenges therefore include how to let users specify topics of interest and quickly filter the social messages they care about, and how to let users quickly adjust their targets or apply the platform to different applications. After completing the platform, we conduct experiments to verify the effect of the active learning method and test the practicality of the platform on classification tasks with different topics.
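The abstract does not say which query strategy or classifier the platform uses, so the following is only a minimal sketch of how active learning can reduce labeling effort. It assumes pool-based uncertainty (smallest-margin) sampling over a crude bag-of-words nearest-centroid model. All function names, the toy messages, and the `true_labels` dictionary standing in for the user are hypothetical illustrations, not the thesis implementation.

```python
import math
from collections import Counter

def featurize(text):
    """Bag-of-words counts for a whitespace-tokenized message."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroids(labeled):
    """Sum word counts per class (a crude nearest-centroid model)."""
    cents = {}
    for text, label in labeled:
        cents.setdefault(label, Counter()).update(featurize(text))
    return cents

def scores(cents, text):
    vec = featurize(text)
    return {label: cosine(vec, c) for label, c in cents.items()}

def predict(cents, text):
    s = scores(cents, text)
    return max(s, key=s.get)

def margin(cents, text):
    """Gap between the two best class scores; a small gap means uncertain.
    Assumes the labeled set already contains at least two classes."""
    s = sorted(scores(cents, text).values(), reverse=True)
    return s[0] - s[1]

def active_learning(seed, pool, oracle, rounds):
    """Repeatedly ask the oracle (the user) to label the most uncertain message."""
    labeled, pool = list(seed), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(labeled)
        query = min(pool, key=lambda t: margin(cents, t))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return centroids(labeled), labeled

# Toy data: two seed labels, four unlabeled messages, and a simulated user.
seed = [("earthquake damage reported downtown", "disaster"),
        ("new cafe opens downtown", "other")]
true_labels = {
    "flood damage reported near river": "disaster",
    "cafe serves new coffee menu": "other",
    "earthquake relief supplies arrive": "disaster",
    "river walk opens this weekend": "other",
}
model, labeled = active_learning(seed, true_labels, true_labels.get, rounds=2)
```

Only two of the four pooled messages are sent to the simulated user here. In a real deployment the oracle would be the person labeling messages in the platform's interface, and the classifier would be whatever model the platform trains.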