| Graduate student: | 薛莞庭 Hsueh, Wan-Ting |
|---|---|
| Thesis title: | 建立個人化模型:透過管道化機器學習流程探索感興趣主題 Building a Personalized Model: Exploring Topic of Interest Through a Machine Learning Pipeline |
| Advisor: | 鄧維光 Teng, Wei-Guang |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2023 |
| Graduation academic year: | 111 |
| Language: | English |
| Number of pages: | 44 |
| Keywords (Chinese): | 文本分類, 自動化機器學習, 機器學習運營化, 無程式碼平台 |
| Keywords (English): | text classification, automated machine learning, machine learning operations, no-code development platform |
每天都有成千上萬的訊息在網路上傳播,在蓬勃發展的社群媒體時代有效傳遞相關資訊,找到所需的資訊可能既費時又具有挑戰性。為了解決這個問題,本研究的重點是在建置一機器學習應用管道,我們提出使用機器學習方法,讓使用者可以根據自己的需求建置個人化模型,並持續接收社群媒體上感興趣的訊息。為確保非技術相關使用者可以簡化的工作流程完成應用建置,我們在使用者友善的無程式碼開發平台上建置了完整的管道,使模型訓練和調整過程自動化,允許使用者部署模型並接收有關的感興趣主題實時訊息。此外,使用者可以重新標記分類不正確的訊息以增強模型的學習效能。為了使整體管道更加一致且可靠,我們將機器學習運營化的概念導入,為的是使整體的流程是一個完整的生命週期,達成持續性的監控以及精進,使模型效能能夠與時俱進。並且加入自動化機器學習方法,取代在傳統的機器學習方法中需要人類介入來回試錯的過程,讓使用者更專注於標記感興趣主題的相關資料與後續模型部署使用後的分類結果觀測。總體來說,這項工作旨在讓非技術相關使用者能參與機器學習流程,並透過簡易的操作介面完成個人化的應用。
Many messages circulate online daily, and in the thriving era of social media, finding the desired information can be time-consuming and challenging. To address this issue, this work establishes a machine learning application pipeline that lets users build personalized models according to their needs and continuously receive the model's classification results for social media messages. To keep the workflow simple and accessible to non-technical users, we develop the complete pipeline on a user-friendly no-code development platform. The platform automates model training and tuning, enabling users to deploy the model and receive real-time information on their topics of interest. Additionally, users can re-label incorrectly classified messages to improve the model's learning performance. To make the overall process consistent and reliable, we incorporate the concept of machine learning operations (MLOps), which treats the pipeline as a complete lifecycle and enables continuous monitoring and refinement of model performance. Automated machine learning (AutoML) is introduced to replace the manual trial-and-error of traditional machine learning workflows. This allows users to focus on labeling data relevant to their topics of interest and on observing the classification results after model deployment. Overall, our work aims to enable non-technical users to participate in the machine learning process and to build personalized applications through a simple operation interface.
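The re-labeling loop described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not name a model or an API, so the tiny Naive Bayes classifier, the `TopicModel` class, its `relabel` method, and the sample messages below are all hypothetical stand-ins chosen only to show how a user correction can be fed back into retraining.

```python
import math
from collections import Counter, defaultdict


class TopicModel:
    """Hypothetical stand-in model: multinomial Naive Bayes over whitespace tokens."""

    def __init__(self):
        self.examples = []                        # all (text, label) pairs seen so far
        self.word_counts = defaultdict(Counter)   # label -> token counts
        self.label_counts = Counter()             # label -> number of messages
        self.vocab = set()

    def fit(self, examples):
        """Retrain from scratch on the given labeled messages."""
        self.examples = list(examples)
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in self.examples:
            tokens = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the label with the highest log-posterior (add-one smoothing)."""
        tokens = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            n_words = sum(self.word_counts[label].values())
            score = math.log(self.label_counts[label] / total)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def relabel(self, text, correct_label):
        """User feedback: store the corrected example and retrain."""
        return self.fit(self.examples + [(text, correct_label)])


# Illustrative data only; these messages are made up for the sketch.
seed = [
    ("flood warning issued downtown", "interesting"),
    ("heavy rain flood alert", "interesting"),
    ("new phone discount sale", "other"),
    ("concert tickets sale today", "other"),
]
model = TopicModel().fit(seed)

message = "river sale rising"
before = model.predict(message)        # "other": the word "sale" dominates
model.relabel(message, "interesting")  # the user corrects the mistake
after = model.predict(message)         # "interesting" after retraining
```

In the pipeline the abstract describes, retraining and tuning would be handled by the AutoML component and triggered within the MLOps lifecycle rather than by a direct method call; the sketch only shows the data flow from a corrected label back into the model.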