| 研究生: | 張晏箏 Chang, Yen-Cheng |
|---|---|
| 論文名稱: | 基於跨模態對比學習之自動程式評分系統 Automated Code Scoring System based on Cross-Modal Contrastive Learning |
| 指導教授: | 王惠嘉 Wang, Hei-Chia |
| 學位類別: | 碩士 Master |
| 系所名稱: | 管理學院 - 資訊管理研究所 Institute of Information Management |
| 論文出版年: | 2025 |
| 畢業學年度: | 113 |
| 語文別: | 中文 |
| 論文頁數: | 99 |
| 中文關鍵詞: | 自動程式評分系統、跨模態對比學習、深度學習 |
| 外文關鍵詞: | Automatic Code Scoring, Cross-modal Contrastive Learning, Deep Learning |
程式設計課程通常透過作業評估來輔助學生學習,但隨著學習人數的增加,人工批改程式作業相當費時費力。為解決此問題,自動程式評分系統被提出,其中線上評分系統(Online Judge, OJ)被廣泛應用於各領域,但OJ僅考慮結果是否與預期一致,無法根據與正確答案的相似程度進行差異化評分;亦有研究採用程式碼靜態特徵或深度學習模型擷取之特徵,根據待評分程式碼之特徵或與參考答案程式碼的特徵之間的差異進行評分。
然而,相對於參考答案程式碼,作業要求能提供更多與課程學習目標及評分標準相關的資訊。先前的模型皆無考慮以文字敘述之作業要求,難以針對不同課程目標對程式碼進行不同重點的評分。因此,本研究提出基於跨模態對比學習之自動程式評分系統,結合跨模態預訓練模型與對比學習,獲得具有相符關聯性的嵌入表示空間,並依據作業要求、參考答案程式碼與待評分程式碼之關聯與歐式距離為程式作業評分,期望能夠使自動程式評分系統根據不同作業要求對學生提交的程式碼進行不同重點的評分。
實驗結果顯示,在跨模態預訓練模型中CodeT5+在程式碼評分任務中表現優於其他跨模態模型。進一步將CodeT5+結合對比學習、對比回歸學習與表示增強模組,則能更準確捕捉需求語意並提升評分準確性與順序性,相關性整體優於現有的自動程式評分模型。本研究也透過個案分析,確認模型能有效區分與作業需求相符或不符的程式碼,反映程式碼的相符性。
Programming assignments are essential for learning, but grading them manually becomes time-consuming as class sizes grow. Automated grading systems such as Online Judge (OJ) systems are widely used, but they check only whether the output matches the expected result and cannot grade a submission by its degree of similarity to a correct solution. Other studies score submissions using static code features, or features extracted by deep learning models, based on the difference between the features of the code to be graded and those of the reference code.
Assignment requirements provide richer information about course learning objectives and grading criteria than reference code alone. However, previous models have not utilized the textual descriptions of these requirements, limiting their adaptability to different grading objectives. This study therefore proposes an automatic programming grading system based on cross-modal contrastive learning. By combining a cross-modal pre-trained model with contrastive learning, the system obtains an embedding space in which the representations of assignment requirements, reference code, and submitted code are aligned, and grades each submission according to its relevance to, and Euclidean distance from, these representations. This enables the system to emphasize different aspects of grading according to varying assignment requirements.
Experimental results show that, among cross-modal pre-trained models, CodeT5+ performs best on the code grading task. Further combining CodeT5+ with contrastive learning, contrastive regression, and a representation-enhancement module improves its ability to capture requirement semantics, boosting grading accuracy and ranking consistency, with overall correlation exceeding that of existing automatic code grading models. Case studies further confirm that the model effectively distinguishes code that matches the assignment requirements from code that does not, accurately reflecting the relevance of the submitted code.
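The core scoring idea described above can be sketched in a few lines. This is a minimal, illustrative example, not the thesis implementation: `encode` is a toy bag-of-characters embedding standing in for a cross-modal pre-trained encoder such as CodeT5+, the InfoNCE-style objective is one standard formulation of contrastive learning, and the distance-to-score mapping is a simple monotone function chosen only for illustration.

```python
import numpy as np

def encode(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding; a stand-in for a cross-modal encoder
    (e.g. CodeT5+) that maps requirements and code into one space."""
    vec = np.zeros(dim)
    for i, ch in enumerate(text.encode("utf-8")):
        vec[(ch + i) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.07) -> float:
    """InfoNCE contrastive objective: each anchor's positive is the
    same-index row; other rows act as in-batch negatives."""
    logits = anchors @ positives.T / temperature       # rows are unit-norm, so this is cosine similarity
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_probs).mean())

def score(requirement: str, reference_code: str, submission: str,
          max_score: float = 100.0) -> float:
    """Grade a submission by its Euclidean distance to the requirement and
    reference-code embeddings (an illustrative monotone mapping to 0..max_score)."""
    target = (encode(requirement) + encode(reference_code)) / 2.0
    dist = np.linalg.norm(encode(submission) - target)
    return max_score / (1.0 + dist)
```

After contrastive training aligns the encoder, a submission identical to the reference and requirement sits at distance zero and receives the maximum score, while submissions that deviate from the requirement semantics drift further away in the embedding space and score lower.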
校內 (on-campus access): to be made public on 2030-02-02