
Graduate Student: Kuo, Chia-Heng (郭家蘅)
Thesis Title: Predicting the Judgements of the Labor Dispute Cases with Natural Language Processing (利用自然語言處理技術於勞資爭議訴訟之判決預測-以BERT為例)
Advisor: Li, Der-Chiang (利德江)
Degree: Master
Department: Department of Industrial and Information Management (On-the-Job Master's Program), College of Management
Year of Publication: 2024
Graduation Academic Year: 112 (2023–2024)
Language: Chinese
Number of Pages: 54
Keywords: natural language processing, BERT model, judgment prediction
    This study aims to build an automated legal text processing and judgment prediction system. First, judicial decisions published by the Judicial Yuan are classified to identify cases involving "continuation of the employment relationship" judgments. Next, regular expressions are used to structure the judgment texts and organize them into a case table for subsequent analysis. The case texts within the research scope are then summarized to extract key information, reduce textual complexity, and provide more representative input for model training. Finally, a pre-trained BERT-based model is fine-tuned for binary classification to predict judgment outcomes, improving the efficiency of legal text processing.
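    As a rough illustration of the structuring step described above, the sketch below extracts a few fields from raw judgment texts with regular expressions and writes them to a case table. This is a minimal example under assumed conventions, not the thesis's actual code: the field names, patterns, and CSV layout are hypothetical.

```python
import csv
import re

# Hypothetical field patterns for Taiwanese civil judgments; the actual
# expressions used in the thesis are not given in the abstract.
FIELD_PATTERNS = {
    "case_number": re.compile(r"(\d+年度\S+?字第\d+號)"),
    "plaintiff_claim": re.compile(r"原告主張[::](.+?)(?=被告|$)", re.S),
    "ruling": re.compile(r"主文[::]?\s*(.+?)(?=事實及理由|$)", re.S),
}

def structure_judgment(text: str) -> dict:
    """Extract each field from one raw judgment text; missing fields stay empty."""
    row = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        row[field] = match.group(1).strip() if match else ""
    return row

def build_case_table(judgment_texts: list[str], out_path: str = "cases.csv") -> None:
    """Write one structured row per judgment to a CSV case table."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(FIELD_PATTERNS))
        writer.writeheader()
        writer.writerows(structure_judgment(t) for t in judgment_texts)
```

    In practice the patterns would need to be tuned to the formatting of each court's decisions, and the resulting table is what later receives the outcome labels.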
    In the experiments, using the summary of the plaintiff's claims from each judgment as input achieved an accuracy of 0.67, outperforming the 0.62 accuracy obtained with the full, unsummarized judgment text. Overall, this study strives to build a comprehensive and efficient legal text processing system that supports legal research and assists legal practice.
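    The classification and accuracy evaluation could look roughly like the following sketch, assuming the Hugging Face transformers library and the bert-base-chinese checkpoint; the abstract does not name the exact pre-trained model, training framework, or hyperparameters, so every identifier and label convention below is an assumption rather than the study's implementation.

```python
import numpy as np
import torch
from torch.utils.data import Dataset
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)

class SummaryDataset(Dataset):
    """Pairs of (claim summary, outcome label); here 1 = employment relationship upheld, 0 = not."""
    def __init__(self, texts, labels, tokenizer, max_length=256):
        self.enc = tokenizer(texts, truncation=True, padding="max_length", max_length=max_length)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

def accuracy(eval_pred):
    # Share of correctly predicted outcomes on the evaluation set.
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": float((preds == eval_pred.label_ids).mean())}

# Toy placeholder data; in the study these would be the plaintiff-claim summaries
# and outcome labels taken from the structured case table.
train_texts, train_labels = ["原告主張解僱不合法…", "原告主張資遣事由不存在…"], [1, 0]
test_texts, test_labels = ["原告主張僱傭關係存在…"], [1]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-judgment", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=SummaryDataset(train_texts, train_labels, tokenizer),
    eval_dataset=SummaryDataset(test_texts, test_labels, tokenizer),
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate())  # reports "eval_accuracy", the metric behind the 0.67 / 0.62 figures above
```

    Accuracy here is simply the fraction of held-out cases whose predicted outcome matches the actual judgment, the same metric used to compare the summarized and full-text inputs above.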

    Table of Contents:
    Abstract (Chinese); Abstract (English); Acknowledgements; Table of Contents; List of Tables; List of Figures
    Chapter 1  Introduction
        1.1 Research Background
        1.2 Research Motivation and Objectives
        1.3 Research Process
    Chapter 2  Literature Review
        2.1 Legal Informatics: Combining Artificial Intelligence and Law
            2.1.1 Legal Document Classification
            2.1.2 Legal Element Labeling and Information Extraction
            2.1.3 Other NLP Applications and Research in Law
        2.2 Research on Legal Prediction
        2.3 Overview and Development of Language Models
        2.4 The BERT Model
        2.5 Summary
    Chapter 3  Research Methods
        3.1 Research Framework
        3.2 Data Collection and Preprocessing
            3.2.1 Data Collection
            3.2.2 Judgment Retrieval Keywords and Filtering Methods
        3.3 Data Structuring
            3.3.1 Tabulating Data with Regular Expressions
            3.3.2 Data Labeling
        3.4 Judgment Summarization
        3.5 Judgment Outcome Prediction
            3.5.1 Text Feature Extraction
            3.5.2 Classifier Construction
        3.6 Model Performance Evaluation
        3.7 Summary
    Chapter 4  Experimental Results and Evaluation
        4.1 Experimental Methods
            4.1.1 Data Sources
            4.1.2 Building the Summarization Model
        4.2 Experimental Validation
            4.2.1 Experiment 1
            4.2.2 Experiment 2
        4.3 Summary
    Chapter 5  Conclusions and Future Directions
        5.1 Research Results
        5.2 Future Directions
    References

