| Field | Value |
|---|---|
| Graduate student | 鍾文傑 Chung, Wen-Chieh |
| Thesis title | 基於深度學習之工程訴訟案件篩選與歷審統計表建立及案件預測系統 Case screening, summary table creation and legal judgment prediction system for construction litigation based on deep learning |
| Advisor | 王駿發 Wang, Jhing-Fa |
| Degree | 碩士 Master |
| Department | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| Year of publication | 2020 |
| Academic year of graduation | 108 |
| Language | English |
| Number of pages | 54 |
| Keywords (Chinese) | 案件篩選、資訊擷取、IDF、POS、文本相似度、判決預測、BERT |
| Keywords (English) | case screening, information extraction, IDF, part of speech, text similarity, judgment prediction, BERT |
Artificial intelligence has advanced rapidly in recent years, and deep learning is now being applied in many fields, law among them. This study proposes a deep learning based system for construction litigation that performs case screening, summary table creation, and legal judgment prediction. The system has three parts. The first part is case screening: from the judgment documents provided by the Judicial Yuan of the Republic of China, cases that belong to construction litigation are selected in two steps, with an accuracy of 93.55%. The second part is summary table creation: regular expressions extract information from the judgments issued at each trial level of a case, and the results are compiled into a per-case summary table of its trial history, with an accuracy of 86.75%. The third part is case prediction, which produces three outputs: (1) a statistical table for cases of the same type, (2) past cases similar to the input case, and (3) the predicted court judgment.

The statistical table is obtained by matching the category of the input case and returning a table, prepared in advance by legal experts, of litigation gains and litigation durations for that category. To find similar cases, a word embedding model vectorizes the words in the judgment texts and in the input case; each word is then weighted by its IDF and part of speech (POS), cosine similarity is computed, and the 10 most similar cases are listed. For judgment prediction, a BERT model vectorizes the judgment text, and the result is passed through a neural network to predict the court's decision; the accuracy is 88.89% for litigation duration and 82.22% for litigation gain. The system lets users learn the likely outcome of a case before filing and then assess whether to initiate a lawsuit.
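The similar-case step described in the abstract (word vectors weighted by IDF and POS, then ranked by cosine similarity) can be illustrated with a minimal sketch. Several details here are assumptions and are not taken from the thesis: documents are combined by a weighted average of word vectors, the IDF is the smoothed form `log((1 + N) / (1 + df)) + 1`, and the POS weights, the toy embeddings, and all function names are hypothetical. A real system would obtain the vectors from a trained word embedding model and the POS tags from a Chinese tagger.

```python
import math
from collections import Counter

# Hypothetical POS weights; the abstract does not give the actual values.
POS_WEIGHTS = {"N": 1.0, "V": 0.8}
DEFAULT_POS_WEIGHT = 0.2


def idf_table(documents):
    """Smoothed IDF per word: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update({word for word, _pos in doc})
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def document_vector(doc, embeddings, idf):
    """Weighted average of word vectors, with weight = IDF * POS weight."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, pos in doc:
        if word not in embeddings:
            continue  # skip out-of-vocabulary words
        weight = idf.get(word, 1.0) * POS_WEIGHTS.get(pos, DEFAULT_POS_WEIGHT)
        for i, value in enumerate(embeddings[word]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [v / weight_sum for v in total]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_similar(query, corpus, embeddings, k=10):
    """Return up to k (score, corpus_index) pairs, most similar first."""
    idf = idf_table(corpus)
    q_vec = document_vector(query, embeddings, idf)
    scored = [
        (cosine(q_vec, document_vector(doc, embeddings, idf)), idx)
        for idx, doc in enumerate(corpus)
    ]
    scored.sort(key=lambda pair: -pair[0])
    return scored[:k]


# Toy 2-dimensional embeddings and (word, POS) documents, for illustration only.
EMBEDDINGS = {
    "contract": [1.0, 0.0],
    "payment": [0.9, 0.1],
    "delay": [0.0, 1.0],
    "the": [0.5, 0.5],
}
CORPUS = [
    [("contract", "N"), ("payment", "N"), ("the", "DET")],
    [("delay", "N"), ("the", "DET")],
]
QUERY = [("payment", "N"), ("the", "DET")]
RESULTS = top_similar(QUERY, CORPUS, EMBEDDINGS, k=10)
```

In this toy setup the query about payment ranks the first document (contract and payment) above the second (delay). The low default POS weight keeps the function word from pulling the two document vectors together, which is the intuition behind combining IDF with POS weighting.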