=Paper=
{{Paper
|id=Vol-3159/T2-7
|storemode=property
|title=Simple Transformers in Rhetoric Role Labelling for Legal Judgements
|pdfUrl=https://ceur-ws.org/Vol-3159/T2-7.pdf
|volume=Vol-3159
|authors=Sai Shridhar Balamurali,Kayalvizhi S,Thenmozhi D
|dblpUrl=https://dblp.org/rec/conf/fire/BalamuraliSD21
}}
==Simple Transformers in Rhetoric Role Labelling for Legal Judgements==
'''B Sai Shridhar, S Kayalvizhi and D Thenmozhi'''

SSN College Of Engineering, Chennai

''FIRE 2021: Forum for Information Retrieval Evaluation, December 13-17, 2021, India.''

saishridhar.16@gmail.com (B. S. Shridhar); kayalvizhis@ssn.edu.in (S. Kayalvizhi); theni1_d@ssn.edu.in (D. Thenmozhi)

'''Abstract.''' Legal case documents follow a common thematic structure with implicit sections like "Facts of the Case", "Issues being discussed", "Arguments given by the parties", etc. These sections are popularly termed "rhetoric roles". Knowledge of such semantic segments or roles not only enhances the readability of the documents but also helps in downstream tasks like computing document similarity and summarization. The task is, given a legal document, to classify each sentence into one of 7 rhetoric roles. We compare ALBERT, BERT, RoBERTa and LaBSE for this task. The results show that BERT had the best accuracy at predicting the labels.

'''Keywords:''' Legal documents, Rhetoric labels, BERT, ALBERT, RoBERTa, LaBSE

=== 1. Introduction ===

In countries that follow the common law system (e.g., UK, USA, Canada, Australia, India) there are two primary sources of law: Statutes (established laws, such as the Constitution of a country) and Precedents (prior cases decided in courts of law). Precedents or prior cases help a lawyer understand how the Court has dealt with similar scenarios in the past, and prepare the legal reasoning accordingly. When a lawyer is presented with a situation (one that will potentially lead to the filing of a case), an automatic system that identifies a set of related prior cases involving similar situations, as well as the statutes/acts best suited to the purpose in the given situation, would be very beneficial.

Most legal case documents follow a common structure with different sections like "Details of the Case", "Issues being discussed", "Arguments given by the parties", etc. These sections are popularly termed "rhetoric roles". Acquiring such semantic roles not only improves the readability of the documents but is also needed for tasks like computing document similarity and summarization. However, this information is generally not specified explicitly in case documents, which are usually just free-flowing text. The task is to semantically label each sentence with one of the seven roles.

=== 2. Related Work ===

In AILA 2020, the task of labelling rhetoric roles for legal judgments was attempted by many authors. In [1], RoBERTa along with a Bi-LSTM was used. In [2], TF-IDF features and deep semantic features based on BERT are combined, with logistic regression, linear-kernel SVM and AdaBoost used as classifiers. In [3], RoBERTa with a fully connected layer for classification was used. In [4], both TF-IDF features and BERT-based features are explored for the task, and in [5] FastText and TF-IDF are explored from the feature-engineering aspect, with a multi-layer perceptron and Random Forest from the classifier aspect. From [6] we can see that the RoBERTa and BERT transformers tend to give the best performance.

=== 3. Task & Dataset Description ===

The AILA 2021 Task 1 training data consists of 60 case documents. In each document the sentences are labelled with one of 7 categories:

# Facts: sentences that denote the chronology of events that led to the filing of the case
# Ruling by Lower Court: the cases in the dataset were given a preliminary ruling by the lower courts (Tribunal, High Court, etc.); these sentences correspond to the ruling/decision given by these lower courts
# Argument: sentences that denote the arguments of the contending parties
# Statute: relevant statute cited
# Precedent: relevant precedent cited
# Ratio of the decision: sentences that denote the rationale/reasoning given by the Supreme Court for the final judgement
# Ruling by Present Court: sentences that denote the final decision given by the Supreme Court for that case document

These documents were manually annotated by legal experts. In addition, we also included the rhetoric labels of the AILA 2021 Task 2 dataset and the AILA 2020 Task 2 dataset. The labels of these datasets were categorical variables, so we first converted them into ordinal encoded variables, as sketched below. In total we had 12,170 text-label pairs to train the model.
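The paper does not include its preprocessing code; the following is a minimal sketch of the ordinal-encoding step under stated assumptions (the file name train.csv, the column names text and label, and the role-to-id ordering are all hypothetical, chosen only for illustration):

<syntaxhighlight lang="python">
# Minimal sketch of the ordinal-encoding step described above.
# "train.csv" and the columns "text"/"label" are assumptions for illustration;
# simpletransformers expects the columns to be named "text" and "labels".
import pandas as pd

ROLES = [
    "Facts", "Ruling by Lower Court", "Argument", "Statute",
    "Precedent", "Ratio of the decision", "Ruling by Present Court",
]
role_to_id = {role: i for i, role in enumerate(ROLES)}

train_df = pd.read_csv("train.csv")                     # hypothetical text-label pairs
train_df["labels"] = train_df["label"].map(role_to_id)  # categorical -> ordinal ids
train_df = train_df[["text", "labels"]]
</syntaxhighlight>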
=== 4. Proposed Methodology ===

The task involves the classification of sentences from legal case documents into 7 rhetoric roles. We use the simpletransformers library to import the BERT, RoBERTa, LaBSE and ALBERT transformers, which we use as classifiers. We then select the 3 best models and measure the macro F1, precision and recall values to find the classifier best suited to the task.

==== 4.1. BERT ====

BERT [7], which stands for Bidirectional Encoder Representations from Transformers, is based on the Transformer, a deep learning model in which every output element is connected to every input element and the weightings between them are dynamically calculated based upon their connection. BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

==== 4.2. ALBERT ====

The backbone of the ALBERT [8] architecture is similar to BERT in that it uses a transformer encoder (Vaswani et al., 2017) [12] with GELU nonlinearities (Hendrycks & Gimpel, 2016). The three main differences are:

* factorized embedding parameterization, which splits the embedding matrix into two smaller matrices;
* cross-layer parameter sharing;
* an inter-sentence coherence loss.

==== 4.3. RoBERTa ====

RoBERTa [9, 13] stands for Robustly Optimized BERT Pre-training Approach; it optimizes the training of the BERT architecture so that pre-training takes less time. Its architecture is nearly identical to BERT's, but to improve on BERT's results the authors made some simple design changes to the architecture and training procedure:

* removing the Next Sentence Prediction objective;
* training with bigger batch sizes and longer sequences;
* dynamically changing the masking pattern.

==== 4.4. LaBSE ====

LaBSE [10] stands for Language-Agnostic BERT Sentence Embedding. The architecture is based on a bidirectional dual encoder (Guo et al.) with additive margin softmax (Yang et al.), with improvements. It produces language-agnostic sentence embeddings for more than 100 languages in a single model. The model is trained to generate similar embeddings for bilingual sentence pairs that are translations of each other.

We use these models as multiclass classification models with the default parameters given by simpletransformers; a sketch of this setup follows.
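A minimal sketch of one such run, assuming a train_df prepared as above plus a held-out eval_df and test sentences; the Hugging Face checkpoint names are standard identifiers chosen for illustration and are not quoted from the paper:

<syntaxhighlight lang="python">
# Sketch of fine-tuning one of the four classifiers with simpletransformers;
# checkpoint names are assumptions, not taken from the paper.
from simpletransformers.classification import ClassificationModel

model = ClassificationModel(
    "bert",                # or "albert" / "roberta"; LaBSE is itself a BERT-type
    "bert-base-uncased",   # encoder and could be loaded as e.g. ("bert", "setu4993/LaBSE")
    num_labels=7,          # the 7 rhetoric roles
    args={"num_train_epochs": 5},  # 5 epochs per run, as reported in Table 1
)

model.train_model(train_df)                        # expects "text"/"labels" columns
result, model_outputs, wrong = model.eval_model(eval_df)
print(result)                                      # includes mcc and eval_loss

predictions, raw_outputs = model.predict(test_sentences)  # test_sentences: list of str
</syntaxhighlight>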
=== 5. Results ===

The BERT, ALBERT, RoBERTa and LaBSE classifiers are compared using their MCC (Matthews correlation coefficient) scores and evaluation loss on an evaluation data set. The results are shown in Table 1. The 3 best runs are chosen for predicting the rhetoric labels for the given test data. The precision, recall and F1-score are calculated class-wise and shown in Table 2. The overall macro scores are then calculated and shown in Table 3.

{| class="wikitable"
|+ Table 1: Results on the evaluation set
! Classifier !! Epochs !! MCC !! Evaluation Loss
|-
| BERT || 5 || 0.5333 || 1.5881
|-
| ALBERT || 5 || 0.4954 || 1.1785
|-
| RoBERTa || 5 || 0.5323 || 1.2877
|-
| LaBSE || 5 || 0.5396 || 1.0932
|}

{| class="wikitable"
|+ Table 2: Class-wise results
! Classifier !! Metric !! Argument !! Facts !! Precedent !! Ratio of Decision !! Ruling by Lower Court !! Ruling by Current Court !! Statute
|-
| rowspan="3" | LaBSE || P || 0.2153 || 0.599 || 0.3922 || 0.694 || 0.07407 || 0.1667 || 0.7333
|-
| R || 0.7949 || 0.4812 || 0.2985 || 0.4462 || 0.1333 || 0.8846 || 0.7333
|-
| F || 0.3388 || 0.5336 || 0.339 || 0.5432 || 0.09524 || 0.2805 || 0.7333
|-
| rowspan="3" | BERT || P || 0.5849 || 0.6122 || 0.3148 || 0.6852 || 0.06383 || 0.4222 || 0.4762
|-
| R || 0.7949 || 0.5021 || 0.5075 || 0.5927 || 0.2000 || 0.7308 || 0.6667
|-
| F || 0.6739 || 0.5517 || 0.3886 || 0.6356 || 0.09677 || 0.5352 || 0.5556
|-
| rowspan="3" | RoBERTa || P || 0.5172 || 0.6243 || 0.3200 || 0.6642 || 0.1034 || 0.2857 || 0.5500
|-
| R || 0.7692 || 0.4728 || 0.3582 || 0.6201 || 0.2000 || 0.8462 || 0.7333
|-
| F || 0.6186 || 0.5381 || 0.3380 || 0.6416 || 0.1364 || 0.4272 || 0.6286
|}

{| class="wikitable"
|+ Table 3: Results on the test data
! Classifier !! Precision !! Recall !! Macro F1-Score
|-
| BERT (Run 2) || 0.451 || 0.571 || 0.491
|-
| RoBERTa (Run 1) || 0.438 || 0.571 || 0.475
|-
| LaBSE (Run 3) || 0.411 || 0.539 || 0.409
|}

From Table 2, we can see that LaBSE has the best precision scores for precedent and ratio of decision, and has the best scores for predicting statutes. The LaBSE model does well in classifying statutes but poorly in classifying arguments, whereas BERT was the best at classifying arguments and the worst at statutes. RoBERTa performs the best on the ratio of decision and ruling by lower court classes; it also has the highest precision on facts. BERT performs consistently across most class predictions and has the best overall scores, as shown in Table 3. All the classifiers struggle with classifying rulings by lower courts, the highest F-score there being only 0.1364.
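The class-wise and macro-averaged figures in Tables 2 and 3 can be computed from gold and predicted label ids; the scikit-learn sketch below is an assumption about the scoring procedure (the track organizers' official scorer may differ), reusing the ROLES list from the earlier sketch:

<syntaxhighlight lang="python">
# Sketch of computing class-wise and macro-averaged scores with scikit-learn;
# using sklearn here is an assumption, not the track's official scorer.
from sklearn.metrics import classification_report, precision_recall_fscore_support

ROLES = [
    "Facts", "Ruling by Lower Court", "Argument", "Statute",
    "Precedent", "Ratio of the decision", "Ruling by Present Court",
]

# Dummy ids for illustration; replace with the real gold/predicted labels.
y_true = [0, 1, 2, 3, 4, 5, 6]
y_pred = [0, 1, 2, 3, 4, 5, 5]

# Per-class precision/recall/F1, as in Table 2.
print(classification_report(y_true, y_pred, target_names=ROLES, digits=4,
                            zero_division=0))

# Macro-averaged scores, as in Table 3.
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
</syntaxhighlight>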
=== 6. Conclusion ===

The task given was to semantically label the sentences of a legal document with 7 rhetoric roles. Previously, the BERT and RoBERTa transformers, used as classifiers, had produced the best results. In this paper we use the simpletransformers library and import the classifiers BERT, ALBERT, RoBERTa and LaBSE. First, we compare runs of all 4 models, and the 3 best performing models were chosen to predict on the test data set. On the test data set the BERT classifier performed the best, with RoBERTa a close second. LaBSE outperformed the other two in predicting statutes but performed significantly worse in classifying arguments and rulings by the current court. The proposed method could be further improved by trying the DynaBERT and ConvBERT transformers [11].

=== Acknowledgments ===

We would like to thank the Department of Science and Technology (DST)-SERB funding scheme and the HPC laboratory for providing the resources and space for our research.

=== References ===

[1] Majumder, S. B., & Das, D. (2020). Rhetorical Role Labelling for Legal Judgements Using RoBERTa. In FIRE (Working Notes) (pp. 22-25).

[2] Gao, J., Ning, H., Han, Z., Kong, L., & Qi, H. (2020). Legal text classification model based on text statistical features and deep semantic features.

[3] Jain, R., Agarwal, A., & Sharma, Y. (2020). Spectre@AILA-FIRE2020: Supervised Rhetorical Role Labeling for Legal Judgments using Transformers. In FIRE (Working Notes) (pp. 66-70).

[4] Wu, M., Wu, Z., Wang, X., & Han, Z. (2020). Retrieval Model and Classification Model for AILA2020. In FIRE (Working Notes) (pp. 82-86).

[5] Balaji, N. N. A., Bharathi, B., & Bhuvana, J. (2020). Legal Information Retrieval and Rhetorical Role Labelling for Legal Judgements. In FIRE (Working Notes) (pp. 26-30).

[6] Bhattacharya, P., Ghosh, K., Ghosh, S., Pal, A., Mehta, P., Bhattacharya, A., & Majumder, P. (2020). Overview of the FIRE 2020 AILA Track: Artificial Intelligence for Legal Assistance. In FIRE (Working Notes) (pp. 1-11).

[7] https://searchenterpriseai.techtarget.com/definition/BERT-language-model

[8] https://huggingface.co/transformers/model_doc/albert.html

[9] https://huggingface.co/transformers/model_doc/RoBERTa.html

[10] https://towardsdatascience.com/labse-language-agnostic-bert-sentence-embedding-by-google-ai-531f677d775f

[11] https://towardsdatascience.com/advancing-over-bert-bigbird-convbert-dynabert-bca78a45629c

[12] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).

[13] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.

[14] Parikh, V., Bhattacharya, U., Mehta, P., Bandyopadhyay, A., Bhattacharya, P., Ghosh, K., Ghosh, S., Pal, A., Bhattacharya, A., & Majumder, P. (2021). Overview of the third shared task on Artificial Intelligence for Legal Assistance at FIRE 2021. In FIRE (Working Notes) - Forum for Information Retrieval Evaluation, India, December 13-17, 2021; and Parikh, V., Bhattacharya, U., Mehta, P., Bandyopadhyay, A., Bhattacharya, P., Ghosh, K., Ghosh, S., Pal, A., Bhattacharya, A., & Majumder, P. (2021). FIRE 2021 AILA track: Artificial Intelligence for Legal Assistance. In Proc. of FIRE 2021 - 13th Forum for Information Retrieval Evaluation, India, December 13-17, 2021.

[15] Bhattacharya, P., Mehta, P., Ghosh, K., Ghosh, S., Pal, A., Bhattacharya, A., & Majumder, P. (2020). Overview of the FIRE 2020 AILA track: Artificial Intelligence for Legal Assistance. In FIRE (Working Notes) - Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, 2020; and Bhattacharya, P., Paul, S., Ghosh, K., Ghosh, S., & Wyner, A. (2019). Identification of Rhetorical Roles of Sentences in Indian Legal Judgments. In Proc. of JURIX 2019 - International Conference on Legal Knowledge and Information Systems.

[16] Parikh, V., Mathur, V., Mehta, P., Mittal, N., & Majumder, P. LawSum: A weakly supervised approach for Indian Legal Document Summarization. arXiv preprint arXiv:2110.01188v3.