<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rhetorical Labeling for Legal Judgements using fastText</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tebo Leburu-Dingalo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edwin Thuma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gontlafetse Mosweunyane</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nkwebi Peace Motlogelwa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Botswana</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes our participating systems in the FIRE AILA 2021 shared task on predicting rhetorical roles for sentences in a legal judgement document. In particular, we propose three multi-class classifiers to predict for each sentence a rhetorical role from the following: facts, arguments, ratio of the decision, precedent, statutes, ruling of lower court and ruling of present court. Each classifier uses a supervised fastText model. As input tokens, the first classifier uses unigrams, the second bigrams and the third trigrams. Our trigram system attains an F-score of 0.340, followed closely by the bigram system at 0.338, while the baseline scores 0.317.</p>
      </abstract>
      <kwd-group>
<kwd>Rhetorical Role</kwd>
        <kwd>Facts</kwd>
        <kwd>Arguments</kwd>
        <kwd>fastText</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
The rhetorical role labelling task requires assigning to each sentence in a legal judgement one of seven roles: facts, arguments, statutes, precedents, ruling by the lower court, ratio of the decision and ruling by the present court. The task was started as part of the FIRE 2020 AILA Track [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For the task, a training dataset of 50 documents containing 9,308 sentences in total, with rhetorical labels assigned by law experts, was used, while the test dataset comprised an additional set of 10 case documents. The dataset was provided by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The 21 runs submitted by the 9 teams employed different methods for rhetorical role labelling. The best performing system in terms of F-score and Recall was by team ju_nlp [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] who experimented with the transformer architecture RoBERTa (a state-of-the-art deep learning model) and a BiLSTM, with different numbers of training epochs for the different runs. The scores attained were an F-score of 0.468 and a Recall of 0.501. Team heu_gjm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] deployed TF-IDF features and deep semantic features using BERT, with different classifiers, namely Logistic Regression, linear-kernel SVM and AdaBoost. The BERT model with Logistic Regression gave the best precision for the task at 0.541. Team double_liu [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] used bag-of-words based features with SVM and AdaBoost as classifiers. The team also used the BERT model, which outperformed all submitted systems in terms of accuracy at 0.619. Results from the task show that even with complex deep learning methods, rhetorical labelling remains a difficult problem to solve, as none of the proposed methods achieves optimal performance. In this work we address the rhetorical role labelling problem with a fastText classifier. FastText is a linear classifier which has been shown to perform on par with deep learning algorithms in text classification while training at faster speeds and utilizing less processing power [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. In addition, our choice of the fastText model is motivated by its support for out-of-dictionary words, which can be useful when working with domain-specific corpora. Furthermore, the model allows the use of phrases as input tokens to preserve word order, a practice that has proven effective for classification problems [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. Thus, alongside exploring the effectiveness of the fastText classifier in detecting rhetorical roles, we further investigate the effectiveness of bigrams and trigrams in improving classification accuracy.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>Rhetorical Labelling (RL) entails segmenting a document into coherent sentences and assigning rhetorical roles to these sentences. A rhetorical role describes the semantic function that a sentence plays in a document. The task calls for labelling sentences with one of seven roles: Facts, referring to the chronology of events that led to filing the case; Ruling by Lower Court; Arguments of the contending parties; relevant cited Statutes; relevant cited Precedents; Ratio of the Decision, referring to the rationale or reasoning given for the final judgement; and Ruling by Present Court, referring to the final decision given by the court. For our study, the task is approached as a classification problem where each role is considered a class, and each sentence in a document is classified into exactly one of the classes. In our experiments we deploy a supervised fastText text classifier trained on the provided Task 1 dataset.</p>
      <p>
        FastText is an open source toolkit developed for effective learning of text representations and text classification [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. FastText incorporates the context of words in its embeddings, as surrounding words are taken into account when learning a word representation. Furthermore, fastText represents each word as a bag of character n-grams in addition to the word itself, which is useful for corpora with rare non-dictionary words. Text representations are obtained by averaging word representations. These representations are then fed into a linear classifier, and classes are determined by a loss function that computes a probability distribution over the predefined class labels. By default fastText accepts unigrams as input tokens; however, this can be varied to, for instance, bigrams and trigrams. The loss function generally used is the softmax, which can be changed to hierarchical softmax for a larger number of classes to speed up training.
      </p>
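      <p>To make the character n-gram representation concrete, the sketch below is a simplified, hypothetical re-implementation for a single n-gram length (fastText itself extracts n-grams over a range of lengths, typically 3 to 6, and hashes them into a fixed table):</p>

```python
def char_ngrams(word, n=3):
    """Bag of character n-grams for one word, using the boundary
    symbols '<' and '>' the way fastText does."""
    padded = f"<{word}>"
    grams = {padded[i:i + n] for i in range(len(padded) - n + 1)}
    # The full padded word is also kept as its own feature.
    grams.add(padded)
    return grams

# A rare or misspelled legal term still shares most n-grams with its
# dictionary neighbour, which is what makes this useful on
# domain-specific corpora.
print(sorted(char_ngrams("where")))
```

      <p>Because a word's vector is built from its n-gram vectors, even an unseen word receives a representation assembled from familiar subword pieces.</p>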
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Setup</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The training dataset consists of 70 documents with a variable number of sentences of different lengths. The test data consists of 10 documents, also with a varying number of sentences. Each sentence in the training dataset is annotated with one of the seven classes. It was noted that the data was unbalanced, with an unequal distribution among the classes. The measures used for evaluation are Precision, Recall and F1 score.</p>
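        <p>As an illustration of these measures, a minimal macro-averaged implementation might look as follows (a hypothetical sketch; the official task evaluation may weight or average the per-class scores differently):</p>

```python
from collections import Counter

def macro_prf(gold, pred):
    """Macro-averaged Precision, Recall and F1 over the gold classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the gold label differed
            fn[g] += 1  # missed an instance of gold class g
    labels = set(gold)
    precs, recs, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec); recs.append(rec); f1s.append(f1)
    n = len(labels)
    return sum(precs) / n, sum(recs) / n, sum(f1s) / n
```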
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Platform</title>
        <p>The Python programming language and its libraries are used for all experiments. The fastText open source library is used for classification.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Pre-Processing</title>
        <p>Training data sentences are converted to lower case, contractions are fixed and punctuation is removed. The NLTK library is used to remove stop words, and the Porter Stemmer is used to stem the words. To conform to fastText input file requirements, each sentence is rearranged and the prefix “__label__” is affixed to the start of its class label. The final format for each sentence is shown in the example below:
__label__Facts none of her children survived her</p>
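        <p>The steps above can be sketched as a small pipeline (a simplified stand-in: the stop-word set and the identity stemmer below are toy placeholders for NLTK's English stop-word list and the Porter stemmer used in the actual runs):</p>

```python
import string

# Toy placeholder; the real runs use NLTK's English stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "her"}

def to_fasttext_line(sentence, label, stop_words=STOP_WORDS, stem=lambda w: w):
    """Lower-case, strip punctuation, drop stop words, stem each token,
    and prepend the fastText label prefix."""
    text = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [stem(t) for t in text.split() if t not in stop_words]
    return f"__label__{label} " + " ".join(tokens)

# With stop-word removal disabled, this reproduces the example above:
print(to_fasttext_line("None of her children survived her!", "Facts",
                       stop_words=set()))
```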
        <p>For training, the data is converted into input text files for training and validation using a 70/30 split.</p>
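        <p>The split itself can be done with a short helper such as the hypothetical one below (the seed and file names are illustrative, not taken from our actual setup):</p>

```python
import random

def split_lines(lines, train_frac=0.7, seed=42):
    """Shuffle labelled lines and split them 70/30 into
    training and validation portions."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    cut = int(len(lines) * train_frac)
    return lines[:cut], lines[cut:]

# Writing the two portions to fastText input files:
# train, valid = split_lines(all_labelled_lines)
# open("aila.train", "w").write("\n".join(train))
# open("aila.valid", "w").write("\n".join(valid))
```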
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Runs Description</title>
      <p>In our approach we consider the influence of word order on performance. We therefore train classifiers with identical parameters while varying the length of the word tokens. The submitted runs correspond to the models obtained for the different input tokens. Each test sentence was pre-processed in the same way as the training data: converted to lower case, contractions fixed, stop words removed and words stemmed with the Porter Stemmer.</p>
      <sec id="sec-4-1">
        <title>4.1. UB_BW RUN 1</title>
        <p>For our baseline, the fastText classifier is trained for 25 epochs at a learning rate of 0.5 with wordNgrams set to unigrams.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. UB_BW RUN 2</title>
        <p>In an effort to improve performance, in our second run we use bigrams as input tokens, while the model's epochs and learning rate remain at 25 and 0.5 respectively. A slight improvement over the baseline is noticed in terms of both training accuracy and precision.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. UB_BW RUN 3</title>
        <p>In our third run the model's parameters are retained as per the two previous runs; however, the input tokens are set to trigrams. A negligible improvement over the second run is noted in terms of training accuracy and precision. Training data results based on Precision and accuracy (on the training set) are shown in Table 2 and Table 1.</p>
      </sec>
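      <p>The three runs differ only in the wordNgrams setting. Conceptually, raising it from 1 to 2 or 3 augments each sentence's unigrams with bigram or trigram features, as this simplified sketch shows (fastText itself hashes word n-grams into a fixed bucket table rather than storing them as strings):</p>

```python
def word_ngram_tokens(sentence, word_ngrams=1):
    """Token set a fastText model conceptually sees: unigrams plus
    all word n-grams up to length `word_ngrams`."""
    words = sentence.split()
    tokens = list(words)
    for n in range(2, word_ngrams + 1):
        tokens += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return tokens

# wordNgrams=2 adds the bigrams that preserve local word order:
print(word_ngram_tokens("ruling by present court", word_ngrams=2))
```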
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <p>The performance of our runs relative to other teams’ systems on the test data is shown in Table 2. For our baseline system we used unigrams as input tokens, while for the second and third systems bigrams and trigrams were used respectively. It can be observed from the results that the trigram-based system UB_BW RUN 3 performed much better than the baseline UB_BW RUN 1, which used unigrams, across all measures. However, only a negligible difference is noticed between the trigram system and the bigram system UB_BW RUN 2. A category-wise analysis of the results, shown in Table 1, indicates that all systems performed poorly in predicting labels for the classes Ruling by Lower Court and Statute. It can also be noted that for the other classes the baseline system and the bigram system outperformed the trigram system; however, the trigram system outperforms the other systems in terms of F-score for all classes.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion and Conclusion</title>
      <p>In this paper we explored the effectiveness of using phrases with the fastText classifier to assign rhetorical labels to sentences in a court case document. While our systems did not give good performance overall, we believe that with enhancements and more training data the fastText classifier has the potential to benefit the rhetorical labelling task. We also observe improved performance with the introduction of bigrams and trigrams, which indicates that phrases can have a positive influence on a classification task. Going forward, we aim to further investigate the influence of phrases in improving text classification by performing empirical evaluations with various models.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Overview of the FIRE 2020 AILA track: Artificial intelligence for legal assistance</article-title>
          , in: P. Mehta,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          , M. Mitra (Eds.), Working Notes of FIRE 2020 -
          <article-title>Forum for Information Retrieval Evaluation, Hyderabad</article-title>
          , India,
          <source>December 16-20</source>
          ,
          <year>2020</year>
          , volume
          <volume>2826</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Parikh</surname>
          </string-name>
          , U. Bhattacharya,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Overview of the third shared task on artificial intelligence for legal assistance at FIRE 2021</article-title>
          , in: FIRE (Working Notes),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Parikh</surname>
          </string-name>
          , U. Bhattacharya,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>FIRE 2021 AILA track: Artificial intelligence for legal assistance</article-title>
          ,
          <source>in: Proceedings of the 13th Forum for Information Retrieval Evaluation</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wyner</surname>
          </string-name>
          ,
          <article-title>Identification of rhetorical roles of sentences in Indian legal judgments</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Araszkiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rodríguez-Doncel</surname>
          </string-name>
          (Eds.),
          <source>Legal Knowledge and Information Systems - JURIX</source>
          <year>2019</year>
          :
          <source>The Thirty-second Annual Conference</source>
          , Madrid, Spain,
          <source>December 11-13</source>
          ,
          <year>2019</year>
          , volume
          <volume>322</volume>
          <source>of Frontiers in Artificial Intelligence and Applications</source>
          , IOS Press,
          <year>2019</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Rhetorical role labelling for legal judgements using ROBERTA</article-title>
          , in: P. Mehta,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          , M. Mitra (Eds.), Working Notes of FIRE 2020 -
          <article-title>Forum for Information Retrieval Evaluation, Hyderabad</article-title>
          , India,
          <source>December 16-20</source>
          ,
          <year>2020</year>
          , volume
          <volume>2826</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <article-title>Legal text classification model based on text statistical features and deep semantic features</article-title>
          , in: P. Mehta,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          , M. Mitra (Eds.), Working Notes of FIRE 2020 -
          <article-title>Forum for Information Retrieval Evaluation, Hyderabad</article-title>
          , India,
          <source>December 16-20</source>
          ,
          <year>2020</year>
          , volume
          <volume>2826</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          , Z. Han,
          <article-title>Query revaluation method for legal information retrieval</article-title>
          , in: P. Mehta,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          , M. Mitra (Eds.), Working Notes of FIRE 2020 -
          <article-title>Forum for Information Retrieval Evaluation, Hyderabad</article-title>
          , India,
          <source>December 16-20</source>
          ,
          <year>2020</year>
          , volume
          <volume>2826</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Bag of tricks for efficient text classification</article-title>
          ,
          <source>CoRR abs/1607.01759</source>
          (
          <year>2016</year>
          ). arXiv:1607.01759.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Zolotov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kung</surname>
          </string-name>
          ,
          <article-title>Analysis and optimization of fasttext linear text classifier</article-title>
          ,
          <source>CoRR abs/1702.05531</source>
          (
          <year>2017</year>
          ). arXiv:1702.05531.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , T. Zhang,
          <article-title>Efective use of word order for text categorization with convolutional neural networks</article-title>
          ,
          <source>CoRR abs/1412.1058</source>
          (
          <year>2014</year>
          ). arXiv:1412.1058.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Masterson</surname>
          </string-name>
          ,
          <article-title>Using word order in political text classification with long short-term memory models</article-title>
          ,
          <source>Political Analysis</source>
          <volume>28</volume>
          (
          <year>2020</year>
          )
          <fpage>395</fpage>
          -
          <lpage>411</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jameel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lam</surname>
          </string-name>
          , L. Bing,
          <article-title>Supervised topic models with word order structure for document classification and retrieval learning</article-title>
          ,
          <source>Inf. Retr. J</source>
          .
          <volume>18</volume>
          (
          <year>2015</year>
          )
          <fpage>283</fpage>
          -
          <lpage>330</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>