<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Language Model for Legal Retrieval and Bert-based Model for Rhetorical Role Labeling for Legal Judgments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yujie Xu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tang Li</string-name>
          <email>itangkk@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhongyuan Han</string-name>
          <email>hanzhongyuan@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Heilongjiang Institute of Technology</institution>
          ,
          <addr-line>Harbin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper introduces our solutions to the two tasks of the AILA track at FIRE 2020 (Forum for Information Retrieval Evaluation). Task 1 (statute and precedent retrieval) asks, for a given query (the description of a legal situation), to identify relevant statutes and prior cases. It comprises two subtasks: Task 1a (identifying relevant prior cases) and Task 1b (identifying relevant statutes). For both subtasks, we use a language model to score each document against the query and then rank the documents by score. Task 2 (rhetorical role labeling for legal judgments) requires classifying sentences; we treat it as a multi-class classification problem and use BERT to complete it. In the final results, the score for Task 1a is 0.125, the score for Task 1b is 0.2003, and the accuracy for Task 2 is 0.549. The results and experiments show that the language model is an effective way to complete Task 1 and that BERT performs well on Task 2.</p>
      </abstract>
      <kwd-group>
<kwd>Legal Retrieval</kwd>
        <kwd>Rhetorical Role Labeling</kwd>
        <kwd>Language Model</kwd>
<kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>With the gradual maturing of the legal system, laws and regulations have become more
detailed and standardized, and people's demand for legal aid is steadily increasing. Compared
with the low efficiency of manual legal aid, the advantages of AI-assisted legal aid, such as
high efficiency and high accuracy, are becoming increasingly apparent.</p>
<p>To this end, FIRE 2020 proposed a task named AILA 2020 (Artificial Intelligence for
Legal Assistance) to advance AI-based legal aid. For the two subtasks in Task 1, the organizers
provided 10 short descriptions of legal situations, 3,000 judgments delivered by the
Supreme Court of India, and 197 statutes (sections of Acts) from Indian law; the goal is to
retrieve the most relevant case documents or statutes for a given query. For Task 2, they
provided 8,096 labeled sentences as training data and 1,905 sentences as test data. Each
training sentence is assigned one of the following seven semantic segments / rhetorical roles:
Fact, Ruling by Lower Court, Argument, Statute, Precedent, Ratio of the Decision, and
Ruling by Present Court. We are required to classify the 1,905 test sentences into these seven categories.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Methods for Task1a</title>
<p>Fig. 1 describes our method of solving Task 1a with the Two-Stage Language Model[<xref ref-type="bibr" rid="ref1">1</xref>].</p>
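<p>As a concrete illustration, a minimal sketch of query-likelihood scoring under the Two-Stage Language Model (Dirichlet prior smoothing of the document model, followed by interpolation with the collection model) might look as follows; the toy corpus and function names are our own illustration, not the system's actual code:</p>

```python
import math
from collections import Counter

def two_stage_score(query, doc, collection, mu=2500, lam=0.8):
    """Score a query (list of tokens) against a document with two-stage smoothing:
    Dirichlet-smoothed document model interpolated with the collection model."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for w in query:
        p_c = coll_tf[w] / len(collection)         # collection (background) probability
        if p_c == 0:
            continue                               # skip terms unseen in the collection
        p_dir = (doc_tf[w] + mu * p_c) / (len(doc) + mu)  # stage 1: Dirichlet smoothing
        p = (1 - lam) * p_dir + lam * p_c          # stage 2: interpolation
        score += math.log(p)
    return score
```

<p>Documents are then ranked by this score in descending order.</p>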
    </sec>
    <sec id="sec-4">
      <title>2.2 Methods for Task1b</title>
      <p>
        For Task 1b, we not only use the method of Task 1a but also the Jelinek-Mercer
language model[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], applying Eq. (2) to calculate the similarity between the query and each
document. Before retrieval, we also process the given data with word-based n-grams and
character-based n-grams. We find that character-based n-grams perform much better than
word-based n-grams, and that n-grams of orders 2-7 achieve the best result.
      </p>
      <p>p(w |ˆD )   D pML (w |ˆD )  (1   D ) p(w |ˆC )
（2）</p>
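<p>Eq. (2) interpolates the maximum-likelihood document model with the collection model. A small sketch of this scoring over character n-grams of orders 2-7 (the function names and toy texts are our own illustration):</p>

```python
import math
from collections import Counter

def char_ngrams(text, n_min=2, n_max=7):
    """Split text into overlapping character n-grams of orders n_min..n_max."""
    text = text.replace(" ", "")
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams

def jm_score(query_grams, doc_grams, coll_grams, lam=0.8):
    """Query log-likelihood under Jelinek-Mercer smoothing, Eq. (2):
    p(w|D) = lam * p_ML(w|D) + (1 - lam) * p(w|C)."""
    doc_tf, coll_tf = Counter(doc_grams), Counter(coll_grams)
    score = 0.0
    for g in query_grams:
        p_ml = doc_tf[g] / len(doc_grams)
        p_c = coll_tf[g] / len(coll_grams)
        p = lam * p_ml + (1 - lam) * p_c
        if p > 0:                  # ignore grams unseen in the whole collection
            score += math.log(p)
    return score
```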
    </sec>
    <sec id="sec-5">
      <title>2.3 Methods for Task2</title>
    </sec>
    <sec id="sec-6">
      <title>3. Experimental Setting</title>
<p>Our experiments use the Lemur toolkit (http://www.lemurproject.org/) and bert4keras (https://github.com/bojone/bert4keras).</p>
      <p>For Task 2, we treat the problem as multi-class classification. We use a Logistic
Regression model and a lighter version of BERT implemented with bert4keras. The BERT weights
are initialized from uncased_L-12_H-768_A-12. The 8,096 training sentences, without any
preprocessing, are used to fine-tune the BERT model with the parameters (max_len = 124,
batch_size = 24, units = 7, epochs = 2).</p>
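<p>As an illustration of this setup, the label encoding and fixed-length input preparation for the seven rhetorical roles could be sketched as follows; the whitespace tokenizer and toy vocabulary here are stand-ins for BERT's WordPiece tokenization in bert4keras:</p>

```python
# Hypothetical sketch of the Task 2 input pipeline; the real system
# tokenizes with BERT's WordPiece vocabulary via bert4keras.
ROLES = ["Fact", "Ruling by Lower Court", "Argument", "Statute",
         "Precedent", "Ratio of the Decision", "Ruling by Present Court"]
LABEL2ID = {role: i for i, role in enumerate(ROLES)}  # units = 7 output classes

MAX_LEN = 124  # max_len used when fine-tuning

def encode(sentence, label, vocab):
    """Map a sentence to fixed-length token ids plus its class id."""
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in sentence.lower().split()]
    ids = ids[:MAX_LEN]                               # truncate long sentences
    ids += [vocab["[PAD]"]] * (MAX_LEN - len(ids))    # pad short ones
    return ids, LABEL2ID[label]
```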
    </sec>
    <sec id="sec-7">
      <title>3.1 Parameter Selection</title>
      <p>For Task 1a, we tried different values of μ and λ in the Two-Stage Language Model to observe
their effects. Fig. 2 shows the results for different λ when μ = 1500 and μ = 2500.</p>
      <p>For Task 1b, we likewise tried different values of μ and λ of the Two-Stage Language Model.
Fig. 3 shows the results for different λ when μ = 1500 and μ = 2500.</p>
      <p>In Task 1b, we also tried different n-gram preprocessing schemes to observe their effects. The
experimental results are shown in Table 1.</p>
      <p>In conclusion, μ = 2500 and λ = 0.8 achieve the better results. For the Task 1b preprocessing,
character-level 2+3+4+5+6+7 grams achieve higher accuracy than the other settings.</p>
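<p>The parameter sweep described above can be sketched as a simple grid search; the evaluation callback here is a stand-in for the track's relevance scoring:</p>

```python
import itertools

def grid_search(score_fn, mus=(1500, 2500), lams=(0.2, 0.4, 0.6, 0.8)):
    """Try every (mu, lam) pair and return the best-scoring combination."""
    best_score, best_params = float("-inf"), None
    for mu, lam in itertools.product(mus, lams):
        s = score_fn(mu, lam)
        if s > best_score:
            best_score, best_params = s, (mu, lam)
    return best_params, best_score
```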
    </sec>
    <sec id="sec-8">
      <title>3.2 Experimental Results</title>
      <p>
        For Task 1, we submitted three groups of results. Table 2 and Table 3 show the experimental
results on the test data we submitted[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>For Task 2, we submitted two sets of results. Table 4 shows the experimental results of the test
data we submitted.</p>
    </sec>
    <sec id="sec-9">
      <title>4. Conclusions</title>
      <p>This paper introduces the methods we used in the FIRE 2020 AILA evaluation. Compared with
other submissions, our methods show many deficiencies. For the task of identifying relevant prior
cases, the final evaluation results show that BM25 and TF-IDF outperform our method, while for
the multi-class classification task, BERT shows good results.</p>
    </sec>
    <sec id="sec-10">
      <title>5. Acknowledgements</title>
      <p>This work is supported by National Social Science Fund of China (No.18BYY125).</p>
    </sec>
    <sec id="sec-11">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>ChengXiang</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>John</given-names>
            <surname>Lafferty</surname>
          </string-name>
          , "
          <article-title>Two-Stage Language Models for Information Retrieval</article-title>
          ".
          <source>Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Guodong</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bin</given-names>
            <surname>Wang</surname>
          </string-name>
          , "
          <article-title>GJM-2: A Special Case of General Jelinek-Mercer Smoothing Method for Language Modeling Approach to Ad Hoc IR</article-title>
          ". Information Retrieval Technology,
          <source>Second Asia Information Retrieval Symposium, AIRS 2005</source>
          , Jeju Island, Korea, October 13-15,
          <year>2005</year>
          , Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Paheli</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          , Parth Mehta, Kripabandhu Ghosh, Saptarshi Ghosh, Arindam Pal, Arnab Bhattacharya, Prasenjit Majumder, "
          <article-title>Overview of the FIRE 2020 AILA track: Artificial Intelligence for Legal Assistance</article-title>
          ".
          <source>Proceedings of FIRE 2020 - Forum for Information Retrieval Evaluation</source>
          , Hyderabad, India, December
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>