Query Revaluation Method For Legal Information Retrieval Liang Liua, Lexiao Liub,*, Zhongyuan Hanc a Heilongjiang Institute of Technology, Harbin, China b Beihang University, Beijing, China c Foshan University, Foshan, China Abstract In this paper, we introduced in detail the method of implementing the task of identifying relevant prior cases in artificial intelligence for legal assistance. For the task, we transformed the problem into a retrieval task and used the BM25 retrieval model to try to make it perform better in this task. The improved method wins second place on MAP and the second place on BPREF. Keywords1 Legal Information Retrieval, Language Model, BM25, IDF, Identifying Relevant Prior Case 1. Introduction It is of great importance to give prior cases for the Common Law system2. A prior case (also called a precedent) is an older court case related to the current case, which discusses similar issue(s) and which can be used as a reference in the current case [1]. Therefore, legal practitioners need to find and study prior cases to study how to explain current issues in older cases. Artificial Intelligence for Legal Assistance (AILA) is a series of shared tasks aimed at developing datasets and methods for solving a variety of legal informatics problems [2]. AILA20203, which aims to develop an automatic system and focused on precedent and statute retrievals for a given legal scenario [5]. This year AILA will consist of two different tasks. Task 1 is the same as AILA 2019 and we will focus on task 1a in this paper. Generally, legal information retrieval is regarded as a rank task. Last year, we proposed an improvement of BM25 and achieved excellent results [3]. As early as 2004, the multiple weighted fields base on BM25 were proposed by Robertson [4]. So we will continue to improve the method of last year in 2020. 2. Methods For the task of Identifying relevant prior cases, we treated it as an information retrieval task and submitted three runs with BM25. 2.1. Data Pre-processing According to the statistics, the query, which is a description of the situation in Query_doc, contains over 500 words on average, and the document, which the prior case in Object_casedocs, contains over 3,000 words on average. For traditional retrieval, the query sentence in the task is too long. Consequently, we should preprocess the data to shorten the length of the sentence without losing its main meaning. As we all know, the common method is to remove all stop words, we also chose Forum for Information Retrieval Evaluation 2020, December 16-20, 2020, Hyderabad, India EMAIL: trueliuliang@gmail.com (A. 1); liulx15@yeah.net (A. 2)(*corresponding author); hanzhongyuan@gmail.com (A. 3) 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) 2 https://en.wikipedia.org/wiki/Common_law/ 3 https://sites.google.com/view/aila-2020/track-description this method and converted the text to lowercase. Finally, we use Lucene toolkit4 to index the document. 2.2. double_liu_2020_1 For this submission, we chose the BM25 model and improved it by modifying its relevant calculation, as follows: n TF (qi , D )  (k1  1) BM25( D, Q )   IDF (qi )  |D| (1) i 1 TF (qi , D )  k1  (1  b  b  ) avgdl In this formula (1), we given the definition of BM25, where qi is the word in Q, |D| is the length of document D, and avgdl is the average document length in the text collection, k1 and b are the parameters of BM25. In this task, we set the parameter k1=2.99 and b=0.65. Furthermore, we modify the relevant computation to get an improved BM25. rel ( D, Q)  BM 25( D, Q)  BM 25( D, Q' ) (2) where Q is a query sentence with stop words removed, and Q' is a keyword that is further extracted from Q, here we choose the IDF algorithm to sort the words in Q, and form the top m% words into Q'. m is a free parameter, and we set m=50. 2.3. double_liu_2020_2 Inspired by the former, we split the method in double_liu_2020_1 into two sub-methods as our double_liu_2020_2 and double_liu_2020_3. In the double_liu_2020_2 submission, we chose the first half of formula (2) to form our method one, as follows: rel ( D, Q)  BM 25( D, Q) (3) All the other settings are followed double_liu_2020_1. 2.4. double_liu_2020_3 For this submission, we choose the second half of formula (2) to form method three, as shown below: rel ( D, Q)  BM 25( D, Q' ) (4) All the other settings are also followed double_liu_2020_1. 2.5. Other methods We also tried other experiments, but the results were not satisfactory. 2.5.1 Cosine Similarity For this method, we want to rank the cosine similarity between the query sentence and the document as an indicator. Firstly, we use the bag-of-words model to construct word vectors for the query sentence and the document respectively and then calculate the cosine similarity and rank. The formula for cosine similarity is as follows: 4 https://lucene.apache.org/ n A B (A  B ) i i rel(D, Q)  Cos ( A, B )   i 1 (5) | A|| B | n n  ( A )   (B ) i 1 i 2 i 1 i 2 Where A and B are two vectors. 2.5.2 Generalized Jaccard Similarity In this method, we choose to use generalized Jaccard similarity as an indicator to sort. n  min(A , B ) i i rel(D, Q)  J ( A, B )  i 1 n (6)  max( A , B ) i 1 i i 2.5.3 Cosine with Jaccard In this method, we improve the previous two methods and introduce the parameter k. The specific formula is as follows: rel(D, Q)  k  Cos( A, B)  (1  k )  J ( A, B) (7) where k is a free parameter, and we set k=0.3 3. Results 3.1. Evaluation Measures Standard Information retrieval metrics like Measures like Precision, Recall, Mean Average Precision (MAP)5, Discounted Cumulative Gain(DCG) and Mean Reciprocal Rank(MRR) will be used for evaluation in the task. 3.2. Evaluation Results Table 1. Results of the AILA Task 1a sorted by MAP Run_ID MAP BPREF recip_rank P@10 rank double_liu_2020_3 0.1382 0.1045 0.1886 0.07 2 double_liu_2020_1 0.1306 0.0737 0.1963 0.07 4 double_liu_2020_2 0.123 0.0621 0.1969 0.08 11 Jaccard 0.0820 0.0578 0.1464 0.07 - Cosine_Jaccard_k 0.0781 0.0563 0.1521 0.08 - Cosine 0.0490 0.0116 0.1579 0.04 - 5 https://trec.nist.gov/pubs/trec16/appendices/measures.pdf Table 2. Results of the AILA Task 1a sorted by BPREF Run_ID MAP BPREF recip_rank P@10 rank double_liu_2020_3 0.1382 0.1045 0.1886 0.07 2 double_liu_2020_1 0.1306 0.0737 0.1963 0.07 9 double_liu_2020_2 0.123 0.0621 0.1969 0.08 16 Jaccard 0.0820 0.0578 0.1464 0.07 - Cosine_Jaccard_k 0.0781 0.0563 0.1521 0.08 - Cosine 0.0490 0.0116 0.1579 0.04 - It can be seen from Table 1 and Table 2 that double_liu_2020_3 is the best among the methods we submitted. According to the results, the relevant prior case information is helpful to guide the judgment of current case. 4. Conclusion In this task, we describe a method that uses an improved BM25 to identify relevant priors, and it can be concluded that using certain algorithms to extract keywords will improve efficiency. Compared with other submissions of the task, our improved BM25 model can get the second place in MAP and BPREF. 5. Acknowledgments This work is supported by National Social Science Fund of China (No.18BYY125). 6. References [1] Mandal, A., Ghosh, K., Bhattacharya, A., Pal, A., Ghosh, S.: Overview of the fire 2017 irled trac k: Information retrieval from legal documents//Proceedings of FIRE 2017 - Forum for Information Retrieval Evaluation, 2017:63-68 [2] Bhattacharya, P., Ghosh, K, Ghosh, S., Pal, A., Mehta, P., Bhattacharya, A., Majumder P.: Overv iew of the FIRE 2019 AILA Track: Artificial Intelligence for Legal Assistance//Proceedings of F IRE 2019 - Forum for Information Retrieval Evaluation, 2019. [3] Zhao, Z., Ning, H., Huang, C., Kong, L., Han, Y., Han, Z.: Fire2019@aila: Legal information ret rieval using improved bm25//Proceedings of FIRE 2019 - Forum for Information Retrieval Evaluation, 2019:40-45. [4] Robertson S, Zaragoza H, Taylor M. Simple BM25 extension to multiple weighted fields//Procee dings of the thirteenth ACM international conference on Information and knowledge managemen t. 2004: 42-49. [5] P. Bhattacharya, P. Mehta, K. Ghosh, S. Ghosh, A. Pal, A. Bhattacharya., P. Majumder, Overview of the Fire 2020 AILA track: Artificial Intelligence for Legal Assistance. In Proc. of FIRE 2020 - Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, 2020.