UB at FIRE 2020 Precedent and Statute Retrieval

Tebo Leburu-Dingalo, Nkwebi Peace Motlogelwa, Edwin Thuma and Monkgogi Modungo
Department of Computer Science, University of Botswana

Abstract
In this paper we explore several retrieval strategies in an attempt to identify relevant statutes and prior cases using a description of a current situation (current case). In particular, we investigate whether we can improve the retrieval performance of a precedent retrieval system by indexing only the key concepts in the prior case documents. In addition, we investigate whether we could improve the retrieval performance by expanding the original queries and performing retrieval on a summarised document collection. The results suggest that expanding the current case can improve the retrieval performance when retrieval is performed on a summarised document collection of prior cases. For statute retrieval, we investigate whether the retrieval performance could be improved by extracting only the key concepts from the queries or by expanding the queries without summarising the statute documents. The results of this study suggest that summarising the current case can improve the retrieval performance of a statute retrieval system.

Keywords
Precedent Retrieval, Statute Retrieval, Text Summarization

1. Introduction

The development of tools that enable users to access digitized legal content has been a major focus area of research since the 1960s [1]. Central to this has been the provision of search and retrieval strategies that are effective in returning the most relevant information, given the vast amount of digital content being generated daily in the legal field. To this end, a notable number of Information Retrieval (IR) systems have been developed for the legal field that borrow heavily from the Natural Language Processing (NLP) and Machine Learning (ML) techniques already proven effective in general search tasks. Activities that have benefited in this regard include the retrieval of relevant statutes and precedents that can be used as references for an ongoing case. Statutes are written laws referred to by judges in support of judicial decisions [2]. Precedents, on the other hand, support the principle that obliges courts to follow decisions reached in historical cases when making a ruling on a similar case [3]. However, as noted by Carvalho et al. [4], Maxwell and Schafer [5] and Yoshioka et al. [6], the majority of systems developed for the legal field, including precedent and statute retrieval systems, are not able to achieve the expected level of performance even when implemented with approaches proven effective in general search tasks. This general lack of performance by IR systems in the legal domain has been attributed to several factors, including the inherently complex syntax and domain specificity of the language used in law documents [4].
Another factor identified in this regard is the tendency of law documents to be long and wordy, which can hurt retrieval performance when they are used as queries (current case). As several studies have shown, longer or verbose queries are more difficult for IR systems to process than their shorter versions. Bendersky and Croft [7] illustrate this in their exploration of a probabilistic model for verbose queries using newswire and web collections. The effectiveness of shorter queries over longer queries is further confirmed by Huston and Croft [8] in their evaluation of query processing techniques on data drawn from the Yahoo! Answers CQA service.

Research efforts towards improving the effectiveness of IR systems in the legal domain are currently supported by several international initiatives, such as the Forum for Information Retrieval Evaluation (FIRE, http://fire.irsi.res.in/fire/2020/home) platform. To achieve this, the platform avails datasets against which researchers can develop and evaluate comparable IR systems. The datasets are made available through a series of tasks that address different aspects of legal information retrieval.

In this paper, we present the work we submitted for participation in the Artificial Intelligence for Legal Assistance (AILA) track (https://sites.google.com/view/aila-2020), a series of shared tasks aimed at developing datasets and methods for solving a variety of legal informatics problems [9]. In particular, we participated in Task 1, which focuses on precedent and statute retrieval. Precedent retrieval (Task 1A) focuses on the identification of relevant prior cases for a given legal situation representing a current case. Statute retrieval (Task 1B) focuses on the identification of the most relevant statutes for a given legal situation. Our approach explores the effectiveness of using shortened versions of both the query and document texts as opposed to their original versions. To this end, we deploy text summarization to find the most informative terms to act as representatives of queries and documents in both retrieval tasks. The remainder of this paper is organized as follows: Section 2 presents related work; Section 3 describes the methods used in this study; Section 4 describes our experimental setting; and Section 5 presents our results and discussion.

2. Related Work

2.1. Statute and Precedent Retrieval Systems

Statute laws and precedents serve an important role in countries that follow the common law system. Statutes enable judges to apply legal principles when handling a case, while precedents, or previously decided cases, allow them to reach similar decisions for subsequent cases with similar issues or circumstances. Additionally, lawyers are able to use these resources as references when preparing for a case. Several statute and precedent retrieval systems have therefore been proposed, aimed at giving judges and lawyers timely access to these resources. Zhao et al. [10] use a combination of IDF and an improved BM25 to implement a competitive method for precedent retrieval. The BM25 model is enhanced by using the relevance scores of both the original and the filtered query case, where the query case is filtered by selecting the top-ranked query terms based on their IDF scores. Thenmozhi et al. [11] use Part-of-Speech (POS) tagging and a vector space model for precedent retrieval. The method uses both concepts and relationships extracted from the text as features.
A feature vector is constructed for each document based on TF-IDF scores. Prior cases are then retrieved and ranked for each current case based on a cosine similarity measure. Shao and Ye [12] obtain relative success with a vector space based model for statute retrieval. The authors use both the original query and a summary of the query generated using TextRank. Candidate statutes are constructed using both the title and the description of the statutes.

2.2. Query Reduction in Legal Retrieval

Reduction of verbose or lengthy queries in an effort to improve retrieval performance is an approach that has been widely adopted in the literature. Driving this research is the observation that retrieval systems tend to perform better on shorter versions of queries, as illustrated by [7] and [8]. Many strategies proposed for legal retrieval thus deploy summarization techniques that seek to represent a document with a subset of its most informative terms or key concepts. Thuma et al. [13] demonstrate the efficacy of this approach in a statute retrieval task. The authors observe a notable improvement in system performance when TagCrowd is used to generate query terms from key concepts derived from a longer description of a query case. A degradation in performance is, however, observed if the summarized query is expanded with informative terms from the corpus. Rossi and Kanoulas [14] combine text summarization and a generalized language model (BERT) to measure pairwise similarity between documents in a legal retrieval task. Text in this work is summarized using the graph-based algorithm TextRank. Sandeep and Bharadwaj [15] obtain summarized versions of case documents by filtering out insignificant terms based on a predefined threshold. The significance of a term is determined by a linear combination of the term's frequency and its POS tag weight. A nearest neighbour approach is then used to determine the similarity between the query and candidate documents.

3. Description of Methods

In our experiments, we used TF-IDF as the main technique for document ranking and retrieval as well as for text summarization. A brief description of the approaches used is outlined below.

3.1. Term Weighting Model

Our first proposed approach uses the TF-IDF term weighting model to rank and retrieve documents (prior cases/statutes). TF-IDF is a numerical statistic calculated as the product of two components: term frequency (tf) and inverse document frequency (idf). tf refers to the number of times term t occurs in document d [16]. The basic idf calculation is as follows:

    idf(t) = log(N / n_t)                                                  (1)

where N is the total number of documents in collection C, and n_t is the number of documents in which term t occurs.
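To make the weighting concrete, the sketch below shows one straightforward way of computing TF-IDF weights following Equation (1). It is an illustration only, not the Terrier implementation we used for retrieval (Terrier's TF_IDF weighting model applies additional term-frequency and document-length normalisation); the function name, the toy documents and the use of raw term counts for tf are our own assumptions.

    import math
    from collections import Counter

    def tf_idf_weights(documents):
        """Compute TF-IDF weights for every term in every document.

        `documents` is a list of pre-tokenised documents (lists of terms).
        Returns one {term: weight} dictionary per document.
        """
        N = len(documents)
        # n_t: number of documents in which each term occurs
        doc_freq = Counter(term for doc in documents for term in set(doc))

        weights = []
        for doc in documents:
            tf = Counter(doc)  # raw term frequency within this document
            weights.append({
                term: freq * math.log(N / doc_freq[term])  # tf * idf(t), idf(t) = log(N / n_t)
                for term, freq in tf.items()
            })
        return weights

    # Illustrative usage on a toy collection of prior case snippets
    docs = [
        "the accused was charged under the penal code".split(),
        "the appellant challenged the decree of the high court".split(),
    ]
    print(tf_idf_weights(docs)[0])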
3.2. Text Summarization Algorithm with TF-IDF

The text summarization algorithm we used (https://towardsdatascience.com/text-summarization-using-tf-idf-e64a0644ace3) runs on the Python Natural Language ToolKit (NLTK, https://www.nltk.org/) and is based on TF-IDF. The algorithm computes a score for each sentence as the sum of the TF-IDF scores of the words in that sentence:

    sentence_score = sum_{i = first word}^{last word} TF-IDF(word_i)

The summary retains only those sentences whose score is greater than a threshold. The threshold is computed from the average sentence score as follows:

    threshold = ( sum_{i = first sentence}^{last sentence} score(sentence_i) ) / (number of sentences)

We used the training queries to select the optimal threshold. In particular, we conducted several experiments in which we varied the threshold, performed the retrieval, and evaluated the retrieval performance. The most effective threshold setting, 0.35, was then used with the test datasets to perform the actual retrieval.
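As a rough illustration of the summarizer described above, the following is a minimal sketch that scores sentences by summed TF-IDF and keeps those scoring above a cut-off derived from the average sentence score. NLTK provides the tokenisers; treating each sentence as a "document" for the idf statistics and exposing the cut-off as a multiplier (factor) of the average score are our assumptions, not a verbatim copy of the implementation we ran.

    import math
    from collections import Counter

    from nltk.tokenize import sent_tokenize, word_tokenize
    # nltk.download('punkt') is required once before the tokenisers can be used

    def summarize(text, factor=1.0):
        """Keep only sentences whose summed TF-IDF score exceeds
        factor * (average sentence score)."""
        sentences = sent_tokenize(text)
        if not sentences:
            return ""
        tokenised = [[w.lower() for w in word_tokenize(s) if w.isalpha()]
                     for s in sentences]

        # Document frequencies, treating each sentence as a "document"
        N = len(tokenised)
        doc_freq = Counter(w for sent in tokenised for w in set(sent))

        # Sentence score = sum of the TF-IDF weights of its words
        scores = []
        for sent in tokenised:
            tf = Counter(sent)
            scores.append(sum(freq * math.log(N / doc_freq[w])
                              for w, freq in tf.items()))

        threshold = factor * (sum(scores) / len(scores))
        return " ".join(s for s, score in zip(sentences, scores) if score > threshold)

With the cut-off exposed as a parameter, it can be tuned on the training queries in the same spirit as the threshold selection described above.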
4. Experimental Setting

Retrieval Platform: For all our experiments, we used Terrier 4.2 (http://terrier.org/) [17], an open source Information Retrieval (IR) platform. All the documents used in this study were pre-processed before indexing; this involved tokenising the text and stemming each token using the full Porter stemming algorithm [18]. A comprehensive description of the test collection used in this study can be found in Bhattacharya et al. [9].

4.1. Task 1A: Precedent Retrieval

A baseline retrieval was conducted using Terrier 4.2 with the original prior case documents and the original test queries, using TF-IDF as the term weighting model (UB-1). The second experiment attempted to improve retrieval effectiveness by indexing summarised prior case documents, i.e., by extracting only the key concepts from the prior case documents (UB-2). In the final run, we investigated whether we could improve retrieval effectiveness by expanding the original queries with the top 10 terms selected from the top 3 ranked documents after the first-pass retrieval (UB-3); retrieval for this run was performed on the summarised prior case documents. For query expansion, we used the Terrier 4.2 Bo1 model to select the expansion terms.

4.2. Task 1B: Statute Retrieval

A baseline retrieval was conducted using Terrier 4.2 with the original test corpus and the original test queries, using TF-IDF as the term weighting model (UB-1). The second experiment attempted to improve retrieval effectiveness by summarising the queries, i.e., by extracting only the key concepts from the queries (UB-2). In the final run, we investigated whether we could improve retrieval effectiveness by expanding the summarised queries with the top 10 terms selected from the top 3 ranked documents after the first-pass retrieval (UB-3). For query expansion, we used the Terrier 4.2 Bo1 model to select the expansion terms. A sketch of a comparable retrieval and expansion pipeline is shown below.
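For readers who want to reproduce a comparable setup, the sketch below wires together indexing, TF-IDF retrieval and Bo1 query expansion (top 10 terms from the top 3 first-pass documents) using the PyTerrier bindings rather than the Terrier 4.2 desktop configuration we actually used; the corpus dictionaries, index path and example query are illustrative assumptions.

    import os

    import pyterrier as pt

    if not pt.started():
        pt.init()

    # Illustrative corpus: each prior case / statute as a dict with 'docno' and 'text'
    docs = [
        {"docno": "C1", "text": "the appellant was convicted under section 302 ..."},
        {"docno": "C2", "text": "the statute provides for anticipatory bail ..."},
    ]

    # Index the collection (Terrier applies Porter stemming and stop-word
    # removal by default, in line with the pre-processing described above)
    indexer = pt.IterDictIndexer(os.path.abspath("./aila_index"))
    index_ref = indexer.index(iter(docs))

    # Baseline run in the style of UB-1: TF-IDF term weighting
    tfidf = pt.BatchRetrieve(index_ref, wmodel="TF_IDF")

    # Expanded run in the style of UB-3: Bo1 query expansion with the
    # top 10 terms taken from the top 3 documents of the first-pass retrieval
    bo1 = pt.rewrite.Bo1QueryExpansion(index_ref, fb_terms=10, fb_docs=3)
    expanded_tfidf = tfidf >> bo1 >> tfidf

    print(expanded_tfidf.search("appeal against conviction for murder"))

In our actual runs, the summarised documents or queries produced by the procedure of Section 3.2 would simply take the place of the document texts or query strings fed into such a pipeline.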
5. Results and Discussion

Our results for both tasks were submitted to the AILA 2020 competition for evaluation by the organizers. The evaluation for Task 1A and Task 1B uses MAP, BPREF, recip_rank and P@10. The results of Task 1A and Task 1B based on these evaluation measures are shown in Table 1 and Table 2 respectively.

Table 1
Task 1A: Precedent Retrieval

Run ID   MAP      BPREF    recip_rank   P@10
UB-1     0.1229   0.07     0.2033       0.09
UB-2     0.1168   0.0798   0.1967       0.07
UB-3     0.1573   0.1128   0.238        0.08

Table 2
Task 1B: Statute Retrieval

Run ID   MAP      BPREF    recip_rank   P@10
UB-1     0.3085   0.2633   0.573        0.14
UB-2     0.3134   0.2633   0.5787       0.15
UB-3     0.1876   0.1502   0.2468       0.09

Our third approach in Task 1A (UB-3), which uses expanded queries for retrieval on a summarised document collection of prior cases, performed better than all the other systems that participated in the task on three evaluation measures (MAP, BPREF and recip_rank; Table 1). For statute retrieval (Task 1B), an improvement in retrieval performance was attained after summarising the current case (Table 2). Our approach also performed better than all other teams that participated in the task when systems were evaluated using recip_rank, where it scored 0.5787 (Table 2). Overall, the results of this study, together with previous work, suggest that document and query summarisation can improve retrieval performance for both statute and precedent retrieval. To develop a fuller picture of the effects of summarisation, additional studies are needed that empirically evaluate and develop novel document summarisation techniques tailored for the legal domain.

References

[1] J. Bing, Performance of legal text retrieval systems: The curse of Boole, Law Libr. J. 79 (1987) 187.
[2] P. Bhattacharya, K. Ghosh, S. Ghosh, A. Pal, P. Mehta, A. Bhattacharya, P. Majumder, FIRE 2019 AILA track: Artificial Intelligence for Legal Assistance, in: Proceedings of the 11th Forum for Information Retrieval Evaluation, FIRE '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 4-6. doi:10.1145/3368567.3368587.
[3] L. K. Branting, A reduction-graph model of precedent in legal analysis, Artificial Intelligence 150 (2003) 59-95.
[4] D. S. Carvalho, V. D. Tran, V.-K. Tran, L.-M. Nguyen, Improving legal information retrieval by distributional composition with term order probabilities, in: COLIEE@ICAIL, 2017, pp. 43-56.
[5] K. T. Maxwell, B. Schafer, Concept and context in legal information retrieval, in: Proceedings of the 2008 Conference on Legal Knowledge and Information Systems: JURIX 2008: The Twenty-First Annual Conference, IOS Press, NLD, 2008, pp. 63-72.
[6] M. Yoshioka, Y. Kano, N. Kiyota, K. Satoh, Overview of Japanese statute law retrieval and entailment task at COLIEE-2018, in: Twelfth International Workshop on Juris-informatics (JURISIN 2018), 2018.
[7] M. Bendersky, W. B. Croft, Discovering key concepts in verbose queries, in: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '08, Association for Computing Machinery, New York, NY, USA, 2008, pp. 491-498. doi:10.1145/1390334.1390419.
[8] S. Huston, W. B. Croft, Evaluating verbose query processing techniques, in: Proceedings of SIGIR, SIGIR '10, 2010, pp. 291-298.
[9] P. Bhattacharya, P. Mehta, K. Ghosh, S. Ghosh, A. Pal, A. Bhattacharya, P. Majumder, Overview of the FIRE 2020 AILA track: Artificial Intelligence for Legal Assistance, in: Proceedings of FIRE 2020 - Forum for Information Retrieval Evaluation, 2020.
[10] Z. Zhao, H. Ning, L. Liu, C. Huang, L. Kong, Y. Han, Z. Han, FIRE2019@AILA: Legal information retrieval using improved BM25, in: FIRE (Working Notes), volume 2517 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 40-45.
[11] D. Thenmozhi, K. Kannan, C. Aravindan, A text similarity approach for precedence retrieval from legal documents, in: FIRE (Working Notes), 2017, pp. 90-91.
[12] Y. Shao, Z. Ye, THUIR@AILA 2019: Information retrieval approaches for identifying relevant precedents and statutes, in: FIRE (Working Notes), volume 2517 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 46-51.
[13] E. Thuma, N. P. Motlogelwa, T. Leburu-Dingalo, M. Mudongo, Query reduction for an effective Japanese statute law retrieval, in: 2019 Conference on Next Generation Computing Applications (NextComp), 2019, pp. 1-4. doi:10.1109/NEXTCOMP.2019.8883643.
[14] J. Rossi, E. Kanoulas, Legal information retrieval with generalized language models, in: Proceedings of the 6th Competition on Legal Information Extraction/Entailment (COLIEE), 2019.
[15] G. Sandeep, S. Bharadwaj, An extraction based approach to keyword generation and precedence retrieval: BITS Pilani-Hyderabad, in: FIRE (Working Notes), 2017, pp. 74-77.
[16] S. Robertson, Understanding inverse document frequency: On theoretical arguments for IDF, Journal of Documentation 60 (2004) 503-520.
[17] I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, D. Johnson, Terrier Information Retrieval Platform, in: Proceedings of the 27th European Conference on IR Research (ECIR 2005), volume 3408 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, Heidelberg, 2005, pp. 517-519.
[18] M. Porter, An algorithm for suffix stripping, Readings in Information Retrieval 14 (1997) 313-316.