Legal Statutes Retrieval: A Comparative
Approach on Performance of Title and Statutes
Descriptive Text
Moemedi Lefoane, Tshepho Koboyatshwene, Goaletsa Rammidi, and V.
Lakshmi Narasimham
University of Botswana, Gaborone, Botswana
{moemedi.lefoane, tshepho.koboyatshwene, goaletsa.rammidi,
lakshmi.narasimhan}@mopipi.ub.bw
Abstract. Legal statutes play a crucial role in the justice system. In countries that follow the common law system, they are often cited in court decisions to argue cases of interest. The AILA 2019 track presented two tasks: a precedent retrieval task and a statute retrieval task¹. Our team participated in the latter. The statutes provided consisted of two components, namely a Title and a Statute description. In this study we first conducted an experiment to determine the best term weighting model for this task. After determining the best term weighting model, a second set of experiments aimed to determine the extent to which these components (title and description of statutes) contribute to retrieval effectiveness. To find out how retrieval effectiveness is affected by the different components, three experiments were conducted. The first involved indexing the title and description of each statute together as one document and performing retrieval with IFB2, generating the first run (the baseline). The second experiment involved indexing only the titles, disregarding the descriptions of the statutes, to generate the second run. For the final experiment, only the descriptions of the statutes were indexed, disregarding the titles, and indexing and retrieval were again performed to generate the third run. The three runs were then sent to the organisers for evaluation. The evaluation results show that our team came second; furthermore, the results suggest that indexing with titles only, disregarding the descriptions of the statutes, is sufficient for the retrieval of statutes.
Keywords: Legal Statutes Retrieval · Legal Text Mining · Information Retrieval.
1 Introduction
Information retrieval (IR) is concerned with finding documents of unstructured text that are relevant to an information need, from a collection of documents or from other material provided. Material or a document is relevant if it contains information of value towards satisfying the information need [7].
¹ https://sites.google.com/view/fire-2019-aila/track-description
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). FIRE 2019, 12-15 December 2019, Kolkata, India.
As indicated earlier, the Artificial Intelligence for Legal Assistance (AILA 2019) track was divided into two tasks [1]. Our team participated in the Legal Statute Retrieval task, where the goal was to generate a ranked list of relevant statutes for each object query provided in the dataset [1].
Experiments conducted by Tamrakar et al. [5] on FIRE 2011 datasets, using different probabilistic models in Terrier 3.5 such as BM25, BB2, IFB2, In_expB2, In_expC2, InL2, DFR_BM25, DFI0 and PL2, yielded promising results. The datasets used consisted of various documents from newspapers and websites. Mean Average Precision (MAP) and R-precision were used to measure the performance of the different models. The results indicated the highest MAP value of 0.7846 for the IFB2 model on a sample of the news corpus dataset. IFB2 is one of the DFR models implemented in Terrier [4].
Another study, by Tanase [6], used two variants from the DFR models, namely PL2 and DLH13, for the CHiC 2013 Lab, on a collection of textual cultural heritage objects in the English and/or Italian languages. The best performance was obtained using DLH13 for the monolingual experiments with two of the collections that were made available.
Divergence from Randomness (DFR) is a probabilistic keyword indexing model which was proposed by Amati and Van Rijsbergen [2] and was then incorporated into Terrier as one of its IR models. In DFR, a term weight is computed by measuring the divergence between the term distribution produced by a random process within the collection and the actual term distribution within a document. The assumption is that words are not equally important when describing the content of documents. Considering the entire document collection C, there is a random distribution of words (such as stop words) that carry little information, or are deemed less important, across all documents. Another assumption is that there is an elite set of documents containing speciality words, i.e. terms that are more informative, which follow a Poisson distribution [2].
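For concreteness, below is a sketch of the IFB2 term weight as it is commonly stated for the DFR framework [2] and its Terrier implementation [4]; the notation is ours, and c denotes Terrier's free term frequency normalisation parameter:

\[ w(t,d) = \frac{F+1}{n_t\,(\mathit{tfn}+1)} \cdot \mathit{tfn} \cdot \log_2 \frac{N+1}{F+0.5}, \qquad \mathit{tfn} = \mathit{tf} \cdot \log_2\!\left(1 + c\,\frac{\bar{l}}{l_d}\right), \]

where tf is the frequency of term t in document d, F is the frequency of t in the whole collection, n_t is the number of documents containing t, N is the number of documents in the collection, l_d is the length of d and \bar{l} is the average document length. The logarithmic factor is the inverse term frequency component (the "IF"), the leading fraction is the Bernoulli after-effect (the "B"), and tfn applies term frequency normalisation 2 (the "2").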
The rest of the paper is organised as follows: Section 2 outlines our proposed approach, detailing the dataset description and the experimental setup, while Sections 3 and 4 present the results and the conclusion respectively.
2 Methodology
We submitted 3 runs for this task. The first run formed the baseline. To generate the second and final runs we relied on field-based indexing: first indexing the title only, without the description of the statutes, and finally indexing the description only, without the title of the statutes. The rest of this section provides more details on how the runs were generated. For all three runs we used the IFB2 retrieval model.
2.1 Dataset Description
The dataset for this study consists of 50 object queries, of which the first 10 formed part of the training data. The remaining queries (11 - 50) formed the test data, for which 3 runs were generated and submitted to the Forum for Information Retrieval Evaluation (FIRE) for evaluation. For the training data, relevance assessments were provided, and the document collection for the training data consisted of 197 statutes. The same 197 statutes also formed the document collection for the test data set².
2.2 Experimental Setup
The first part of the experiment addressed the question of which term weighting model performs best for the retrieval of statutes, so the experiment was set up on the training data. In order to perform the experiments, the data set provided was transformed into TREC-style format, for both the object queries and the statute documents. The parsed documents follow the TREC format, and shell scripting was used for the parsing. Section 2.3 and Section 2.4 illustrate an object query and a document/statute in TREC format.
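The parsing itself was done with shell scripts; purely as an illustration of the kind of transformation involved, the following Python sketch wraps a raw object query into a TREC-style topic (the tag names and layout below are our own assumptions, not necessarily the exact ones used):

# Illustrative sketch only: wrap a raw AILA object query into a TREC-style topic.
# Tag names and layout are assumptions made for illustration.
def to_trec_topic(query_id: str, description: str, narrative: str = "") -> str:
    """Return a TREC-style <top> block for one object query."""
    return (
        "<top>\n"
        f"<num> {query_id}\n"
        f"<desc> Description:\n{description}\n"
        f"<narr> Narrative:\n{narrative}\n"
        "</top>\n"
    )

if __name__ == "__main__":
    sample_text = "The appellant on February 9, 1961 was appointed as an Officer ..."
    print(to_trec_topic("AILA_Q1", sample_text))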
We used Terrier 4.2³ [3] to perform all our experiments, for both indexing and retrieval. For evaluation we used trec_eval 9.0⁴. The platform has been used successfully for ad hoc retrieval tasks. The preprocessing performed for all experiments consisted of stemming with Porter's stemmer and stopword removal using the Terrier stopword list. We then performed retrieval using different term weighting models as implemented in Terrier, and the results are shown in Table 1. The Mean Average Precision results revealed that the overall performance of the Divergence from Randomness IFB2 model was better than that of the other models. We therefore chose IFB2 for the next set of experiments, which investigate the retrieval effectiveness of each of the statute components.
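The MAP values in Table 1 are those reported by trec_eval; as a reminder of what the measure computes, a minimal sketch of Average Precision and MAP follows (our own illustration, not trec_eval code):

# Minimal sketch of (Mean) Average Precision, the measure reported by trec_eval
# and shown in Table 1. This is an illustration only, not trec_eval itself.
def average_precision(ranked_doc_ids, relevant_ids):
    """Sum of precision@k over the ranks k at which relevant documents are
    retrieved, divided by the total number of relevant documents for the query."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs, qrels):
    """runs: {query_id: ranked list of doc_ids}; qrels: {query_id: set of relevant doc_ids}."""
    scores = [average_precision(docs, qrels.get(q, set())) for q, docs in runs.items()]
    return sum(scores) / len(scores) if scores else 0.0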
To generate the first run, we first separated the given queries (queries 1 - 50) into training and test queries. Queries 1 - 10 formed the training queries for our training data, and queries 11 - 50 formed the test queries. The first experiment was conducted to investigate the different weighting models as implemented in Terrier, in order to find which one performs best on the training data. We observed that IFB2 gives the best performance, followed by LemurTFIDF and then InExpB2. We therefore generated the first run (UBLTM1) using IFB2.
For the second and third runs we again transformed the statutes into TREC-style format, but this time with two fields, namely Title and Description. We then indexed the statutes using the title only and retrieved with the test queries, again using IFB2, to generate our second run (UBLTM2). For the final run we indexed using the description only and retrieved using IFB2 to generate the final run (UBLTM3). The idea is to investigate the effect that each of the two fields, title only and description only, has on retrieval effectiveness.
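As an illustration of how the field-restricted collections can be produced, the sketch below builds title-only and description-only TREC-style collections; the tag names, the statute tuples and the helper names are our own assumptions, not the shell scripts actually used:

# Sketch: build title-only and description-only TREC-style collections from the
# statutes. Tag names and data layout are assumptions made for illustration.
def to_trec_doc(doc_id: str, text: str) -> str:
    return f"<DOC>\n<DOCNO> {doc_id} </DOCNO>\n<TEXT>\n{text}\n</TEXT>\n</DOC>\n"

def build_collection(statutes, field):
    """statutes: iterable of (doc_id, title, description); field: 'title', 'description' or 'both'."""
    docs = []
    for doc_id, title, description in statutes:
        if field == "title":
            text = title                        # second run (UBLTM2): titles only
        elif field == "description":
            text = description                  # final run (UBLTM3): descriptions only
        else:
            text = title + "\n" + description   # baseline run (UBLTM1): both fields
        docs.append(to_trec_doc(doc_id, text))
    return "".join(docs)

statutes = [("S103", "Freedom to manage religious affairs",
             "Subject to public order, morality and health, every religious denomination ...")]
title_only = build_collection(statutes, "title")
description_only = build_collection(statutes, "description")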
² https://sites.google.com/view/fire-2019-aila/dataset-evaluation-plan
³ http://terrier.org/docs/v4.2/
⁴ https://trec.nist.gov/trec_eval/
2.3 Sample AILA Query Transformed into TREC Topic Format
Because the aim of the study was to compare the extent to which the components of the statutes contribute to retrieval effectiveness, the statutes were transformed into two types of TREC-style document collection: one where the entire content of a statute, i.e. both the title and the description, was transformed together as one document.
Below is a sample of part of AILA Q1 parsed into TREC TOPIC format:
AILA Q1
Description:
The appellant on February 9, 1961 was appointed as an Officer in Grade III in
the respondent Bank ( for short ’the Bank’). He was promoted on April 1, 1968
to the Grade officer in the Foreign Exchange Department in the Head Office of
the Bank. Sometime in 1964,...[TEXT OMITTED]
...
Narrative:
2.4 Sample Transformed Statute
Below is a sample of part of statute S103 parsed into TREC DOCUMENT format:
S103
Freedom to manage religious affairs
...
Subject to public order, morality and health, every religious denomination or any
section thereof shall have the right- (a) to establish and maintain institutions
for religious and charitable purposes; (b) to manage its own affairs in matters
of religion; (c) to own and acquire movable and immovable property; and (d) to
administer such property in accordance with law.
...
3 Results
Table 1 shows the results of the experiments performed to determine the best term weighting model for the Legal Statutes Retrieval task. The results show that IFB2 is the best model in terms of MAP. Table 2 shows the top 9 results for the runs submitted to the AILA 2019 organisers for evaluation. Our team name in the table is UBLTM. In the table, P@10 refers to Precision at 10, MAP refers to Mean Average Precision, BPREF refers to the binary preference-based measure and Recip Rank refers to the Reciprocal Rank.
Table 1: Mean Average Precision of different term weighting models on the training set.
Query TFIDF LemurTFIDF BM25 DFRBM25 PL2 IFB2 InExpB2
AILA Q1 0.0200 0.0206 0.0217 0.0215 0.0213 0.0382 0.0206
AILA Q2 0.0601 0.0523 0.0608 0.0608 0.088 0.0624 0.0606
AILA Q3 0.1367 0.3986 0.1624 0.1504 0.1564 0.5169 0.3919
AILA Q4 0.0565 0.0864 0.0735 0.0718 0.0722 0.0517 0.0663
AILA Q5 0.0205 0.0214 0.0226 0.0227 0.0223 0.0634 0.0211
AILA Q6 0.0795 0.0886 0.0852 0.0878 0.0932 0.0615 0.0856
AILA Q7 0.1693 0.3270 0.3059 0.3028 0.1628 0.2895 0.3030
AILA Q8 0.0294 0.0538 0.0368 0.0365 0.0368 0.1007 0.0361
AILA Q9 0.0282 0.0324 0.0369 0.0362 0.0355 0.0475 0.0306
AILA Q10 0.3155 0.3130 0.3176 0.3384 0.3696 0.3871 0.3353
all Queries 0.0916 0.1394 0.1123 0.1129 0.1058 0.1619 0.1351
Table 2: Top 9 results of the AILA runs for Task 2 (Statute Retrieval) from the AILA 2019 organisers.
Team name P@10 MAP BPREF Recip Rank
Yunqiu Shao thuir legal 0.0975 0.1566 0.0961 0.281
Yunqiu Shao thuir legal 0.0900 0.1318 0.0742 0.247
Yunqiu Shao thuir legal 0.0650 0.1115 0.0653 0.230
UBLTM 0.0725 0.1023 0.0571 0.211
UBLTM 0.0725 0.1023 0.0571 0.211
UBLTM 0.0725 0.1022 0.0571 0.214
Sara Renjit - CUSAT NLP 0.055 0.0967 0.0377 0.199
Soumil Mandal JU SRM 0.06 0.0918 0.0402 0.201
Sara Renjit - CUSAT NLP 0.055 0.0866 0.0412 0.202
4 Conclusion
Our experiments set out to investigate the extent to which different parts of statutes contribute to retrieval effectiveness. The results reveal that the titles of the statutes contain sufficient information to aid retrieval. For future work, the nature of statutes could be investigated further, to better understand their characteristics and to inform the direction to take.
References
1. P. Bhattacharya, K. Ghosh, S. Ghosh, A. Pal, P. Mehta, A. Bhattacharya, P. Majumder. Overview of the FIRE 2019 AILA Track: Artificial Intelligence for Legal Assistance. In Proc. of FIRE 2019 - Forum for Information Retrieval Evaluation, Kolkata, India, December 12-15, 2019.
2. G. Amati and C. J. Van Rijsbergen. Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness. ACM Trans. Inf. Syst. 20(4), 357-389 (Oct. 2002). DOI: http://dx.doi.org/10.1145/582415.582416
3. I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and C. Lioma. Terrier: A High Performance and Scalable Information Retrieval Platform. In Proceedings of the ACM SIGIR'06 Workshop on Open Source Information Retrieval (OSIR 2006).
4. I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and Johnson. Terrier Information Retrieval Platform. In Proceedings of the 27th European Conference on IR Research (ECIR 2005), Lecture Notes in Computer Science, Vol. 3408, pp. 517-519. Springer (2005).
5. A. Tamrakar and S. K. Vishwakarma. Analysis of Probabilistic Model for Document Retrieval in Information Retrieval. In 2015 International Conference on Computational Intelligence and Communication Networks (CICN), pp. 760-765 (2015). DOI: http://dx.doi.org/10.1109/CICN.2015.155
6. D. Tanase. Using the divergence framework for randomness: CHiC 2013 lab report.
CEUR Workshop Proceedings 1179 (2013).
7. C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK (2008). http://nlp.stanford.edu/IR-book/information-retrieval-book.html