<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UB at FIRE 2020 Precedent and Statute Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tebo Leburu-Dingalo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nkwebi Peace Motlogelwa</string-name>
          <email>motlogel@ub.ac.bw</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edwin Thuma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Monkgogi Modungo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Botswana</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>In this paper we explore several retrieval strategies in an attempt to identify relevant statutes and prior cases using a description of a current situation (current case). In particular, we investigate whether we can improve the retrieval performance of a precedent retrieval system by indexing only the key concepts in the prior case documents. In addition, we investigate whether we could improve the retrieval performance by expanding the original queries and performing retrieval on a summarized document collection. The results suggest that expanding the current case can improve the retrieval performance when the retrieval is performed on a summarised document collection of prior cases. For statute retrieval, we investigate whether the retrieval performance could be improved by extracting only the key concepts from the queries or by expanding the queries without summarising the statute documents. The results of this study suggest that summarising the current case can improve the retrieval performance of a statute retrieval system.</p>
      </abstract>
      <kwd-group>
        <kwd>Precedent Retrieval</kwd>
        <kwd>Statute Retrieval</kwd>
        <kwd>Text Summarization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Another factor identified in this regard is the tendency of legal documents to be long and wordy,
which can hurt retrieval performance when they are used as queries (current case). Several studies have
shown that longer or verbose queries are more difficult for IR systems to process than their
shorter versions. Bendersky and Croft [7] illustrate this in their exploration of a probabilistic model
for verbose queries using newswire and web collections. The effectiveness of shorter queries
over longer queries is further confirmed by Huston and Croft [8] in their evaluation of query
processing techniques on data drawn from the Yahoo! Answers CQA service. Research efforts towards
improving the effectiveness of IR systems in the legal domain are currently supported by several
international initiatives such as the Forum for Information Retrieval Evaluation (FIRE)<sup>1</sup>. To
this end, the platform provides datasets against which researchers can develop and evaluate
comparable IR systems. The datasets are released through a series of tasks that address different aspects of
legal information retrieval.</p>
      <p>In this paper, we present the work we submitted to the Artificial Intelligence
for Legal Assistance (AILA)<sup>2</sup> track, a series of shared tasks aimed at developing datasets and
methods for solving a variety of legal informatics problems [9]. In particular, we participated in Task 1, which
focuses on precedent and statute retrieval. Precedent retrieval (Task 1A) focuses on the identification of
relevant prior cases for a given legal situation representing a current case. Statute retrieval (Task 1B)
focuses on the identification of the most relevant statutes for a given legal situation. Our approach
explores the effectiveness of using shortened versions of both the query and document texts as opposed
to their original versions. To this end, we deploy text summarization to find the most informative
terms to act as representatives for queries and documents in both retrieval tasks. The remainder of
this paper is organized as follows. In Section 2 we present related work. Section 3 describes the
methods used in this study. In Section 4 we describe our experimental setting. Section 5 presents and
discusses our results.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Statute and Precedence Retrieval Systems</title>
        <p>Statute laws and precedents play an important role in countries that follow the common law system.
Statutes enable judges to apply legal principles when handling a case, while precedents, or prior decided
cases, allow them to reach similar decisions for subsequent cases with similar issues or circumstances.
Additionally, lawyers are able to use these resources as references when preparing for a case. Several
statute and precedence retrieval systems have been proposed with the aim of giving judges and
lawyers timely access to these resources. Zhao et al. [10] use a combination of IDF and an improved BM25
to implement a competitive method for precedence retrieval. The BM25 model is enhanced by using
relevance scores of both the original and the filtered case. The query case is filtered by selecting the
top-ranked query terms based on IDF scores. Thenmozhi et al. [11] use part-of-speech (POS) tagging
and a vector space model to implement a model for precedent retrieval. The
method uses both concepts and relationships from the text as features. A feature vector is constructed
for each document based on TF-IDF scores. Prior cases are then retrieved and ranked for each current
case based on a cosine similarity measure. Shao et al. [12] obtain relative success with a vector space
based model for statute retrieval. The authors use both the original query and a summary of the query
generated using TextRank. Candidate statutes are constructed using both the title and the description
of the statutes.</p>
        <p><sup>1</sup> http://fire.irsi.res.in/fire/2020/home</p>
        <p><sup>2</sup> https://sites.google.com/view/aila-2020</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Query Reduction in Legal Retrieval</title>
        <p>Reducing verbose or lengthy queries in an effort to improve retrieval performance is an approach
that has been widely adopted in the literature. Driving this research is the fact that many studies
show that systems tend to perform better for shorter versions of queries, as illustrated by [7] and [8].
Many strategies advanced for legal retrieval thus deploy summarization techniques that seek to
represent a document with a subset of the most informative terms or key concepts from the document.
Thuma et al. [13] demonstrate the efficacy of this approach in a statute retrieval task. The authors
observe a notable improvement in system performance when TagCrowd is used to generate query terms
from key concepts derived from a longer description of a query case. A degradation in performance is
further observed if the summarized query is expanded with informative terms from the corpus. Rossi
and Kanoulas [14] combine text summarization and a generalized language model, BERT, to measure
pairwise similarity between documents in a legal retrieval task. Text in this work is summarized using
the graph-based algorithm TextRank. Sandeep and Bharadwaj [15] obtain summarized versions of case
documents by filtering out insignificant terms based on a predefined threshold. The significance of a
term is determined by a linear combination of the term’s frequency and its POS tag weight. A nearest
neighbour approach is then used to determine similarity between the query and candidate documents.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Description of Methods</title>
      <p>In our experiments, we used TF-IDF as the main technique for both document ranking and retrieval
and text summarization. A brief description of the approaches used is outlined below.</p>
      <sec id="sec-3-1">
        <title>3.1. Term Weighting Model</title>
        <p>Our first proposed approach uses the TF-IDF term weighting model to rank and retrieve documents
(prior cases/statutes). TF-IDF is a numerical statistic that is calculated by taking the product of two
components: term frequency (TF) and inverse document frequency (IDF). TF refers to the number of
times term <italic>t</italic> occurs in document <italic>d</italic> [16]. The basic IDF
calculation is as follows:</p>
        <p>
          <disp-formula id="eq-1">
            <label>(1)</label>
            <tex-math><![CDATA[\mathrm{IDF}(t) = \log \frac{N}{n_t}]]></tex-math>
          </disp-formula>
where <italic>N</italic> is the total number of documents in the collection, and <italic>n<sub>t</sub></italic> is the number of documents the
term <italic>t</italic> occurs in.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Text Summarization Algorithm with TF-IDF</title>
        <p>The text summarization algorithm<sup>3</sup> we used runs on the Python Natural Language ToolKit (NLTK)
and uses the TF-IDF algorithm. The algorithm computes a score for each sentence as the sum of the TF-IDF
scores of each word in the sentence as shown below:</p>
        <p>
          <disp-formula id="eq-2">
            <tex-math><![CDATA[\mathrm{Score}(s) = \sum_{w \in s} \mathrm{TF\text{-}IDF}(w)]]></tex-math>
          </disp-formula>
        </p>
        <p><sup>3</sup> https://towardsdatascience.com/text-summarization-using-tf-idf-e64a0644ace3</p>
        <p>The algorithm keeps in the summary only those sentences with a sentence score greater than the threshold.
The threshold is computed as the average sentence score as follows:</p>
        <p>
          <disp-formula id="eq-3">
            <tex-math><![CDATA[\mathrm{threshold} = \frac{\sum_{s} \mathrm{Score}(s)}{\text{number of sentences}}]]></tex-math>
          </disp-formula>
        </p>
        <p>We used the training queries to select the optimal threshold to use. In particular, we conducted several
experiments in which we varied the threshold, performed retrieval, and evaluated
the retrieval performance. The most effective threshold of 0.35 was then used with the test datasets
to perform the actual retrieval.</p>
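        <p>The sentence-scoring summarization described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the submitted runs: each sentence is treated as a document when computing IDF, a simple regular expression stands in for the NLTK tokenizers, and the names split_sentences, tokenize and summarize are hypothetical. When no threshold is passed, the average sentence score is used.</p>
        <preformat>
```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter (assumption: stands in for an NLTK tokenizer)."""
    return [s.strip() for s in re.findall(r"[^.!?]+[.!?]?", text) if s.strip()]


def tokenize(sentence):
    """Lower-case alphanumeric tokens; no stop-word removal or stemming."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(text, threshold=None):
    """Keep the sentences whose summed TF-IDF score exceeds the threshold."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for sent in tokens for term in set(sent))
    scores = []
    for sent in tokens:
        tf = Counter(sent)
        # Sentence score = sum over its terms of TF * log(N / n_t).
        scores.append(sum(c * math.log(n / df[t]) for t, c in tf.items()))
    if not scores:
        return []
    if threshold is None:
        # Default threshold: the average sentence score.
        threshold = sum(scores) / len(scores)
    return [s for s, score in zip(sentences, scores) if score > threshold]
```
        </preformat>
        <p>Calling summarize(text) applies the average-score threshold, while summarize(text, threshold=value) allows a fixed threshold to be tried, for example when tuning on training queries.</p>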
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setting</title>
      <p>Retrieval Platform: For all our experiments, we used Terrier-4.2 [17], an open source
Information Retrieval (IR) platform. All the documents used in this study were pre-processed before
indexing, which involved tokenising the text and stemming each token using the full Porter
stemming algorithm [18]. A comprehensive description of the test collection used in this study can be
found in Bhattacharya et al. [9].</p>
      <sec id="sec-4-1">
        <title>4.1. Task 1A: Precedent Retrieval</title>
        <p>A baseline retrieval was conducted using Terrier 4.2, the original prior case documents and the
original test queries, with TF-IDF as the term weighting model (UB-1). The second experiment used
summarised prior case documents, extracting only the key concepts
from the prior case documents in an attempt to improve retrieval effectiveness (UB-2). In the final run, we investigated whether we could improve
retrieval effectiveness by expanding the original queries with the top 10 terms selected from the top 3
ranked documents after the first-pass retrieval (UB-3). For this run, retrieval was performed on the summarised
prior case documents. We used the Terrier 4.2 Bo1 model to
select the expansion terms.</p>
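        <p>The expansion step can be illustrated with a small sketch of Bo1 term weighting, in which a term seen tf_x times in the feedback documents receives the weight tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n the term's collection frequency divided by the number of documents. The code below is an illustration under assumptions rather than Terrier's implementation: the function names and the in-memory inputs are hypothetical, and the selected terms are simply appended to the query, whereas Terrier also reweights the query terms.</p>
        <preformat>
```python
import math
from collections import Counter


def bo1_expansion_terms(feedback_docs, collection_tf, num_docs, num_terms=10):
    """Rank the terms of the feedback documents by their Bo1 weight.

    feedback_docs: list of token lists (the top-ranked documents).
    collection_tf: dict mapping a term to its frequency in the whole collection.
    num_docs: number of documents in the collection.
    """
    tf_x = Counter(term for doc in feedback_docs for term in doc)
    weights = {}
    for term, freq in tf_x.items():
        p_n = collection_tf[term] / num_docs
        weights[term] = freq * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:num_terms]


def expand_query(query_terms, ranked_docs, collection_tf, num_docs,
                 top_docs=3, num_terms=10):
    """Append the best Bo1 terms from the top-ranked documents to the query."""
    feedback = ranked_docs[:top_docs]
    expansion = bo1_expansion_terms(feedback, collection_tf, num_docs, num_terms)
    return list(query_terms) + [t for t in expansion if t not in query_terms]
```
        </preformat>
        <p>The defaults top_docs=3 and num_terms=10 mirror the setting described above; ranked_docs is assumed to hold the tokenised documents in first-pass rank order.</p>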
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Task 1B: Statute Retrieval</title>
        <p>A baseline retrieval was conducted using Terrier 4.2, the original test corpus and the original test
queries, with TF-IDF as the term weighting model (UB-1). The second experiment used summarised
queries, extracting only the key concepts from the queries in an attempt to improve retrieval effectiveness (UB-2).
In the final run, we investigated whether we could improve retrieval effectiveness by expanding the
summarised queries with the top 10 terms selected from the top 3 ranked documents after the first-pass
retrieval (UB-3). We used the Terrier 4.2 Bo1 model to
select the expansion terms.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
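      <p>Our results from both experiments were submitted to the AILA 2020 competition for evaluation by
the organizers. The evaluation for Task 1A and Task 1B uses MAP, BPREF, recip_rank and P@10.
The results of Task 1A and Task 1B based on the aforementioned evaluation measures are shown in the results tables.</p>
      <p>[Results tables for Task 1A and Task 1B; only the values 0.09, 0.07, 0.08, 0.14, 0.15, 0.09 survive.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>J.</given-names> <surname>Bing</surname></string-name>, <article-title>Performance of legal text retrieval systems: The curse of Boole</article-title>, <source>Law Libr. J.</source> <volume>79</volume> (<year>1987</year>) <fpage>187</fpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>P.</given-names> <surname>Bhattacharya</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Ghosh</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ghosh</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Pal</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Mehta</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bhattacharya</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Majumder</surname></string-name>, <article-title>FIRE 2019 AILA track: Artificial intelligence for legal assistance</article-title>, <source>in: Proceedings of the 11th Forum for Information Retrieval Evaluation</source>, FIRE '19, Association for Computing Machinery, New York, NY, USA, <year>2019</year>, pp. <fpage>4</fpage>-<lpage>6</lpage>. URL: https://doi.org/10.1145/3368567.3368587. doi:10.1145/3368567.3368587.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>L. K.</given-names> <surname>Branting</surname></string-name>, <article-title>A reduction-graph model of precedent in legal analysis</article-title>, <source>Artificial Intelligence</source> <volume>150</volume> (<year>2003</year>) <fpage>59</fpage>-<lpage>95</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>D. S.</given-names> <surname>Carvalho</surname></string-name>, <string-name><given-names>V. D.</given-names> <surname>Tran</surname></string-name>, <string-name><given-names>V.-K.</given-names> <surname>Tran</surname></string-name>, <string-name><given-names>L.-M.</given-names> <surname>Nguyen</surname></string-name>, <article-title>Improving legal information retrieval by distributional composition with term order probabilities</article-title>, <source>in: COLIEE@ICAIL</source>, <year>2017</year>, pp. <fpage>43</fpage>-<lpage>56</lpage>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>K. T.</given-names> <surname>Maxwell</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Schafer</surname></string-name>, <article-title>Concept and context in legal information retrieval</article-title>, <source>in: Proceedings of the 2008 Conference on Legal Knowledge and Information Systems: JURIX 2008: The Twenty-First Annual Conference</source>, IOS Press, NLD, <year>2008</year>, pp. <fpage>63</fpage>-<lpage>72</lpage>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>M.</given-names> <surname>Yoshioka</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Kano</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Kiyota</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Satoh</surname></string-name>, <article-title>Overview of Japanese statute law retrieval and entailment task at COLIEE-2018</article-title>, <source>in: Twelfth International Workshop on Juris-informatics (JURISIN 2018)</source>, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] <string-name><given-names>M.</given-names> <surname>Bendersky</surname></string-name>, <string-name><given-names>W. B.</given-names> <surname>Croft</surname></string-name>, <article-title>Discovering key concepts in verbose queries</article-title>, <source>in: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>, SIGIR ’08, Association for Computing Machinery, New York, NY, USA, <year>2008</year>, pp. <fpage>491</fpage>-<lpage>498</lpage>. URL: https://doi.org/10.1145/1390334.1390419. doi:10.1145/1390334.1390419.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] <string-name><given-names>S.</given-names> <surname>Huston</surname></string-name>, <string-name><given-names>W. B.</given-names> <surname>Croft</surname></string-name>, <article-title>Evaluating verbose query processing techniques</article-title>, <source>in: Proc. of SIGIR</source>, SIGIR ’10, <year>2010</year>, pp. <fpage>291</fpage>-<lpage>298</lpage>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] <string-name><given-names>P.</given-names> <surname>Bhattacharya</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Mehta</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Ghosh</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ghosh</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Pal</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bhattacharya</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Majumder</surname></string-name>, <article-title>Overview of the FIRE 2020 AILA track: Artificial Intelligence for Legal Assistance</article-title>, <source>in: Proceedings of FIRE 2020 - Forum for Information Retrieval Evaluation</source>, <year>2020</year>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] <string-name><given-names>Z.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Ning</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Kong</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Han</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Han</surname></string-name>, <article-title>FIRE2019@AILA: Legal information retrieval using improved BM25</article-title>, <source>in: FIRE (Working Notes)</source>, volume <volume>2517</volume> of CEUR Workshop Proceedings, CEUR-WS.org, <year>2019</year>, pp. <fpage>40</fpage>-<lpage>45</lpage>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] <string-name><given-names>D.</given-names> <surname>Thenmozhi</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Kannan</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Aravindan</surname></string-name>, <article-title>A text similarity approach for precedence retrieval from legal documents</article-title>, <source>in: FIRE (Working Notes)</source>, <year>2017</year>, pp. <fpage>90</fpage>-<lpage>91</lpage>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] <string-name><given-names>Y.</given-names> <surname>Shao</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Ye</surname></string-name>, <article-title>THUIR@AILA 2019: Information retrieval approaches for identifying relevant precedents and statutes</article-title>, <source>in: FIRE (Working Notes)</source>, volume <volume>2517</volume> of CEUR Workshop Proceedings, CEUR-WS.org, <year>2019</year>, pp. <fpage>46</fpage>-<lpage>51</lpage>.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] <string-name><given-names>E.</given-names> <surname>Thuma</surname></string-name>, <string-name><given-names>N. P.</given-names> <surname>Motlogelwa</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Leburu-Dingalo</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Mudongo</surname></string-name>, <article-title>Query reduction for an effective Japanese statute law retrieval</article-title>, <source>in: 2019 Conference on Next Generation Computing Applications (NextComp)</source>, <year>2019</year>, pp. <fpage>1</fpage>-<lpage>4</lpage>. doi:10.1109/NEXTCOMP.2019.8883643.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] <string-name><given-names>J.</given-names> <surname>Rossi</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Kanoulas</surname></string-name>, <article-title>Legal information retrieval with generalized language models</article-title>, <source>Proceedings of the 6th Competition on Legal Information Extraction/Entailment, COLIEE</source> (<year>2019</year>).</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] <string-name><given-names>G.</given-names> <surname>Sandeep</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bharadwaj</surname></string-name>, <article-title>An extraction based approach to keyword generation and precedence retrieval: BITS Pilani-Hyderabad</article-title>, <source>in: FIRE (Working Notes)</source>, <year>2017</year>, pp. <fpage>74</fpage>-<lpage>77</lpage>.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] <string-name><given-names>S.</given-names> <surname>Robertson</surname></string-name>, <article-title>Understanding inverse document frequency: on theoretical arguments for IDF</article-title>, <source>J. Documentation</source> <volume>60</volume> (<year>2004</year>) <fpage>503</fpage>-<lpage>520</lpage>.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] <string-name><given-names>I.</given-names> <surname>Ounis</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Amati</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Plachouras</surname></string-name>, <string-name><given-names>B.</given-names> <surname>He</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Macdonald</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Johnson</surname></string-name>, <article-title>Terrier Information Retrieval Platform</article-title>, <source>in: Proceedings of the 27th European Conference on IR Research</source>, volume <volume>3408</volume> of Lecture Notes in Computer Science, Springer-Verlag, Berlin, Heidelberg, <year>2005</year>, pp. <fpage>517</fpage>-<lpage>519</lpage>.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] <string-name><given-names>M.</given-names> <surname>Porter</surname></string-name>, <article-title>An Algorithm for Suffix Stripping</article-title>, <source>Readings in Information Retrieval</source> <volume>14</volume> (<year>1997</year>) <fpage>313</fpage>-<lpage>316</lpage>.</mixed-citation>
      </ref>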
    </ref-list>
  </back>
</article>