<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Legal Statutes Retrieval: A Comparative Approach on Performance of Title and Statutes Descriptive Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Moemedi Lefoane</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tshepho Koboyatshwene</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Goaletsa Rammidi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. Lakshmi Narasimham</string-name>
          <email>lakshmi.narasimhang@mopipi.ub.bw</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Botswana</institution>
          ,
          <addr-line>Gaborone</addr-line>
          ,
          <country country="BW">Botswana</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Legal statutes play a crucial role in the justice system. In countries that adopt the common law system, they are often cited in court decisions to argue cases of interest. The AILA 2019 track presented two tasks: a precedent retrieval task and a statute retrieval task. Our team participated in the latter. The statutes provided consisted of two components, namely a title and a statute description. In this study we first conducted an experiment to determine the best term weighting model for this task. After determining the best term weighting model, a second set of experiments aimed to determine the extent to which these components (title and description of statutes) contribute to retrieval effectiveness. To find out how retrieval effectiveness is affected by the different components, three experiments were conducted: the first indexed the title and description of each statute as a document and performed retrieval using IFB2, generating the first run (baseline); the second indexed only the title, disregarding the description of the statutes, generating the second run. For the final experiment, only the descriptions of the statutes were indexed, disregarding the titles, and indexing and retrieval were again performed to generate the third run. The three runs were then sent to the organisers for evaluation. The evaluation results show that our team came second; furthermore, the results suggest that indexing with the title only, disregarding the description of the statutes, is sufficient for the retrieval of statutes.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal Statutes Retrieval</kwd>
        <kwd>Legal Text Mining</kwd>
        <kwd>Information Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Information retrieval (IR) is concerned with finding documents of unstructured text that are relevant to an information need, from a collection of documents or from other material provided. Material or a document is relevant if it contains information of value towards satisfying the information need [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The Artificial Intelligence for Legal Assistance (AILA 2019) track was divided into two tasks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Our team participated in the Legal Statutes Retrieval task, whose goal was to generate a ranked list of relevant statutes for each object query provided in the dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Experiments conducted by Tamrakar et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] on FIRE 2011 datasets, using different probabilistic models in Terrier 3.5 such as BM25, BB2, IFB2, InexpB2, InexpC2, InL2, DFR_BM25, DFI0 and PL2, yielded promising results. The datasets used consisted of various documents from newspapers and websites. Mean Average Precision (MAP) and R-precision were used to measure the performance of the different models. The results indicated the highest MAP value of 0.7846 for the IFB2 model when using a sample of the news corpus dataset. IFB2 is one of the DFR models implemented in Terrier [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Another study, conducted by Tanase [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], used two variants of the DFR models, namely PL2 and DLH13, for the CHiC 2013 Lab, using a collection of textual cultural heritage objects in English and/or Italian. The best performance was obtained using DLH13 in the monolingual experiments with two of the collections that were made available.
      </p>
      <p>
        Divergence from Randomness (DFR) is a probabilistic keyword indexing model proposed by Amati et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and subsequently incorporated into Terrier as one of its IR models. In DFR, a term weight is computed by measuring the divergence between the term distribution produced by a random process within the collection and the actual term distribution within a document. The assumption is that words are not equally important when describing the content of documents. Considering the entire document collection C, there is a random distribution of words (such as stop words) that carry little information, or are deemed less important, across all documents. Another assumption is that there is an elite set of documents containing speciality words, i.e. terms that are more informative, which follow a Poisson distribution [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
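To make the DFR scoring concrete, the following is a minimal Python sketch of the IFB2 term weight (inverse term-frequency model, Bernoulli after-effect, Normalisation 2) as described in the Terrier documentation; the function name and the parameter default c=1.0 are our own assumptions, not taken from the paper.

```python
import math

def ifb2_weight(tf, df, F, N, doc_len, avg_len, c=1.0):
    """Score one (term, document) pair with the IFB2 DFR model.

    tf      : term frequency in the document
    df      : number of documents containing the term (n_t)
    F       : term frequency in the whole collection
    N       : number of documents in the collection
    doc_len : length of the document in tokens
    avg_len : average document length in the collection
    c       : term-frequency normalisation parameter (Normalisation 2)
    """
    # Normalisation 2: adjust tf for document length
    tfn = tf * math.log2(1.0 + c * avg_len / doc_len)
    # Bernoulli after-effect: first normalisation of the informative content
    norm = (F + 1.0) / (df * (tfn + 1.0))
    # Inverse term-frequency informative content
    idf = math.log2((N + 1.0) / (F + 0.5))
    return norm * tfn * idf
```

Because tfn / (tfn + 1) grows with tfn, the score rises with term frequency but with diminishing returns, while the log2((N + 1) / (F + 0.5)) factor rewards terms that are rare in the collection.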
      <p>The rest of the paper is organised as follows: Section 2 outlines our proposed approach, detailing the dataset description and experimental setup; Sections 3 and 4 discuss the results and the conclusion, respectively.</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>We submitted 3 runs for this task. The first run formed the baseline. To generate the second and final runs we relied on a field-based indexing model to index, first, the title only without the description of the statutes, and finally the description only without the title of the statutes. The rest of this section provides more details on how the runs were generated. For all three runs we used the IFB2 retrieval model.</p>
      <sec id="sec-2-1">
        <title>Dataset Description</title>
        <p>The dataset for this study consists of 50 object queries, of which the first 10 formed part of the training data. The remaining queries (11-50) formed the test data, for which 3 runs were generated and submitted to the Forum for Information Retrieval Evaluation (FIRE) for evaluation. For the training data, relevance assessments were provided, and the document collection consisted of 197 statutes. The same 197 statutes also formed the document collection for the test dataset.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Experimental Setup</title>
        <p>The first part of the experiment addressed the question of which term weighting model performs best for the retrieval of statutes, so the experiment was set up on the training data. To perform the experiments, the dataset provided, both the object queries and the statute documents, was transformed into TREC-style format; shell scripting was used for the parsing. Section 2.3 and Section 2.4 illustrate an object query and a document/statute in TREC format.</p>
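As an illustration of this transformation (the paper used shell scripts; this Python equivalent and its function names are our own), a query or statute can be wrapped in TREC tags like so:

```python
def query_to_trec_topic(query_id, description):
    """Wrap one AILA object query in TREC TOPIC tags."""
    return (f"<TOP>\n"
            f"<NUM> {query_id} </NUM>\n"
            f"<DESC> Description:\n{description}\n</DESC>\n"
            f"</TOP>\n")

def statute_to_trec_doc(doc_no, title, description):
    """Wrap one statute in TREC DOCUMENT tags for indexing."""
    return (f"<DOC>\n"
            f"<DOCNO> {doc_no} </DOCNO>\n"
            f"<TITLE>\n{title}\n</TITLE>\n"
            f"<TEXT>\n{description}\n</TEXT>\n"
            f"</DOC>\n")
```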
        <p>
          We used Terrier 4.2 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] to perform all our experiments for indexing and retrieval; for evaluation we used trec_eval 9.0. The platform has been used successfully for ad-hoc retrieval tasks. The preprocessing performed for all experiments was stemming using Porter's stemmer and stopword removal using the Terrier stopword list. We then performed retrieval using different term weighting models as implemented in Terrier; the results are shown in Table 1. Mean Average Precision results revealed that the overall performance of the Divergence from Randomness IFB2 model was better than that of the other models. We therefore chose IFB2 for the next set of experiments, which investigate the retrieval effectiveness of each statute component.
        </p>
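Choosing among the weighting models reduces to comparing MAP over the training queries; the following is a small stand-in for the trec_eval computation (our own code, not the tool itself):

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of the precision values at each relevant hit."""
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs, qrels):
    """MAP over all queries; runs and qrels are dicts keyed by query id."""
    aps = [average_precision(runs[q], qrels[q]) for q in qrels]
    return sum(aps) / len(aps)
```

Running this per weighting model over the ten training queries and picking the highest MAP reproduces the selection step described above.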
        <p>To generate the first run, we first separated the given queries (queries 1-50) into training and test queries. Queries 1-10 form the training queries for our training data, and queries 11-50 form the test queries. The first experiment investigated the different weighting models implemented in Terrier in order to find which one performs best on the training data. We observed that IFB2 gives the best performance, followed by LemurTF_IDF and finally InexpB2. We therefore generated the first run (UBLTM1) using IFB2.</p>
        <p>For the second and third runs we transformed the statutes into TREC-style format, but this time with two fields, namely Title and Description. We then indexed the statutes using the title only and retrieved with the test queries using IFB2 to generate our second run (UBLTM2). For the final run we indexed using the description only and retrieved using IFB2 to generate the final run (UBLTM3). The idea is to investigate the effect of the title only, and of the description only, on retrieval effectiveness.
2 https://sites.google.com/view/fire-2019-aila/dataset-evaluation-plan
3 http://terrier.org/docs/v4.2/
4 https://trec.nist.gov/trec_eval/</p>
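The three collections behind the runs differ only in which field of each statute is indexed; that selection can be sketched as follows (the run labels are from the paper, the helper name is ours):

```python
def fields_for_run(title, description, run):
    """Return the indexable text for one statute under each run's policy."""
    if run == "UBLTM1":   # baseline: title + description
        return f"{title}\n{description}"
    if run == "UBLTM2":   # title only
        return title
    if run == "UBLTM3":   # description only
        return description
    raise ValueError(f"unknown run: {run}")
```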
      </sec>
      <sec id="sec-2-3">
        <title>Sample AILA Query Transformed into TREC Topic Format</title>
        <p>Because the aim of the study was to compare the extent to which the components of the statutes contribute to retrieval effectiveness, the statutes were transformed into two types of TREC-style document collection: one where the entire content of the statute, i.e. its title and description, was transformed as shown below.</p>
        <p>Below is a sample of part of AILA Q1 parsed into TREC TOPIC format:
&lt;TOP&gt;
&lt;NUM&gt; AILA Q1 &lt;/NUM&gt;
&lt;DESC&gt; Description:
The appellant on February 9, 1961 was appointed as an Officer in Grade III in
the respondent Bank (for short 'the Bank'). He was promoted on April 1, 1968
to Grade officer in the Foreign Exchange Department in the Head Office of
the Bank. Sometime in 1964,...[TEXT OMITTED]
...
&lt;/DESC&gt;
&lt;NARR&gt; Narrative:
&lt;/NARR&gt;
&lt;/TOP&gt;</p>
      </sec>
      <sec id="sec-2-4">
        <title>Sample Transformed Statute</title>
        <p>Below is a sample of part of a statute parsed into TREC DOCUMENT format:
&lt;DOC&gt;
&lt;DOCNO&gt; S103 &lt;/DOCNO&gt;
&lt;TITLE&gt;
Freedom to manage religious affairs
...
&lt;/TITLE&gt;
&lt;TEXT&gt;
Subject to public order, morality and health, every religious denomination or any
section thereof shall have the right- (a) to establish and maintain institutions
for religious and charitable purposes; (b) to manage its own affairs in matters
of religion; (c) to own and acquire movable and immovable property; and (d) to
administer such property in accordance with law.
...
&lt;/TEXT&gt;
&lt;/DOC&gt;</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>Table 1 shows that IFB2 was the best model in terms of MAP. Table 2 shows the top 9 results for the runs submitted to the AILA 2019 organisers for evaluation. Our team name in the table is UBLTM. In the table, P@10 refers to Precision@10, MAP refers to Mean Average Precision, BPREF refers to the binary preference-based measure, and RecipRank refers to the Reciprocal Rank.</p>
      <p>Our experiments set out to investigate the extent to which different parts of statutes contribute to retrieval effectiveness; the results reveal that the titles of the statutes contain sufficient information to aid retrieval. For future work, the nature of statutes could be investigated further to better understand their characteristics and inform future directions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Overview of the FIRE 2019 AILA Track: Artificial Intelligence for Legal Assistance</article-title>
          .
          <source>In Proc. of FIRE 2019 - Forum for Information Retrieval Evaluation</source>
          , Kolkata, India,
          <source>December 12-15</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Gianni</given-names>
            <surname>Amati</surname>
          </string-name>
          and Cornelis Joost Van Rijsbergen.
          <article-title>Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness</article-title>
          .
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>20</volume>
          ,
          <issue>4</issue>
          (Oct.
          <year>2002</year>
          ),
          <fpage>357</fpage>
          -
          <lpage>389</lpage>
          . DOI: http://dx.doi.org/10.1145/582415.582416
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lioma</surname>
          </string-name>
          .
          <article-title>Terrier: A High Performance and Scalable Information Retrieval Platform</article-title>
          .
          <source>In Proceedings of the ACM SIGIR 2006 Workshop on Open Source Information Retrieval (OSIR</source>
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , and
          <string-name>
            <surname>Johnson</surname>
          </string-name>
          .
          <article-title>Terrier Information Retrieval Platform</article-title>
          .
          <source>In Proceedings of the 27th European Conference on IR Research (ECIR 2005) (Lecture Notes in Computer Science)</source>
          , Vol.
          <volume>3408</volume>
          . Springer,
          <fpage>517</fpage>
          -
          <lpage>519</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Tamrakar</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Vishwakarma</surname>
          </string-name>
          .
          <article-title>Analysis of Probabilistic Model for Document Retrieval in Information Retrieval</article-title>
          .
          <source>In 2015 International Conference on Computational Intelligence and Communication Networks (CICN)</source>
          .
          <fpage>760</fpage>
          -
          <lpage>765</lpage>
          (
          <year>2015</year>
          ), DOI: http://dx.doi.org/10.1109/CICN.2015.155
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Tanase</surname>
          </string-name>
          .
          <article-title>Using the divergence framework for randomness: CHiC 2013 lab report</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <volume>1179</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Prabhakar</given-names>
            <surname>Raghavan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <year>2008</year>
          . Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK. http://nlp.stanford.edu/IR-book/information-retrieval-book.html
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>