<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Finding Important Arguments from a Legal Case</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Konstantynowicz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franciszek Grzegorz Wojciechowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Procheta Sen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Liverpool</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Within the legal field, the strength of an argument can be the deciding factor in the outcome of a case. Lawyers spend hours analysing legal precedents and relevant statutes to build a compelling argument. With the exponential growth in the availability of case data, a tool that automatically surfaces the important arguments in similar past cases can be useful to a lawyer. In this work we propose an approach to estimate the importance of arguments using Natural Language Processing techniques. As a first step, arguments are extracted from a legal case; the importance of each argument is then estimated automatically. We explored both supervised and unsupervised approaches to estimate the importance of arguments in a legal case.</p>
      </abstract>
      <kwd-group>
        <kwd>Argument Importance</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Information Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Within the legal field, the strength of an argument can be the deciding factor in the outcome
of a case. Lawyers find themselves spending hours going over and analysing legal precedents
and relevant statutes to build a compelling argument. With the exponential growth in data
availability and the need to be able to keep up with competitors, there has been an increasing
call for tools which could be used to assist in research and analysis of legal cases.</p>
      <p>Natural Language Processing (NLP) is a branch of artificial intelligence concerned with giving
computers the ability to understand text and spoken words in much the same way human beings
can. NLP combines rule-based modelling of human language with statistical, machine
learning, and deep learning models. Together, these technologies enable computers to process
human language in the form of text or voice data and to ‘understand’ its full meaning,
complete with the speaker or writer’s intent and sentiment.</p>
      <p>In this work, we use NLP techniques to analyse legal documents and to identify the strengths
of the arguments present within each case. The findings of this work provide insights into
the potential of NLP to assist lawyers in preparing for cases, and to assist judges
by showing what the collection of past precedents has dictated. We explored both supervised
(i.e. where training data is required) and unsupervised (i.e. where no training data is
required) approaches to estimate the importance of arguments, using data from Indian legal
proceedings. To the best of our knowledge, this is the first work on
estimating the importance of legal arguments using NLP techniques.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed Methodology</title>
      <p>
        The first step in estimating argument importance is to automatically extract arguments from
a legal case. To this end, we apply the method proposed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The following subsection gives a brief overview of this
approach to automatically extracting arguments from a legal case.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Argument Extraction [1]</title>
        <p>
          The study in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] showed that a legal case can be broken down into several rhetorical roles. The
different labels are a) Facts, b) Ruling by Lower Court, c) Argument, d) Statute, e) Precedent, f)
Ratio of decision, g) Ruling by Present Court. Facts refers to the chronology of events that led to
the filing of the case, and how the case evolved over time in the legal system. Ruling by Lower
Court refers to the judgments given by lower courts (Trial Court, High Court). Argument refers
to the discussion on the law that is applicable to the set of proven facts. Statute refers to the
established laws, which can come from a mixture of sources. Precedent refers to the prior case
documents. Ratio of decision refers to the application of the law along with reasoning/rationale
on the points argued in the case. Ruling by Present Court refers to the ultimate decision and
conclusion of the Court.
        </p>
        <p>
          The authors of [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] used a Hierarchical BiLSTM CRF model [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] to automatically assign one of the seven
rhetorical labels mentioned above to each sentence of a legal case document. They trained the
BiLSTM CRF model on data manually labelled by legal experts. In the context of this
research, we are only interested in sentences belonging to the Argument and Ratio of Decision
categories in a legal document.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Argument Importance</title>
        <p>Once the arguments have been extracted, we estimate the importance of each argument. We explored
both unsupervised and supervised approaches, each of which is described below.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Unsupervised Approaches</title>
          <p>For the unsupervised approach, clustering methods were used to identify unique arguments from
the set of arguments obtained with the method described in Section 2.1. The motivation for using
clustering is to investigate whether the unique arguments in a legal case are among its most
important arguments.</p>
          <p>Clustering Based Approaches. We specifically used the DBSCAN (Density-Based Spatial
Clustering of Applications with Noise) clustering algorithm in this work. DBSCAN groups
similar data points based on their density. It has the advantage of finding arbitrarily shaped
clusters and identifying noise (outliers) in the data, whereas the popular k-means clustering algorithm
cannot automatically determine the number of clusters, cannot identify
clusters with arbitrary shapes, and does not perform well with clusters of different densities.
To cluster the set of arguments, we used two different methods to
represent each argument sentence. Each is described below.</p>
          <p>TF-IDF Vector Based Clustering. The TF-IDF vectorizer converts each sentence into a vector of
floating-point numbers. The main idea behind TF-IDF is to give more importance to words
that appear less frequently in the entire corpus but more frequently in an input instance, thus
helping to identify key terms that distinguish an input instance.</p>
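          <p>The weighting idea can be illustrated with a minimal sketch. It assumes whitespace tokenisation and the plain tf × log(N/df) formulation; the function name <monospace>tfidf_vectors</monospace> and the example sentences are hypothetical, and the vectorizer used in our experiments may apply different smoothing and normalisation.</p>
          <preformat>
```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return the sorted vocabulary and one TF-IDF vector per sentence.

    Assumption: tf = count / sentence length, idf = log(N / df).
    """
    tokenised = [s.lower().split() for s in sentences]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(tokenised)
    # Document frequency: number of sentences that contain each term.
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty sentences
        vectors.append([
            (counts[term] / length) * math.log(n_docs / doc_freq[term])
            for term in vocab
        ])
    return vocab, vectors


# Hypothetical argument sentences, for illustration only.
sentences = [
    "the appellant claims self defence",
    "the respondent denies the claim",
    "the court notes the delay",
]
vocab, vectors = tfidf_vectors(sentences)
weight_the = vectors[0][vocab.index("the")]
weight_appellant = vectors[0][vocab.index("appellant")]
```
          </preformat>
          <p>In this toy corpus the word "the" occurs in every sentence and therefore receives zero weight, while "appellant" occurs in only one sentence and receives a positive weight.</p>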
          <p>
            BERT Based Clustering. The second representation approach used for clustering is BERT [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
BERT captures the context of a sequence of words in both directions (i.e. both left and right context).
BERT is pre-trained on a large corpus of text, from which it learns general language patterns. As a result,
BERT is able to represent the semantic meaning present in a sentence better than the
TF-IDF approach.
          </p>
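          <p>Either representation yields one vector per argument sentence, which is then passed to DBSCAN. The following is a minimal pure-Python sketch of the textbook DBSCAN procedure, assuming Euclidean distance; the function name <monospace>dbscan</monospace>, the parameter values <monospace>eps</monospace> and <monospace>min_pts</monospace>, and the toy points are illustrative and are not the settings used in our experiments.</p>
          <preformat>
```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if eps >= math.dist(points[i], q)]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbours(i)
        if min_pts > len(seeds):
            labels[i] = noise  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Two dense groups and one isolated point (toy 2-D vectors).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```
          </preformat>
          <p>The number of clusters is not supplied in advance; it follows from the density parameters, and the isolated point is reported as noise with the label -1.</p>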
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Supervised Approaches</title>
          <p>
            For the supervised approach, we trained a neural network to estimate the importance of an argument.
The input to the neural network is a vector that states which arguments
are and are not present. The size of this vector depends on the pre-processing applied, and
the number of neurons in each layer of the model changes accordingly. The model
designed for this project is a sequential neural network; neural networks were chosen for their
adaptive learning, self-organization, and fault-tolerance capabilities [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. The model consists
of an input layer, six hidden layers, each of which uses a sigmoid activation function, and an
output layer.
          </p>
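          <p>The layer structure can be sketched as a plain forward pass. This is an illustration under stated assumptions, not our implementation: the hidden width, the random weight initialisation, the linear output layer, and the helper names <monospace>build_network</monospace> and <monospace>forward</monospace> are all hypothetical, and training is omitted.</p>
          <preformat>
```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_arguments, hidden_width, rng):
    """Input size -> six hidden layers -> output of the same size as the input."""
    sizes = [n_arguments] + [hidden_width] * 6 + [n_arguments]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, x):
    *hidden, output = network
    for weights, biases in hidden:
        # Each hidden layer applies a sigmoid activation.
        x = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    weights, biases = output
    # Assumption: linear output layer (the activation is not specified).
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


rng = random.Random(0)
network = build_network(n_arguments=4, hidden_width=8, rng=rng)
# Binary input: arguments 0 and 2 of a four-entry dictionary are present.
scores = forward(network, [1.0, 0.0, 1.0, 0.0])
```
          </preformat>
          <p>Because the input and output vectors share the same indexing, changing the size of the argument dictionary changes the first and last layer sizes together.</p>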
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiment Setup</title>
      <p>Dataset. We used a collection of 3,564 cases crawled from the Thomson Reuters Westlaw India
website. The cases range from 1951 to 2016 and fall into the following five categories:
Land &amp; Property, Constitutional, Criminal, Intellectual Property, and Labour &amp; Industrial Law. For
the unsupervised approach we used cases from all the categories. For the supervised
approach, however, we focused only on criminal cases, because we expect a neural network
to train better on cases of a similar type.</p>
      <p>To train the model, we provide it with an importance score for each argument in the legal case. This
score is obtained by measuring the similarity between each argument and the ratio of decision. The
resulting importance scores are then normalised by dividing each score by the sum of all
the argument similarity scores. The output score vector has the same structure as the input
argument vector, with each index in the vector referencing the index of the argument in the
dictionary. Instead of a binary value, however, each entry holds the normalised
similarity score if the argument is present and 0 otherwise.</p>
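      <p>The construction of the input and target vectors can be sketched as follows. The similarity scores are taken as given, since any similarity measure can be plugged in; the function names and the example values are hypothetical.</p>
      <preformat>
```python
def input_vector(present_indices, dictionary_size):
    """Binary vector: 1.0 where an argument from the dictionary is present."""
    present = set(present_indices)
    return [1.0 if i in present else 0.0 for i in range(dictionary_size)]


def target_vector(similarities, dictionary_size):
    """Normalised importance scores, indexed like the input vector.

    similarities maps an argument's dictionary index to its similarity
    with the ratio of decision.
    """
    total = sum(similarities.values())
    target = [0.0] * dictionary_size
    if total == 0:
        return target  # no usable signal: leave every entry at 0
    for index, score in similarities.items():
        target[index] = score / total
    return target


# Two arguments (dictionary indices 1 and 3) are present in this case.
similarities = {1: 0.125, 3: 0.375}
inputs = input_vector(similarities, dictionary_size=5)
targets = target_vector(similarities, dictionary_size=5)
```
      </preformat>
      <p>Here the two scores sum to 0.5, so the normalised targets are 0.25 and 0.75 at indices 1 and 3, and 0 elsewhere.</p>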
      <p>Pre-Processing. As a pre-processing step, we removed all the redundant information from
the legal case document. For example, information about citations and the parties involved in each
legal case is redundant for our proposed approaches. Hence this information
was removed from the beginning and end of each legal case document. URLs and non-English
characters were also removed from each case document using regex patterns. We used the sentence
tokenizer available in NLTK to convert each case document into an array of sentences.</p>
      <p>Evaluation. There is no publicly available dataset in which the importance of arguments is
manually labelled by legal experts. We therefore opted for an implicit judgement
of our proposed approach for finding important arguments. We first took the final decision
segment (sentences belonging to the Ratio of Decision category) corresponding to each legal case.
It has been observed that the most important arguments presented during the court proceeding
are described again in the final judgement section. Hence we compute the similarity between each
predicted important argument and the final judgement section. If that similarity is greater
than a particular threshold, we consider the predicted argument to be one of the important
arguments. If, out of K predicted arguments, only k1 are important, then the accuracy for that
particular instance is computed as (k1 / K) × 100. We set the similarity threshold to
0.4.</p>
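      <p>The evaluation measure can be sketched as follows. Cosine similarity over word counts is used here purely as a stand-in, because any sentence similarity measure can be substituted; the function names and the example text are hypothetical. Only the threshold of 0.4 and the (k1 / K) × 100 formula come from the description above.</p>
      <preformat>
```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of bag-of-words counts (an assumed stand-in measure)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def accuracy(predicted_arguments, final_judgement, threshold=0.4):
    """Percentage of the K predicted arguments judged important (k1 / K * 100)."""
    if not predicted_arguments:
        return 0.0
    k1 = sum(1 for argument in predicted_arguments
             if cosine_similarity(argument, final_judgement) > threshold)
    return k1 / len(predicted_arguments) * 100


# Hypothetical final judgement and two predicted arguments.
final_judgement = "the conviction is set aside"
predicted = ["the conviction is set aside", "bail was refused earlier"]
score = accuracy(predicted, final_judgement)
```
      </preformat>
      <p>In this toy case one of the two predicted arguments exceeds the threshold, so the accuracy for the instance is 50.</p>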
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Results for Unsupervised Approaches. Figure 1 shows the visualization of the clusters
obtained using the TF-IDF technique. It can be observed that, as the number of
clusters increases, the data points belonging to different clusters lie close to one another. We
therefore conclude that decreasing the number of clusters will help to identify more
unique arguments from a case. We made a similar observation for the BERT based clustering
approach. We manually inspected the results obtained from the unsupervised approach and
concluded that it was not able to identify any important arguments. Hence we
finally opted for the supervised approach.</p>
      <p>Results for Supervised Approaches. Figure 2 shows that the best performing version of the
proposed model achieves 15% accuracy with the Adagrad optimizer.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work we investigated how existing supervised and unsupervised NLP
techniques can be used to estimate the importance of arguments in a legal case. The best accuracy
obtained with the neural approach was 15%. However, we would like to mention that
this is a work in progress. The results described in this paper are the output of an initial investigation.
The major challenges in finding important arguments are the lack of labelled data and the noise
present in legal case data. We hope to achieve a better solution with more sophisticated techniques
in the future.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wyner</surname>
          </string-name>
          ,
          <article-title>Identification of rhetorical roles of sentences in Indian legal judgments</article-title>
          , CoRR abs/1911.05405 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1911.05405. arXiv:1911.05405.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Bidirectional LSTM-CRF models for sequence tagging</article-title>
          ,
          <year>2015</year>
          . arXiv:1508.01991.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          in:
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <publisher-name>Association for Computational Linguistics</publisher-name>
          ,
          <publisher-loc>Minneapolis, Minnesota</publisher-loc>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <article-title>An overview of neural approach on pattern recognition</article-title>
          ,
          <year>2020</year>
          . URL: https://www.analyticsvidhya.com/blog/2020/12/an-overview-of-neural-approach-on-pattern-recognition/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>