<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classification on Sentence Embeddings for Legal Assistance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arka Mitra</string-name>
          <email>thearkamitra@iitkgp.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <kwd-group>
          <kwd>Deep Learning</kwd>
          <kwd>Sentence Embeddings</kwd>
          <kwd>BERT</kwd>
          <kwd>Classification</kwd>
          <kwd>Natural Language Processing</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Kharagpur</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>Legal proceedings take a great deal of time and money, and lawyers have to do a lot of work to identify the different sections of prior cases and statutes. This paper addresses Task 1 of AILA 2021 (Artificial Intelligence for Legal Assistance), held at FIRE 2021 (Forum for Information Retrieval Evaluation). The task is to semantically segment a legal document by assigning each sentence one of 7 predefined labels, or “rhetorical roles.” The paper uses BERT to obtain a sentence embedding for each sentence, and a linear classifier then outputs the final prediction. The experiments show that assigning more weight to the class with the highest frequency yields better results than assigning more weight to the lower-frequency classes. In Task 1, the team legalNLP obtained an F1 score of 0.22.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Legal systems in many countries such as the USA, the UK, Canada, and India have two main sources: precedents
and statutes. Precedents are previous similar cases, while statutes are written laws that have
to be followed in the country. The number of legal cases has been increasing, and it is therefore
quite difficult for a lawyer to go through many of the precedents. Additionally, legal reports
in different countries are structured in different ways. Due to this lack of standardization, it is
necessary to have a method that can help the lawyer identify the different sentences in a
report and process the report faster, while obtaining the relevant information quickly. Task
1 of AILA 2021 aims at the semantic segmentation of the document to assist the lawyer in
processing the information faster.</p>
      <p>AILA 2021, held in collocation with FIRE 2021, comprises several tasks in legal informatics. Legal
documents follow certain sections like “Facts of the Case”, “Issues being discussed”, etc., which
are called “rhetorical roles”. For Task 1, each sentence had to be classified into one of seven
different classes. More details on the classes and the dataset are given in Section
3.</p>
      <p>The remainder of the paper is divided into the following sections: Section 2 reviews the
related work on rhetorical labelling in legal reports; Section 3 details the dataset
that has been used; Section 4 describes the methodology; Section 5 showcases
the results obtained; Section 6 discusses the results and provides insights on the different
models used; Section 7 outlines future work; and Section 8 concludes the
paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Text segmentation has been an important task in natural language processing. There have been
probabilistic approaches that used Hidden Markov Models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Maximum Entropy Markov
Models [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Saravanan and Ravindran [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] used Conditional Random Fields (CRFs) for the
identification of rhetorical labels for the segmentation and summarization of legal documents. Savelka
et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] used CRFs on annotated data from US cybercrime and trade-secret decisions.
Bhattacharya et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] used a CRF on top of a Bi-LSTM network to classify sentences into
different categories.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>
        The AILA track started in 2019 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and focused on precedent and statute retrieval. The
second iteration of the track [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] covered precedent and statute retrieval as well as
rhetorical labelling [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The third iteration of the track [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] also includes rhetorical labelling, but at
the same time contains a task for automatic summarization [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>There are 60 documents in the task 1 dataset with 11285 labelled sentences. Each of the sentences
has one of the seven possible labels:
• Facts : Sentences that discuss the facts about the case
• Ruling by Lower Court : The dataset contains Indian Supreme Court cases, which usually
have a ruling at a lower court like High Court or Tribunal; the label indicates that the
sentences are the decisions given in the lower court
• Argument : Arguments provided by the different parties
• Statute : The statute corresponding to the present case
• Precedent : The precedent corresponding to the present case
• Ratio of the decision : The reasoning given by the Supreme Court for the decision
• Ruling by Present Court: The final decision given by the Supreme Court</p>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
      <p>The first subsection discusses the approach for the task and the next subsection provides the
experimental details.</p>
      <sec id="sec-4-1">
        <title>4.1. Approach</title>
        <p>The dataset contains seven different classes, but the distribution among those classes is quite
skewed. Table 1 shows that the label “Ratio of the decision” occurs far more frequently than
the others. Ideally, the number of samples per class should be kept almost equal, which would
allow the model to learn meaningful information from each of the classes.</p>
        <p>There are three main ways to achieve this. In the first method, the sampling from the dataset
is done in such a way that more samples are drawn from the lower-frequency classes and
fewer samples from the higher-frequency classes. The downside of this method is
that several samples from the dataset are discarded, which would decrease the performance of
the model. In the second method, the sampling is done in such a way that each example
of a lower-frequency class is included in the batches multiple times, so that the resulting
distribution has the same number of samples for each of the classes. However, since
the same example is chosen multiple times, this increases the chance of overfitting and also
increases the computational overhead. The last method keeps the computational cost about the
same as the first method while leaving the dataset size unchanged. In this method,
the loss is modified so that false predictions for classes with a lower number of samples are
penalized more. The new loss, shown in Eqn. 1, is a weighted version of the cross-entropy
loss. The weight multiplied with the cross-entropy can be seen as the number of times
the sample is effectively counted; a class with a higher number of samples in the dataset should
therefore have a lower weight associated with it.</p>
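        <p>As an illustration of this last method, per-class weights can be computed inversely proportional to class frequency. The following sketch is not the author's code, and the normalization choice is an assumption for illustration only:</p>

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Give each class a weight inversely proportional to its frequency,
    normalized so the weights average to 1 across classes.
    (Illustrative sketch; the paper only states that rarer classes
    receive larger weights, not the exact scheme.)"""
    counts = Counter(labels)
    total = sum(counts.values())
    raw = {c: total / n for c, n in counts.items()}   # rare class -> large weight
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

# A toy 8:2 split between two classes: the rarer class gets the larger weight.
weights = inverse_frequency_weights(["Facts"] * 8 + ["Statute"] * 2)
```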
        <p>
          loss(x, class) = weight[class] * ( -log( exp(x[class]) / Σ_j exp(x[j]) ) )    (1)
The author preprocessed the documents and combined the sentences from all the documents with
their associated labels to create the dataset that has been used. BERT[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] was used to create
the sentence embeddings. BERT has been pretrained on large amounts of text and is thus
able to create a condensed representation of a sentence. The output at the “CLS”
token of BERT was taken as the sentence embedding of the sentence,
which was then passed through a linear layer. The class with the maximum logit
from the linear layer was selected as the predicted class of the sentence. The
overall methodology is described in Figure 1.
        </p>
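        <p>Eqn. 1 above is the per-sample weighted cross-entropy over the logits x. A minimal sketch in plain Python (not the author's implementation; frameworks such as PyTorch expose the same loss via a weight argument to their cross-entropy criterion):</p>

```python
import math

def weighted_cross_entropy(x, target, weight):
    """Eqn. 1: weight[target] * (-log(exp(x[target]) / sum_j exp(x[j]))).

    x      : list of logits, one per class
    target : index of the true class
    weight : list of per-class weights
    """
    log_sum = math.log(sum(math.exp(v) for v in x))
    # -log(softmax) simplifies to log_sum - x[target]
    return weight[target] * (log_sum - x[target])

# With unit weights this reduces to the ordinary cross-entropy loss;
# doubling a class's weight doubles the penalty for misclassifying it.
loss = weighted_cross_entropy([2.0, 1.0, 0.1], target=1, weight=[1.0, 2.0, 1.0])
```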
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experimental Details</title>
        <p>
          The cased and uncased BERT models have been implemented with the help of the Huggingface
library [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] . PyTorch has been used as the framework. A batch size of 8 has been used,
and the model has been trained for 4 epochs. 80% of the data has been used as the training
set and the rest as the validation set. The model weights that gave the best
results on the validation set were saved and used for inference on the test set. The AdamW
optimizer [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] with an initial learning rate of 2e-5 was used for training. The maximum length for
padding was set at about the 98th percentile, which is around 120 tokens. The code is publicly
available on GitHub1. The random seed was set to 42.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>Three runs were submitted in total. The macro-F1 score, precision, and
recall of the different runs are given in Table 2.</p>
      <p>The description of the three runs is as follows:
• The first run uses base cased BERT with weights to modify the cross-entropy loss
• The second run uses base uncased BERT with the same weights as the previous run
• The third and final run uses base cased BERT, but here the weights are inverted such that
the class with the higher number of samples is given more weight</p>
      <p>1https://github.com/thearkamitra/LegalNLP</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>The results show that the cased BERT model performed better than the uncased model. This
can be explained by the fact that some phrases in legal reports have different
meanings when used in uppercase versus lowercase, and the cased BERT model is able to capture this
contextual information. Due to the better performance of cased BERT, the author performed the
same experiment with the same random seed but with different weights for the cross-entropy
loss. A comparison between the first and third runs shows that the model performed better when
more weight was given to the classes that occur more abundantly. This contradicts the expectation
that a model trained with a skewed distribution would perform worse than one without. A
possible explanation is that the test data contains more sentences with labels from the more
frequent classes; as a consequence, the metric reports a higher score for the third run.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Future Work</title>
      <p>In the present work, the sentences were extracted from the documents and aggregated to
form the dataset. However, there is a relation between the labels and where a sentence is
located in the document (for example, “Ruling by Present Court” always appears at the very
end of a document). The author has also not considered the co-occurrence of the different
labels. For that, a Hidden Markov Model or some other probabilistic state machine could be used to
further improve the accuracy of the model.</p>
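      <p>As a first step toward such a probabilistic model, label-transition probabilities could be estimated from label bigrams in the training data. The sketch below is hypothetical; the label sequence in the example is illustrative, not real training data:</p>

```python
from collections import Counter, defaultdict

def transition_probabilities(label_sequence):
    """Estimate P(next label | current label) from observed label bigrams,
    i.e. the transition component of an HMM-style model of rhetorical-role
    order within a document."""
    bigrams = Counter(zip(label_sequence, label_sequence[1:]))
    totals = defaultdict(int)
    for (cur, _nxt), n in bigrams.items():
        totals[cur] += n
    return {pair: n / totals[pair[0]] for pair, n in bigrams.items()}

# Toy sequence: after "Facts", the two observed successors are equally likely.
probs = transition_probabilities(
    ["Facts", "Facts", "Argument", "Ratio of the decision",
     "Ruling by Present Court"]
)
```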
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>The paper describes the modified cross-entropy loss and the use of BERT models for rhetorical
role labelling in legal documents. The three submitted runs obtained scores of
0.196, 0.192, and 0.22, respectively.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>
        The author thanks the organizers of Artificial Intelligence for Legal Assistance for creating this
task. The author would also like to acknowledge Google Colab for providing the computational
resources needed. The BERT model is built on the library made by Huggingface [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Borkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Deshmukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          ,
          <article-title>Automatic segmentation of text into structured records</article-title>
          ,
          <source>SIGMOD record 30</source>
          (
          <year>2001</year>
          )
          <fpage>175</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <article-title>Maximum entropy markov models for information extraction and segmentation</article-title>
          , in: ICML,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Saravanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ravindran</surname>
          </string-name>
          ,
          <article-title>Identification of rhetorical roles for segmentation and summarization of a legal judgment</article-title>
          ,
          <source>Artificial Intelligence and Law 18</source>
          (
          <year>2010</year>
          )
          <fpage>45</fpage>
          -
          <lpage>76</lpage>
          . URL: https://doi.org/10.1007/s10506-010-9087-7. doi:10.1007/s10506-010-9087-7.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Savelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Ashley</surname>
          </string-name>
          ,
          <article-title>Segmenting u.s. court decisions into functional and issue specific parts</article-title>
          ,
          <source>in: JURIX</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Z.</given-names>
            <surname>Wyner</surname>
          </string-name>
          ,
          <article-title>Identification of rhetorical roles of sentences in Indian legal judgments</article-title>
          , ArXiv abs/
          <year>1911</year>
          .05405 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>FIRE 2019 AILA track: Artificial Intelligence for Legal Assistance</article-title>
          ,
          <source>Proceedings of the 11th Forum for Information Retrieval Evaluation</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Overview of the FIRE 2020 AILA track: Artificial Intelligence for Legal Assistance</article-title>
          ,
          <source>in: FIRE (working notes)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wyner</surname>
          </string-name>
          ,
          <article-title>Identification of rhetorical roles of sentences in Indian legal judgments</article-title>
          ,
          <source>in: Proc. International Conference on Legal Knowledge and Information Systems (JURIX)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Parikh</surname>
          </string-name>
          , U. Bhattacharya,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Overview of the third shared task on Artificial Intelligence for Legal Assistance at FIRE 2021</article-title>
          , in: FIRE (Working Notes),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Parikh</surname>
          </string-name>
          , U. Bhattacharya,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>FIRE 2021 AILA track: Artificial Intelligence for Legal Assistance</article-title>
          ,
          <source>in: Proceedings of the 13th Forum for Information Retrieval Evaluation</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: NAACL</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          , P. von Platen, C. Ma,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Huggingface's transformers: State-of-the-art natural language processing</article-title>
          ,
          <year>2020</year>
          . arXiv:1910.03771.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Decoupled weight decay regularization</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>