<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Categorizing Roles of Legal Texts via Sequence Tagging on Domain-Specific Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sourav Dutta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Huawei Ireland Research Centre</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Automatically understanding the rhetorical roles of text snippets within a legal document is an interesting problem, enabling several downstream tasks such as summarization of legal judgments, similar legal text search, and case analysis. The task is challenging, as legal case documents are domain-specific, usually not well-structured, and rhetorical roles may be subjective. To this end, we present how sentence embeddings from a domain-specific pre-trained language model can be combined with a sequence tagging classifier to better understand the implicit sections within legal documents via long-term relationships, for sentence classification. Our proposed methodology secured the 1st rank, with an F1 score of 0.557, on shared task 1 in the “Artificial Intelligence for Legal Assistance” (AILA) track of the Forum of Information Retrieval Evaluation (FIRE), 2021.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal Data Analytics</kwd>
        <kwd>Rhetorical Role Labelling</kwd>
        <kwd>Sentence Classification</kwd>
        <kwd>Language Model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The legal framework in most countries relies on two primary sources – Statutes and Precedents.
Statutes are bodies of written law, such as the Constitution of a country, while Precedents denote
prior cases as decided in other courts of law. A legal representative, when presenting a case
in a court of law, must adhere to facts, relevant precedents and statutes. Legal documents tend to
be large and largely unstructured [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], necessitating novel systems to automatically understand
and segment such documents into coherent, meaningful parts. Such frameworks would not only improve
readability and assist legal representatives, but also enable diverse downstream tasks such as
semantic case search [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], legal document summarization [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], and legal analysis [6].
      </p>
      <p>Rhetorical role labelling of sentences in a legal document refers to understanding the semantic
function of each sentence in the document [7]. Although legal case documents generally follow a
common thematic structure with implicit sections like “Facts”, “Issues” and “Arguments given
by parties”, this information is typically not specified explicitly in free-flowing case documents,
and the various themes often interleave. To alleviate these challenges, current
research in legal informatics involves the use of machine learning approaches
for the supervised classification of legal texts. However, limited training data and
domain-specificity make automating the identification of rhetorical roles challenging.</p>
      <p>To this end, we propose a framework for rhetorical role labelling of legal sentences. To
provide domain information, we rely on sentence embeddings from a fine-tuned
domain-specific transformer-based language model. This enables the contextual understanding of the
domain-specific vocabulary and its relationship with the rhetorical labels. Further, as documents
implicitly have an underlying thematic structure, we adapt a Gated Recurrent Unit–Conditional
Random Field (GRU-CRF) based sequence tagging classifier to categorize the sentences into the
rhetorical labels. Our proposed methodology secured the 1st rank, with an F1 score of 0.557, on
shared task 1 in the “Artificial Intelligence for Legal Assistance” (AILA) track of the Forum
of Information Retrieval Evaluation (FIRE), 2021 [8, 9]. Variants of the classification framework
were also ranked at the 2nd and 3rd positions – depicting the effectiveness of our framework.</p>
      <sec id="sec-1-1">
        <title>1.1. Related Work</title>
        <p>Legal document analysis and role labelling have traditionally been expensive manual processes
performed by domain experts. With the advent of legal analytics and machine learning techniques,
the automatic labelling of the rhetorical roles of legal sentences has been studied. However, such approaches
rely on the availability of manually annotated datasets based on a set of rules crafted from domain
knowledge. An in-depth annotation study and the curation of a gold standard corpus for the task
of sentence labelling can be found in [10].</p>
        <p>
          There have been several prior attempts towards automatically identifying the rhetorical roles of
sentences in legal documents. A method for the identification of factual and non-factual sentences
was developed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] using a fastText classifier, while Conditional Random Fields (CRF) were used
for rhetorical role labelling in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The use of rule-based scripts, which require less training data,
alongside machine learning approaches for rhetorical role identification was studied in [11].
        </p>
        <p>Deep learning models, such as a hierarchical BiLSTM-CRF based classifier, were recently shown to
perform better than approaches using only handcrafted linguistic features [7]. Further, the use of pre-trained
language models like RoBERTa [12], along with TF-IDF based semantic features [13], was
shown to perform well on such tasks in AILA-2020.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description</title>
      <p>The task of rhetorical role labelling of sentences in legal case judgements was first introduced
in AILA-2020 [14], following which a similar shared task was set up for AILA-2021 [8, 9]. The
dataset was created from legal judgments of the Supreme Court of India, which were
subsequently manually labelled by legal experts [7].</p>
      <p>The sentences of the documents are to be classified into one of the following seven labels:
• Facts: sentences that denote the chronology of events that led to filing the case;
• Argument: sentences that denote the arguments of the contending parties;
• Statute: relevant statute cited;
• Precedent: relevant precedent cited;
• Ruling by Lower Court: sentences that correspond to the ruling/decision given by the lower
courts (e.g., Tribunal, High Court, etc.), as the dataset contains cases presented to the
Indian Supreme Court with a preliminary ruling from the lower courts;
• Ratio of the Decision: sentences that denote the rationale/reasoning given by the Supreme
Court for the final judgement; and
• Ruling by Present Court: sentences that denote the final decision given by the Supreme
Court for that case document.</p>
      <p>Overall, there were 70 annotated legal documents for training, with documents of differing
lengths, and the annotated classes were unbalanced. In total, there were around 11K training
sentences, with Ratio of the Decision and Facts constituting 50% of the labels. The test suite
contained 10 documents for submission, with around 850 sentences for classification.</p>
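<p>Given the class imbalance noted above, per-class loss weights can be derived from inverse label frequencies; a minimal sketch (the label names are abbreviated and the counts below are illustrative, not the actual task statistics):</p>

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * count), so that rare
    classes contribute more to the training loss."""
    counts = Counter(labels)
    total = sum(counts.values())
    n = len(counts)
    return {c: total / (n * k) for c, k in counts.items()}

# Illustrative toy distribution over the seven rhetorical roles.
labels = (["Facts"] * 30 + ["Ratio"] * 25 + ["Argument"] * 15 +
          ["Precedent"] * 12 + ["Statute"] * 10 +
          ["RulingPresent"] * 5 + ["RulingLower"] * 3)
weights = inverse_frequency_weights(labels)
# The rarest class (RulingLower) receives the largest weight.
```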
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Framework</title>
      <p>In this section, we introduce the different modules of our proposed framework for the above
task. In summary, we adopted 4 main strategies:
1. Sentence Embedding – Each sentence is encoded into a dense vector representation
using sentence embeddings obtained from a fine-tuned domain-specific language model.
This provides semantic and contextual information about each sentence for the classification task.
We also explore the use of both generic as well as domain-specific sentence embedding
techniques, as discussed later;
2. Structural Encoding – Sentences belonging to a certain class might have structural
properties that are similar to each other (while being different from the sentences of
another class). For example, arguments might have a different linguistic structure
compared to court rulings, in terms of word ordering and dependencies. Hence, in one
variant we employ sentence embeddings obtained from their dependency parse trees;
3. Meta-Embedding – Diverse embeddings obtained from different encoding architectures,
when concatenated, have been shown in the literature to increase the overall classification
performance compared to the accuracy of the individual embeddings. In our framework,
we concatenate sentence embeddings obtained from different language models to form
the final sentence representations provided to the classification layer; and
4. Document-Sentence Sequence Classification – As opposed to classifying each
sentence individually into the provided 7 labels, to incorporate a global view of the classification task
we adapt sequence tagging [15] for sentence classification by considering the long-term label
dependency between the sentences within a document. The intuition is that documents
inherently might have a logical structure which, if considered, should improve the
overall classification accuracy. For example, a document might first state the facts and
arguments, followed by the ruling of the lower court, and finally the current court ruling.</p>
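<p>The four strategies above can be sketched as a single pipeline: embed each sentence with every encoder, concatenate the results into a meta-embedding, and tag the whole sentence sequence jointly. All function names in this sketch are hypothetical placeholders, with trivial stand-ins for the embedders and tagger:</p>

```python
def classify_document(sentences, embedders, tagger):
    """Embed each sentence with every encoder, concatenate into a
    per-sentence meta-embedding, then tag the sequence as a whole."""
    meta = [sum((e(s) for e in embedders), []) for s in sentences]
    return tagger(meta)  # one rhetorical label per sentence

# Toy stand-ins: two "embedders" and a length-threshold "tagger".
emb_a = lambda s: [float(len(s))]        # stands in for a language model
emb_b = lambda s: [float(s.count(" "))]  # stands in for a structural encoder
rule_tagger = lambda vecs: ["Facts" if v[0] > 10 else "Argument" for v in vecs]

labels = classify_document(
    ["The appellant filed the case.", "Short."], [emb_a, emb_b], rule_tagger)
```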
      <p>We next discuss the different strategies in more detail, along with the various models
developed for the rhetorical role labelling task.</p>
      <sec id="sec-3-1">
        <title>3.1. Sentence Embeddings</title>
        <p>We compute high-dimensional embeddings of the sentences of the legal documents using the
following two language models:
• Domain-Specific Model: To obtain domain-based embeddings of sentences, we use the
pre-trained “legal-bert-base-uncased” language model 1, which is pre-trained on different
legal documents. This enables our classification model to understand domain-based
terminology and semantic contextual information related to the domain of the task. We
further fine-tune the above model using the training data (of the task) with a batch size
of 64 and a learning rate of 2e-5.
• Generic Model: For inducing a generic semantic understanding of natural language text
(for understanding the general meaning of the sentences), we use the pre-trained sentence
transformer model “all-mpnet-base-v1” 2. This model is also fine-tuned on the provided
training dataset, with the same parameters as mentioned above (for the domain-specific
model).</p>
        <p>For both of the above models, we use a mean-pooling strategy to obtain the sentence embeddings.</p>
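<p>Mean-pooling averages the token vectors of a sentence while masking out padding tokens; a minimal numpy sketch, under the assumption of a (tokens × dim) model output and a 0/1 attention mask:</p>

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over real (non-padding) tokens only."""
    mask = attention_mask[:, None].astype(float)      # (tokens, 1)
    summed = (token_embeddings * mask).sum(axis=0)    # (dim,)
    count = mask.sum()                                # number of real tokens
    return summed / np.clip(count, 1e-9, None)

# Toy example: 4 token vectors of dimension 3; the last token is padding.
tok = np.array([[1., 2., 3.], [3., 2., 1.], [2., 2., 2.], [9., 9., 9.]])
msk = np.array([1, 1, 1, 0])
sent_vec = mean_pool(tok, msk)   # averages only the first three rows
```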
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Structural Encoding</title>
        <p>This module helps our framework learn encodings of sentence structures as additional
information cues. As mentioned previously, certain sentences (like arguments) might have
specific linguistic structures that might help in their classification. To this end, we use an
unsupervised approach based on Graph2Vec 3. The dependency parse tree (obtained using
the SpaCy software) of each sentence is converted to an undirected graph and fed to Graph2Vec to
get the embedding of the sentence structure. Specifically, we use the following parameters for
Graph2Vec: wl-iterations=1, dimensions=512, epochs=20, and min-count=1.</p>
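<p>The conversion from a dependency parse to an undirected graph can be sketched as follows. The head-index array here stands in for SpaCy's parse output (assumption: token i's head is heads[i], with the root pointing to itself), and the resulting edge set is what a graph-embedding method like Graph2Vec would consume:</p>

```python
def parse_to_undirected_edges(heads):
    """Turn dependency head indices into an undirected edge set.
    heads[i] is the index of token i's syntactic head; the root
    points to itself and contributes no edge."""
    edges = set()
    for child, head in enumerate(heads):
        if child != head:  # skip the root's self-loop
            edges.add((min(child, head), max(child, head)))
    return sorted(edges)

# "courts cite precedents" with "cite" (index 1) as root:
# courts -> cite, precedents -> cite
edges = parse_to_undirected_edges([1, 1, 1])
```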
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Meta-Embedding</title>
        <p>We combine the various sentence embeddings obtained from the above modules –
domain-specific, generic, and structural representations – in different combinations for the variants of
our framework (specified later). The meta-embedding is constructed by simply concatenating the
different sentence embeddings (the order of concatenation is irrelevant).</p>
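<p>The concatenation itself is a one-liner; the dimensions below match those reported for the variants later (768 for the domain-specific model, 512 for Graph2Vec), with zero vectors as placeholders:</p>

```python
import numpy as np

def meta_embedding(*embeddings):
    """Concatenate per-sentence embeddings from different encoders
    into one meta-embedding vector."""
    return np.concatenate(embeddings)

domain_vec = np.zeros(768)   # e.g. a legal-bert-base-uncased sentence embedding
struct_vec = np.zeros(512)   # e.g. a Graph2Vec structural embedding
meta = meta_embedding(domain_vec, struct_vec)  # 768 + 512 = 1280 dimensions
```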
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Document-Sentence Sequence Classification</title>
        <p>The final classification module generates the output class for each of the input sentences. As
discussed above, we deviate from the approach of individual sentence classification and instead look
at the sentences in unison within a document – as this might enable the learning of the implicit
structure and sentence class dependencies within a document (e.g., a fact might be followed by
an argument and then a court ruling).</p>
        <p>Sequence tagging classifiers have been used in the literature for Named-Entity Recognition
(NER), Part-of-Speech (POS) tagging, and chunking [15]. We adopt a similar approach, but for
document-sentence level classification. Specifically, in the NER task, sentences are composed of
words, and each word in the word sequence (i.e., the sentence) is classified into entity types.
Similarly, we consider a document to consist of sentences, and each sentence in the sentence
sequence (i.e., the document) is categorized by our classification layer. This enables information
flow across the sentences in the document for identifying the sentence classes, improving the
overall classification accuracy of the proposed framework.
1 available at huggingface.co/nlpaueb/legal-bert-base-uncased
2 available at huggingface.co/sentence-transformers/all-mpnet-base-v1
3 karateclub.readthedocs.io/en/latest/modules/root.html#karateclub.graph_embedding.graph2vec.Graph2Vec</p>
        <p>Specifically, we use a bi-directional Gated Recurrent Unit (GRU) [16] with a Conditional
Random Field (CRF) [17] layer on top as the classification module. A document is represented
as a sequence of sentences, and the inputs to the GRU units are the sentence embeddings obtained
from the different models discussed above. We also use a fully connected layer and a dropout layer
for classification, with the Adam optimizer, and apply class weights during training of the classifier
layer to account for the class imbalance in the training data. The implementation uses the tensorflow-keras
and tf2crf libraries for classifying the sentences into the 7 categories.</p>
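<p>At inference time, a CRF layer decodes the jointly most likely label sequence for the whole document via the Viterbi algorithm. A minimal numpy sketch of that decoding step (the emission and transition scores below are illustrative; in the actual model they come from the GRU outputs and the learned CRF parameters):</p>

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Best label path for one document.
    emissions: (n_sentences, n_labels) per-sentence label scores.
    transitions: (n_labels, n_labels) score of label j following label i."""
    n, k = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        # cand[i, j] = best score ending in i at t-1, then moving to j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Two labels over three sentences; transitions penalize label flips,
# so the weak middle sentence is kept consistent with its neighbours.
em = np.array([[2.0, 0.0], [0.1, 0.0], [0.0, 2.0]])
tr = np.array([[0.5, -1.0], [-1.0, 0.5]])
path = viterbi_decode(em, tr)
```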
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. AILA-2021 Task 1 Results</title>
      <p>In this section, we report the performance of our framework on the rhetorical role labelling task
of AILA-2021. We consider the following three variants of our framework, where the input
sentence embeddings differ depending on the embedding techniques used to create the
sentence meta-embedding:
• Variant 1 – Only the fine-tuned domain-specific embedding (from the
legal-bert-base-uncased language model) of each sentence is provided as input to the classification
layer. The total sentence embedding dimension is 768.
• Variant 2 – The fine-tuned domain-specific embedding concatenated with the unsupervised
structural embedding (from legal-bert-base-uncased and Graph2Vec) of
each sentence is fed as input to the classification layer. That is, the sentence
meta-embedding (fine-tuned legal-bert-base-uncased + unsupervised Graph2Vec embedding) is
obtained from the 2 methods. The total sentence embedding dimension is 1280.
• Variant 3 – In the final run, we concatenate the embeddings from all three models to
obtain the overall sentence meta-embedding, which is passed as input to the classification
layer. That is, we use the domain-specific, generic and structural embeddings to create the
sentence embedding (i.e., fine-tuned legal-bert-base-uncased + unsupervised Graph2Vec +
fine-tuned all-mpnet-base-v1). The total sentence embedding dimension here is 2048.
The classification layer and other parameters are kept the same across all the different variants.
Empirical Scores. The classification performance is computed in terms of Precision, Recall
and F1-Score within each classification group, and is averaged to obtain the macro-F1 score for
the entire task. The results obtained by the proposed framework in the different category classes
are tabulated in Table 1.</p>
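<p>Macro-F1 as used here – per-class F1 averaged with equal weight for every class, regardless of class size – can be computed as follows (a sketch with toy labels, not the official evaluation script):</p>

```python
def macro_f1(gold, pred, classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = ["Facts", "Facts", "Statute", "Argument"]
pred = ["Facts", "Statute", "Statute", "Argument"]
score = macro_f1(gold, pred, ["Facts", "Statute", "Argument"])
```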
      <p>We obtained the 1st, 2nd and 3rd leaderboard positions for the task on the overall
performance of our framework, with a best F1 score of 0.557 from Variant 1. It can be observed
that using only the fine-tuned language model for obtaining sentence embeddings achieved
the best overall results on this task dataset. The addition of structural embedding information
also produced comparable results (with a slight decrease in precision, but an increase in the recall
scores). It is interesting to observe that the inclusion of generic sentence embeddings degraded
the classification performance of our model – such embeddings fail to generalize in the
presence of domain-specific vocabulary.</p>
      <p>An important observation is that the Ruling by Lower Court is the hardest class for our
framework. This can be attributed to the presence of the closely related Ruling by Present Court
class, which might be confusing the classification module. This can be validated by the high
recall score but low precision score of the Ruling by Present Court category (see last row of
Table 1) – which indicates the model probably classifies both types into this single category.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work we presented a framework for automatically assigning rhetorical roles to sentences
in legal documents. Our approach relies on embeddings from a fine-tuned domain-specific
transformer-based language model and on sentence structure. The use of a sequence tagging based
GRU-CRF classification layer enabled us to model long-distance sentence relationships within
documents for better classification performance. We secured the rank 1, 2 and 3 results on the
AILA-2021 shared task 1, with a best overall F1-score of 0.557.
</p>
      <p>[6] J. Savelka, K. D. Ashley, Segmenting U.S. Court Decisions into Functional and Issue Specific Parts, in: JURIX, 2018.
[7] P. Bhattacharya, S. Paul, K. Ghosh, S. Ghosh, A. Wyner, Identification of Rhetorical Roles of Sentences in Indian Legal Judgments, in: Proc. International Conference on Legal Knowledge and Information Systems (JURIX), 2019.
[8] V. Parikh, U. Bhattacharya, P. Mehta, B. A., P. Bhattacharya, K. Ghosh, S. Ghosh, A. Pal, A. Bhattacharya, P. Majumder, Overview of the Third Shared Task on Artificial Intelligence for Legal Assistance at FIRE 2021, in: FIRE (Working Notes), 2021.
[9] V. Parikh, U. Bhattacharya, P. Mehta, B. A., P. Bhattacharya, K. Ghosh, S. Ghosh, A. Pal, A. Bhattacharya, P. Majumder, FIRE 2021 AILA Track: Artificial Intelligence for Legal Assistance, in: Proceedings of the 13th Forum for Information Retrieval Evaluation, 2021.
[10] A. Z. Wyner, W. Peters, D. Katz, A Case Study on Legal Case Annotation, in: JURIX, 2013.
[11] V. R. Walker, K. Pillaipakkamnatt, A. M. Davidson, M. Linares, D. J. Pesce, Automatic Classification of Rhetorical Roles for Sentences: Comparing Rule-Based Scripts with Machine Learning, in: Workshop on Automated Semantic Analysis of Information in Legal Texts (with ICAIL), 2019.
[12] S. B. Majumder, D. Das, Rhetorical Role Labelling for Legal Judgements Using RoBERTa, in: Forum for Information Retrieval Evaluation-AILA, 2020.
[13] J. Gao, H. Ning, Z. Han, L. Kong, H. Qi, Legal Text Classification Model based on Text Statistical Features and Deep Semantic Features, in: Forum for Information Retrieval Evaluation-AILA, 2020.
[14] P. Bhattacharya, P. Mehta, K. Ghosh, S. Ghosh, A. Pal, A. Bhattacharya, P. Majumder, Overview of the FIRE 2020 AILA Track: Artificial Intelligence for Legal Assistance, in: Forum for Information Retrieval Evaluation, 2020.
[15] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF Models for Sequence Tagging, arXiv:1508.01991, 2015.
[16] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, in: Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724–1734.
[17] J. Lafferty, A. McCallum, F. C. N. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, in: International Conference on Machine Learning (ICML), 2001, pp. 282–289.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Shulayeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siddharthan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Z.</given-names>
            <surname>Wyner</surname>
          </string-name>
          ,
          <source>Recognizing Cited Facts and Principles in Legal Judgements, Artificial Intelligence and Law</source>
          <volume>25</volume>
          (
          <year>2017</year>
          )
          <fpage>107</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hiware</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajgaria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pochhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>A Comparative Study of Summarization Algorithms Applied to Legal Case Judgments</article-title>
          , in: ECIR,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Nejadgholi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bougueng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Witherspoon</surname>
          </string-name>
          ,
          <article-title>A Semi-supervised Training Method for Semantic Search of Legal Facts in Canadian Immigration Cases</article-title>
          , in: JURIX,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Saravanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ravindran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Raman</surname>
          </string-name>
          ,
          <article-title>Automatic Identification of Rhetorical Roles using Conditional Random Fields for Legal Document Summarization</article-title>
          , in:
          <source>International Joint Conference on Natural Language Processing</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Farzindar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lapalme</surname>
          </string-name>
          ,
          <source>LeTSum, An Automatic Legal Text Summarizing System</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>