<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Retrieval using Evidence Extraction from Court Judgements</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Basit Ali</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ravina More</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sachin Pawar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Girish K. Palshikar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TCS Research</institution>,
          <country>India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>One of the key constituents of court case descriptions is Evidence description and observations. Along with witness testimonies, evidence plays a significant role in the final decision of the case. We propose a weakly supervised technique to automatically identify sentences containing evidences. We represent the information related to evidences in these sentences in a semantically rich structure - Evidence Structure defined as an Evidence Information Model. We show that witness testimony information can also be represented using the same model. We demonstrate the effectiveness of our Evidence Information Model for the prior case retrieval application by proposing a matching algorithm for computing semantic similarity between a query and a sentence in a court case description. To the best of our knowledge, this is the first paper to apply NLP techniques for the extraction of evidence information from court judgements and use it for retrieving relevant prior court cases.</p>
      </abstract>
      <kwd-group>
        <kwd>Evidence Extraction</kwd>
        <kwd>Evidence Information Model</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Prior Case Retrieval</kwd>
        <kwd>Sentence Classifier</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>We represent the information related to evidences in court judgements in a rich semantic structure – Evidence Structure defined as an Evidence Information Model. Along with Evidences, we also identify and represent Witness Testimonies using the same structure. We propose a two-step approach for identifying evidence and testimony sentences. In the first step, linguistic rules are used to determine whether a sentence contains any evidence or testimony. In the second step, we train a Weakly Supervised sentence classifier on the sentences identified by these rules. To demonstrate effectiveness of the proposed Evidence Information Model, we apply it to the task of prior case retrieval, as demonstrated in the experiments section.</p>
      <p>Moreover, Ghosh et al. [1] retrieve prior court cases using only witness testimonies. Their structure does not capture important semantic information like whether an event is negated, what are the causes behind the event, the manner in which the event takes place, etc. We propose a richer semantic structure addressing these limitations and design a suitable semantic matching algorithm for that structure. To the best of our knowledge, this is the first paper to apply NLP techniques for the extraction of evidence information from court judgements and demonstrate its use for retrieving relevant prior court cases.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Evidence Information Model</title>
      <sec id="sec-2-1">
        <title>The purpose of Evidence Information Model is to define</title>
        <p>a suitable structure to represent evidence information in
court judgements. In this section, we describe Semantic
Role Labelling in brief and how it is used to define our
proposed Evidence Structure.
2.1. Background: Semantic Role</p>
        <p>Labelling</p>
      </sec>
      <sec id="sec-2-2">
        <title>Semantic Role Labelling (SRL) is a technique in Natural</title>
        <p>Language Processing that identifies verbs/predicates in a
sentence, finds phrases connected to every predicate and
assigns an appropriate semantic role to every phrase. By
doing so, SRL helps machines to understand the roles of
important words within a sentence. Following are some
key semantic roles identified for a verb/predicate (often
corresponding to an action or event) by SRL techniques:
ARG0 : proto-agent or someone who performs the action
denoted by the verb
ARG1 : proto-patient or someone on whom the action is performed
ARGM-TMP : the time when the event took place
ARGM-CAU : the cause of the action
ARGM-PRP : the purpose of the action
ARGM-LOC : the location where the event took place
ARGM-MNR : the manner in which the action took
place
ARGM-NEG : the word indicating that the action did
not take place</p>
        <p>Consider the following example sentence:
On August 25, 1965, the bank dishonoured the cheque due to insufficient balance.</p>
        <p>The various semantic roles for the verb dishonoured are annotated as follows:
[ARGM-TMP: On August 25, 1965], [ARG0: the bank] [V: dishonoured] [ARG1: the cheque] [ARGM-CAU: due to insufficient balance].</p>
        <p>We use the predicates and corresponding arguments
obtained from the pre-trained AllenSRL model [3] to
instantiate our Evidence Structure for the queries and
candidate sentences.</p>
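        <p>For illustration, such SRL frames can be obtained with a pre-trained AllenNLP predictor. The following is a minimal sketch; the model archive URL is that of the publicly released BERT-based SRL model of Shi and Lin [3] and is an assumption rather than a detail stated in this paper.</p>
        <preformat>
# Minimal sketch: obtaining SRL frames with a pre-trained AllenNLP model.
# The model URL below is the publicly released BERT-SRL archive (an assumption,
# not a detail specified in this paper).
from allennlp.predictors.predictor import Predictor

SRL_MODEL = ("https://storage.googleapis.com/allennlp-public-models/"
             "structured-prediction-srl-bert.2020.12.15.tar.gz")
predictor = Predictor.from_path(SRL_MODEL)

sentence = "On August 25, 1965, the bank dishonoured the cheque due to insufficient balance."
output = predictor.predict(sentence=sentence)

# Each entry in output["verbs"] holds one predicate with BIO tags over the words.
for frame in output["verbs"]:
    print(frame["verb"], frame["description"])
        </preformat>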
        <p>2.2. Evidence Structure
The Evidence Information Model represents every Evidence Sentence, giving information about one or more Evidence Objects, in an Evidence Structure. We define an Evidence Object as one of the objects presented by the counsels to the judge along with the information and findings about the crime. It is thus a physical entity that can furnish some degree of support, contradiction or opposition to some legal arguments. Some examples of Evidence Objects are:
• Documents (autopsy report, post-mortem report, affidavit, letter, cheque, agreement, petition, FIR, signature)
• Material objects (gun, bullet, clothes, kerosene can)
• Substances (poison, alcohol, kerosene)
In Indian court case documents, such Evidence Objects are also represented in the judgement document as Exhibit A, Ex. 2, Evidence 23 and so on.</p>
        <p>On these lines, we define an Evidence Sentence as any sentence containing one or more Evidence Objects relevant to the current case, but which does not consist of:
• any witness testimony which is not verifiable
• legal argumentation
• a reference to some prior case or some Act or Section
• directions or instructions given by the court or judge.</p>
        <p>We now present a formal definition of the Evidence Structure. For every evidence present in an Evidence Sentence, the structure consists of an optional Observation Frame and a mandatory Evidence Frame. The Observation Frame represents the source of the information and the agent disclosing it. This information is optional as it may or may not be explicitly stated in a sentence. It consists of the following arguments:
• ObserverVerb or OV: the verb indicating the observation/discovery/disclosure (e.g., found, revealed, stated)
• ObserverAgent or A0: the source disclosing the information (e.g., person, agency, authority)
• EvidenceObject or EO: the Evidence Object in focus (e.g., post-mortem report, FIR, letter)</p>
        <p>The Evidence Frame captures details about the evidence itself through the following arguments:
• EvidenceVerb or EV: the main verb of any action, event or fact mentioned in a sentence or revealed by the Evidence Object (e.g., killed, forged, escaped)
• Agent or A0: someone who initiates the action indicated by the EvidenceVerb (e.g., the accused, Ram, ABC Pvt. Ltd.)
• Patient or A1: someone who undergoes the action indicated by the EvidenceVerb (e.g., the deceased, a cheque of Rs. 3,200, his wife)
• Location or LOC: the location where the action took place (e.g., in the bedroom, at the bank, in Malaysia)
• Time or TMP: the timestamp of the action (e.g., about 12 hours back, in the morning, on Monday)
• Cause or CAU: the cause of the action (e.g., due to dowry, as a result of the CBI enquiry, out of sheer spite)
• Manner or MNR: the manner in which the action took place (e.g., as per the challan, fraudulently, wilfully)</p>
        <p>Table 1 shows examples of some Evidence Sentences along with the corresponding Evidence Structure Instances. In some cases, the Observation Frame may be empty due to the absence of an ObservationVerb. In such cases, the EvidenceObject may be present as a part of any argument in the Evidence Frame, e.g., the cheque in the first sentence in Table 1.</p>
        <p>Information about named entities and their types present in various arguments of the Observation or Evidence Frame is important. Hence, the Observation Frame and Evidence Frame are also enriched by annotating entities such as PERSON, ORGANISATION, GEO-POLITICAL ENTITY, LOCATION, PRODUCT, EVENT, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL, CARDINAL, WEAPON, SUBSTANCE, DOCUMENT, ARTIFACT, WORK_OF_ART, WITNESS, BODY_PART, and VEHICLE present in the fields.</p>
        <p>Witness Information Model: Information in witness testimonies can also be represented using the same Evidence Structure. The statement verbs used in witness testimony sentences (e.g., stated, said) are treated similar to observation verbs and are represented using Observation Frames. Similarly, other action/event verbs mentioned in witness testimony sentences are represented using Evidence Frames. Table 2 shows examples of some Witness Sentences along with the corresponding Evidence Structure Instances. The advantage of representing information about evidences and witness testimonies in the same structure is that we can make use of both these sources of information seamlessly for prior case retrieval.</p>
        <p>3. Methodology
In this section, we describe our overall methodology, which consists of two phases. In the first phase, we identify Evidence and Testimony Sentences using linguistic rules and a weakly supervised sentence classifier. In the second phase, we instantiate the Evidence Structures for these identified sentences. For all our experiments, we use a corpus of 30,032 Indian Supreme Court judgements ranging from the year 1952 to 2012.</p>
        <p>3.1. Identification of Evidence and Testimony Sentences
We identify Evidence and Testimony sentences using a two-step approach. In the first step, we use linguistic rules to obtain Evidence and Testimony sentences. In the second step, we use these sentences to train a sentence classifier.</p>
        <p>He has categorically stated that by reason of enmity, A1 and A2 together have murdered his brother-in-law.
• OF = [OV = stated, A0 = He]
  EF = [EV = murdered, A0 = A1 and A2 together, A1 = his brother-in-law, CAU = by reason of enmity]</p>
        <p>Shri Dholey (PW-6) reiterated about the dacoity and claimed that a pistol was brandished on him by one of the accused persons.
• OF = [OV = claimed, A0 = Shri Dholey (PW-6)]
  EF = [EV = brandished, A0 = by one of the accused persons, A1 = a pistol, A2 = on him]</p>
        <p>Though he stated in the post-mortem report that death would have occurred about 12 hours back, he clarified that there was possibility of injuries being received at about 9 A.M.
• OF = [OV = stated, A0 = he, EO = the post-mortem report]
  EF = [EV = occurred, A1 = death, TMP = about 12 hours back]
• EF = [EV = clarified, A0 = he, A1 = that there was possibility of injuries being received at about 9 A.M. Deceased Sarit Khanna was aged about 27 years]</p>
        <p>He admitted, however, that Shri Buch had met him in connection with the covenant, but he denied that he had received any letter Exhibit P-9 from Shri Buch or the lists Exhibits P-10 to P-12 regarding his private and State properties, were a part thereof.
• OF = [OV = admitted, A0 = He]
  EF = [EV = met, A0 = Shri Buch, A1 = him, PRP = in connection with the covenant]
• OF = [OV = denied, A0 = He]
  EF = [EV = received, A0 = he, A1 = any letter Exhibit P-9, A2 = from Shri Buch]</p>
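        <p>For concreteness, the first witness sentence above can be encoded as a pair of frames. The sketch below uses plain Python dictionaries, which is only an illustrative representation of an Evidence Structure Instance, with the argument names taken from Section 2.2.</p>
        <preformat>
# Illustrative encoding of an Evidence Structure Instance as plain dictionaries.
# The argument names (OV, A0, EV, A1, CAU, ...) come from the Evidence Information
# Model; the dictionary layout itself is an assumed, not prescribed, representation.
evidence_structure_instance = {
    "observation_frame": {"OV": "stated", "A0": "He"},
    "evidence_frame": {
        "EV": "murdered",
        "A0": "A1 and A2 together",
        "A1": "his brother-in-law",
        "CAU": "by reason of enmity",
    },
}

def arguments(instance):
    """Return all (frame, role, phrase) triples of an instance."""
    for frame_name, frame in instance.items():
        for role, phrase in frame.items():
            yield frame_name, role, phrase

for triple in arguments(evidence_structure_instance):
    print(triple)
        </preformat>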
        <p>Step I: Linguistic Rules based Approach: As there are no publicly available annotated datasets for identification of Evidence and Testimony sentences, we rely on linguistic rules to identify these sentences with high precision as our first step. The linguistic rules for identifying Evidence sentences are described in detail in Table 3. These rules identified 62,310 sentences as Evidences from our corpus. As there is no annotated dataset, in order to estimate the precision of the linguistic rules we use a random sampling strategy. We selected a set of 100 random sentences identified as Evidence by the linguistic rules, and got them verified by a human expert. The precision turned out to be 85%. Similarly, we use the linguistic rules proposed in Ghosh et al. [1] for identifying Testimony and non-Testimony sentences, where the reported precision is around 85%. These rules identified 36,473 sentences as Testimony and 14,234 sentences as non-Testimony from the same corpus.</p>
        <p>Step II: Weakly Supervised Sentence Classification: We observed that although the linguistic rules identify Evidence and Testimony sentences with high precision, they may fail to identify some sentences which should have been identified as Evidence or Testimony (see examples in Table 4). Hence, we train a supervised sentence classifier to improve the overall recall of identification of Evidence and Testimony sentences. The classifier used is a BiLSTM-based [4] multi-label sentence classifier whose architecture is depicted in Figure 1. This classifier is weakly supervised since its training data is automatically created using the sentences identified by the linguistic rules as follows:
• The classifier has two outputs: i) the first output predicts a binary label indicating whether the sentence contains Evidence or not, and ii) the second output predicts a binary label indicating whether the sentence contains Testimony or not.
• 1,824 sentences are labelled as both Evidence and Testimony. These sentences are identified as Evidence as well as Testimony by both the sets of linguistic rules.
• 60,486 sentences are labelled as Evidence and non-Testimony. These sentences are identified as Evidence by the rules but not as Testimony.
• 34,649 sentences are labelled as non-Evidence and Testimony. These sentences are identified as Testimony by the rules but not as Evidence.
• 14,234 sentences are labelled as non-Evidence and non-Testimony. These sentences are identified as non-Testimony by the rules and not identified as Evidence.
After this classifier is trained, we use it to classify all the remaining sentences in the corpus, i.e., those that are neither identified as Evidence by the Evidence rules nor as Testimony/non-Testimony by the Testimony rules. Using the prediction confidence, we selected the top 10,000 sentences classified as Evidence and the top 5,000 sentences classified as Testimony. Table 4 shows some examples of sentences identified as Evidence by the classifier but not by the linguistic rules. To estimate the precision, we again employed the random sampling strategy. We selected 100 random sentences each from these high confidence Evidence and Testimony sentences and a human expert verified them. A precision of 72% is observed for Evidence sentences and 68% for Testimony sentences. The precision of the sentence classifier is lower as compared to the rules because it is applied on a more difficult set of sentences for which the linguistic rules fail to identify any label.</p>
        <p>Any sentence S should satisfy the following conditions in order to be identified as an Evidence Sentence:</p>
        <p>E-R1: S should contain at least one Evidence Object as defined in Section 2.2. The list of words corresponding to evidence objects is created automatically by using the WordNet hypernym structure. We create a list of all words for which the following WordNet synsets are ancestors in the hypernym tree – artifact (e.g., gun, clothes), document (e.g., report, letter), substance (e.g., kerosene, blood). This list is looked up to identify evidence objects in a sentence.</p>
        <p>E-R2: S should contain at least one action verb from a pre-defined set of verbs like tamper, kill, sustain, forge, OR S should contain at least one observation verb from a pre-defined set of verbs like report, show, find. Both the pre-defined sets of verbs are prepared by observing multiple example sentences containing evidence objects.</p>
        <p>E-R3: In the dependency tree of S, the evidence object (identified by E-R1) should occur within the subtree rooted at the action or observation verb (identified by E-R2), AND there should not be any other verb (except auxiliary verbs like has, been, was, were, is) occurring between the two. This ensures that the evidence object always lies within the verb phrase headed by the action or observation verb.</p>
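        <p>Rule E-R1 can be realised directly over WordNet [6]. The sketch below assumes NLTK's WordNet interface (the paper does not name a toolkit) and simply tests whether any noun sense of a word has artifact, document or substance as an ancestor in the hypernym tree.</p>
        <preformat>
# Sketch of the E-R1 evidence-object lookup via WordNet hypernyms (NLTK assumed).
from nltk.corpus import wordnet as wn

# Ancestor synsets whose hyponyms are treated as candidate Evidence Objects.
EVIDENCE_ROOTS = {
    wn.synset("artifact.n.01"),
    wn.synset("document.n.01"),
    wn.synset("substance.n.01"),
}

def is_evidence_object(word):
    """True if any noun sense of `word` has an EVIDENCE_ROOTS synset as an ancestor."""
    for sense in wn.synsets(word, pos=wn.NOUN):
        ancestors = set(sense.closure(lambda s: s.hypernyms()))
        if ancestors.intersection(EVIDENCE_ROOTS):
            return True
    return False

print(is_evidence_object("cheque"))     # expected True (document)
print(is_evidence_object("kerosene"))   # expected True (substance)
print(is_evidence_object("enmity"))     # expected False
        </preformat>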
        <p>S1: Raju PW2 took Preeti into the bath room at the instance of Accused No.1 who cut a length of wire of washing
machine and used it to choke her to death, who however, survived.</p>
        <p>S2: Raju PW2 took Satyabhamabai Sutar in the kitchen where the accused No.1 had already reached and was washing
the blood stained knife.</p>
        <p>S3: Hemlata was also killed by inflicting knife injuries.</p>
        <p>S4: Accused No.2 and Raju PW2 took the child into the room where Meerabai was lying dead in the pool of blood.
S5: Accused No.2 gave her blows by putting his knees on her stomach and when she was immobilised this way , the
Accused No.1 gave her knife blows on her neck with the result she also died.</p>
        <p>S6: Almirahs found in the flat were emptied to the extent the accused could put articles and other cash and
valuables in the air-bag obtained from the said flat.</p>
        <p>S7: Blood stained clothes of Accused No.2 were put in the air-bag along with stolen articles.
</p>
        <p>At the end of this two-step process (linguistic rules followed by the sentence classifier), we have 112,401 sentences identified either as Evidence or as Testimony.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.2. Evidence Structure Instances</title>
        <p>In this phase, we discuss the technique of instantiating Evidence Structures for the sentences identified as Evidence
or Testimony in the previous phase. We used Semantic
Role Labelling [3] to identify and fill the arguments of
the Observation Frame and the Evidence Frame in the
Evidence Structure Instance for every candidate sentence.
This is demonstrated in Algorithm 1. We identify
Observation Frames using Observation Cue Verbs. For each of
these Observation Frames we identify the corresponding
Evidence Objects and Evidence Frames. For identifying
Evidence Objects, we first use Named Entity Recognition [ 5]
and WordNet based Entity Identification [ 6] to identify
the named entities in the sentence and annotate them in
the Frames extracted. The Evidence Objects in a phrase are
then obtained by selecting named entities annotated as
one of the following types - ARTIFACT, VEHICLE, WEAPON, DOCUMENT, WORK_OF_ART, SUBSTANCE. This corresponds to the evidence object identification function used in Algorithm 1.
Observation Frames that do not contain a corresponding
Evidence Frame are redesigned as stand alone Evidence
Frames. We finally combine the Evidence Frame and the
Observation Frame into an Evidence Structure Instance.</p>
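        <p>Algorithm 1 is not reproduced in this extract. The following rough sketch shows the instantiation idea under the assumption that SRL frames are available as (verb, {role: phrase}) pairs; the observation cue list, the role mapping and the helper names are illustrative placeholders rather than the authors' exact implementation.</p>
        <preformat>
# Rough sketch of instantiating an Evidence Structure from SRL frames.
# `srl_frames`, `OBSERVATION_CUES` and `contains_evidence_object` are assumed inputs;
# they stand in for the corresponding components of Algorithm 1.
OBSERVATION_CUES = {"stated", "claimed", "revealed", "found", "reported", "denied", "admitted"}

ROLE_MAP = {"ARG0": "A0", "ARG1": "A1", "ARGM-LOC": "LOC", "ARGM-TMP": "TMP",
            "ARGM-CAU": "CAU", "ARGM-MNR": "MNR", "ARGM-NEG": "NEG"}

def build_instances(srl_frames, contains_evidence_object):
    """Split SRL frames into Observation Frames (cue verbs) and Evidence Frames."""
    observation_frames, evidence_frames = [], []
    for verb, args in srl_frames:
        frame = {ROLE_MAP[r]: phrase for r, phrase in args.items() if r in ROLE_MAP}
        if verb.lower() in OBSERVATION_CUES:
            frame["OV"] = verb
            # Mark the Evidence Object, if any, among the frame's arguments.
            for phrase in args.values():
                if contains_evidence_object(phrase):
                    frame["EO"] = phrase
            observation_frames.append(frame)
        else:
            frame["EV"] = verb
            evidence_frames.append(frame)
    # Observation Frames without a corresponding Evidence Frame become
    # stand-alone Evidence Frames, as described above.
    if observation_frames and not evidence_frames:
        evidence_frames, observation_frames = observation_frames, []
    return {"observation_frames": observation_frames, "evidence_frames": evidence_frames}
        </preformat>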
        <p>We measured the accuracy of extraction on 260 Evidence Structure Instances. The accuracy of Observation Frame extraction is 86% and that of Evidence Frame extraction is 88%. We observed that most of the incorrect extractions were due to parsing errors in the SRL model.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Prior Case Retrieval</title>
      <sec id="sec-3-1">
        <p>In order to demonstrate the effectiveness of the proposed Evidence Structure, we apply it to the task of prior case retrieval. This task is to create a relevance-based ranked list of court judgements (documents) in our corpus for a query. In order to retrieve prior cases for a query, we represent the query using an Evidence Structure Instance (the query instance). We then compute the similarity of the query instance against each document instance obtained from every Evidence or Testimony sentence in the corpus. Algorithm 2 shows the steps for computing this similarity. We refer to this algorithm as SemMatch because of its semantic matching ability.</p>
        <p>We use cosine similarity between the phrase embeddings of corresponding arguments of the Evidence Structure Instances to compute similarity. For obtaining the phrase embedding of any phrase (as referred to in Algorithm 2), we consider the average of the GloVe word embeddings [7] of the words in that phrase, excluding stop words. We compute the similarity scores between corresponding arguments of both the frames. These scores across different arguments are combined to get a final similarity score between the query instance and the document instance. We multiply this final similarity score by a Sentence-BERT [8] based similarity score between the query and the sentence containing the document instance. This is necessary because errors in the automated SRL tool may lead to imperfect Evidence Structure Instances in some cases; a sentence similarity score which is not dependent on any such structure within the sentences provides a complementary view of capturing sentence similarity. Finally, the overall relevance score of the query with a document is the maximum score corresponding to any Evidence Structure Instance obtained from the document. Table 5 shows a running example of how a similarity score is computed between an Evidence Structure Instance from a query and an Evidence Structure Instance from a document in the corpus.</p>
      </sec>
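      <p>A compact sketch of this scoring scheme is given below. It assumes pre-loaded GloVe vectors, a stop-word list and a Sentence-BERT similarity function, and it combines the per-argument scores by simple averaging; these are illustrative choices and not the exact combination used in Algorithm 2.</p>
      <preformat>
# Illustrative sketch of the SemMatch scoring idea (not the exact Algorithm 2).
import numpy as np

def phrase_embedding(phrase, glove, stop_words):
    """Average GloVe vectors of the non-stop-word tokens of a phrase."""
    vectors = [glove[w] for w in phrase.lower().split()
               if w in glove and w not in stop_words]
    if not vectors:
        return None
    return np.mean(vectors, axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def instance_similarity(query_inst, doc_inst, glove, stop_words):
    """Average cosine similarity over arguments shared by the two instances."""
    scores = []
    for role, query_phrase in query_inst.items():
        if role in doc_inst:
            q = phrase_embedding(query_phrase, glove, stop_words)
            d = phrase_embedding(doc_inst[role], glove, stop_words)
            if q is not None and d is not None:
                scores.append(cosine(q, d))
    return float(np.mean(scores)) if scores else 0.0

def document_relevance(query_inst, query_text, doc_instances, sbert_similarity,
                       glove, stop_words):
    """Maximum (structure similarity x sentence similarity) over a document's instances."""
    best = 0.0
    for doc_inst, sentence in doc_instances:
        score = instance_similarity(query_inst, doc_inst, glove, stop_words)
        score = score * sbert_similarity(query_text, sentence)
        best = max(best, score)
    return best
      </preformat>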
    </sec>
    <sec id="sec-4">
      <title>5. Related Work</title>
      <sec id="sec-5-1">
        <p>While the task of evidence extraction from legal documents is related to several information retrieval and NLP tasks, there are no established baselines for the task. Bellot et al. [9] and Cartright et al. [10] have worked on Evidence Retrieval that identifies whole documents that contain an evidence. On the other hand, Rinott et al. [11] detect evidence information present in a sentence on a phrase level. As compared to this, we identify both Evidence and Testimony sentences, represent them in a rich structure and also use that for prior case retrieval. This is a challenging task due to the inherently complex nature of legal texts and the finer granularity of matching involved.</p>
        <p>Ji et al. [12] propose an Evidence Information Extraction system which captures the evidence production paragraph, evidence cross-examination paragraph, evidence provider, evidence name, evidence content, cross-examination party and cross-examination opinion relating to an evidence presented in the court. While this technique may suit well for Chinese court records that follow a relatively structured representation, it does not suit well the Indian court records that contain descriptive and varied formats of the court proceedings.</p>
        <p>Gomes and Ladeira [13] and Landthaler et al. [14] perform full-text search over legal document collections by obtaining word2vec word embeddings and then taking their average for computing similarity. However, computing the average of the embeddings gives a lossy representation where the relative order of the words is lost. In contrast, we represent the sentences using the Evidence Structure Instances, where the structure itself takes care of the relative ordering. Gomes and Ladeira [13] also demonstrate BM25 and TF-IDF for prior case retrieval. In our results section, we demonstrate the comparatively poor performance of BM25 and TF-IDF in handling corner cases.</p>
        <p>6. Experimental Evaluation
In this section, we discuss our experiments, including the dataset, baseline techniques, evaluation metrics and analysis of results.</p>
        <p>6.1. Dataset
We use the Indian Supreme Court judgements from the years 1952 to 2012, freely available at http://liiofindia.org/in/cases/cen/INSC/. There are 30,032 court judgements (documents) containing 4,111,091 sentences, where the average sentence length is 31 words with a standard deviation of 24.</p>
        <p>6.2. Baselines
For the task of prior case retrieval, we implement two baseline techniques:
• BM25: This is a popular TF-IDF based relevance computation technique. We use the BM25+ variant (https://pypi.org/project/rank-bm25/) as described in Trotman et al. [15]. This technique uses a bag-of-words approach that ignores the sentence structure. We use 4 settings considering different sentences in each document: only Testimony sentences, only Evidence sentences, all sentences, and only Testimony or Evidence sentences.
• Sentence-BERT [8]: This technique is based on Siamese-BERT networks to obtain more meaningful sentence embeddings as compared to vanilla BERT [16]. We used the pre-trained model bert-base-nli-stsb-mean-tokens to obtain sentence embeddings. Following Ghosh et al. [1], we use the pre-trained model as it is and do not fine-tune it further, because such fine-tuning needs annotated sentence pairs with labels indicating whether the sentences in the pair are semantically similar or not. Such an annotated dataset is expensive to create, and our aim is to avoid any dependence on manually annotated training data. Similar to Ghosh et al. [1], we use the sentence embeddings obtained by Sentence-BERT to compute cosine similarity between a query sentence and a candidate sentence in a document. The overall similarity of a document with a query is the maximum cosine similarity obtained for any of its sentences with the query sentence. We use 3 settings considering different sentences in each document: only Testimony or Evidence sentences, only Testimony sentences, and only Evidence sentences.</p>
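        <p>Both baselines are available off the shelf. The sketch below scores a toy query with the rank_bm25 and sentence-transformers packages; the document preparation and parameter choices are assumptions not specified in this paper.</p>
        <preformat>
# Sketch of the two baselines using off-the-shelf packages (assumed usage, not the
# authors' exact pipeline).
from rank_bm25 import BM25Plus
from sentence_transformers import SentenceTransformer, util

documents = [["the", "bank", "dishonoured", "the", "cheque"],
             ["the", "accused", "was", "seen", "with", "a", "knife"]]
query_tokens = ["cheque", "dishonoured"]

# BM25+ over tokenised documents (here whole documents; the settings above instead
# restrict the input to Evidence and/or Testimony sentences).
bm25 = BM25Plus(documents)
print(bm25.get_scores(query_tokens))

# Sentence-BERT: maximum cosine similarity between the query sentence and any
# candidate sentence of a document.
model = SentenceTransformer("bert-base-nli-stsb-mean-tokens")
query_emb = model.encode("The bank dishonoured the cheque.", convert_to_tensor=True)
sentences = ["The cheque was dishonoured due to insufficient balance.",
             "The accused was seen with a knife."]
sentence_embs = model.encode(sentences, convert_to_tensor=True)
doc_score = util.cos_sim(query_emb, sentence_embs).max().item()
print(doc_score)
        </preformat>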
      </sec>
      <sec id="sec-5-5">
        <title>6.3. Evaluation</title>
        <p>All the baseline techniques and our proposed technique are evaluated using a set of queries and certain evaluation metrics, to evaluate and compare the ranked lists produced by each of these techniques.</p>
      </sec>
      <sec id="sec-5-6">
        <title>Queries: We chose 10 queries (shown in Table 6) which</title>
        <p>represent cases and evidence objects of diverse nature
(domestic violence, financial fraud etc.).</p>
      </sec>
      <sec id="sec-5-7">
        <title>Ground Truth: We created a set of gold-standard rele</title>
        <p>vant documents for each query using the standard pooling
technique [17]. We ran the following techniques to
produce a ranked list of documents for each query –  25
 25</p>
        <p>,    , and our proposed technique  ℎ</p>
        <p>We chose the top 10 documents from the ranked list produced by each technique. Human experts verified the relevance of each document for the query. Finally, after discarding all the irrelevant documents, we got a set of gold-standard relevant documents for each query (this dataset can be obtained from the authors on request).</p>
        <p>Metrics: We used R-Precision and Average Precision as our evaluation metrics [17].
1. R-Precision (R-Prec): This calculates the number of relevant documents observed at rank R, where R is the number of gold-standard relevant documents for the query.
2. Average Precision (AP): This captures the joint effect of Precision and Recall. It computes precision at each rank of the predicted ranked list and then computes the mean of these precision values.</p>
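        <p>For completeness, the two metrics can be computed for a single query as follows (standard definitions); the ranked list and gold-standard set in the sketch are hypothetical.</p>
        <preformat>
# Sketch of the evaluation metrics over one query.
def r_precision(ranked_docs, relevant):
    """Precision at rank R, where R is the number of gold-standard relevant documents."""
    r = len(relevant)
    top_r = ranked_docs[:r]
    return sum(1 for d in top_r if d in relevant) / r if r else 0.0

def average_precision(ranked_docs, relevant):
    """Mean of the precision values at each rank where a relevant document occurs."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

ranked = ["d3", "d7", "d1", "d9", "d2"]        # hypothetical ranked list
gold = {"d7", "d2", "d4"}                      # hypothetical relevant set
print(r_precision(ranked, gold), average_precision(ranked, gold))
        </preformat>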
      </sec>
      <sec id="sec-5-9">
        <title>6.4. Results</title>
        <p>Table 6 shows the comparative evaluation results for the various baselines and our proposed technique. Other than the BM25 baseline that considers all the sentences, the remaining baselines (the other BM25 variants and the Sentence-BERT variants) and our proposed technique SemMatch consider only Evidence and Testimony sentences rather than considering all the sentences in a document. All the baselines which consider only Testimony sentences perform poorly as compared to the corresponding techniques using both Testimony and Evidence sentences. This highlights the importance of evidence information as compared to using only witness testimony information for prior case retrieval as done in Ghosh et al. [1].</p>
        <p>Considering the average performance across all the 10 queries, our proposed technique is the best performing technique in terms of both R-Prec and AP. The performance of SemMatch is also the most consistent across the diverse set of queries: it achieves a minimum R-Prec of 0.24, as compared to the other baselines which have a minimum R-Prec of 0 for some queries. As described in Algorithm 2, SemMatch uses Sentence-BERT based similarity within sentences for producing an enhanced matching score. We experimented with a variant of SemMatch which does not rely on Sentence-BERT based similarity. This variant resulted in an average R-Prec of 0.36 and MAP of 0.30 across all the 10 queries. Although this is lower than the full SemMatch performance, the R-Prec is still comparable with one of the BM25 baselines (avg R-Prec of 0.36) and better than that of one of the Sentence-BERT baselines (avg R-Prec of 0.28).</p>
        <p>For some queries, it is important to have some semantic understanding at the sentence level. For example, for query 4, which contains negation, Sentence-BERT and SemMatch can capture the query's meaning in a better way. SemMatch handles such negations in a more principled manner as the Evidence Structure Instance captures negation as one of its arguments.</p>
        <p>For SemMatch, the maximum matching score achieved for any Evidence Structure Instance in a document is considered as the overall matching score with the whole document. In contrast, BM25 based techniques directly compute a matching score for the whole document as they do not rely on sentence structure. This is one limitation of SemMatch which we plan to address as future work. However, as SemMatch computes matching scores for individual Evidence Structure Instances, it is able to provide a better interpretation for each relevant document in terms of the actual sentences which provided the maximum matching score.</p>
        <p>Analysis of errors: We analyzed cases where SemMatch assigned a lower score to a relevant document or a higher score to a non-relevant document. We discovered 3 main reasons: missing or incorrect arguments within Evidence Structure instances, misleading high similarity between argument phrases, and presence of co-references. Consider the following (abridged) sentence for which SemMatch incorrectly assigns a high score for query 5 (see Table 6) – The police report also reveals that some poisonous compounds were found... Here, except the pair of arguments some poisonous compounds and three pieces of pellets, the other arguments are similar in meaning. We get a cosine similarity of 0.36 between some poisonous compounds and three pieces of pellets, which is misleading: it is not too low as compared to another case where there are semantically similar argument phrases (e.g., the cosine similarity between some poisonous compounds and a heavy concentration of arsenic poison is just 0.55, as shown in Table 5). Also, as we are not resolving co-references, we miss a few relevant documents. E.g., the following document does not get a high score for query 3 (see Table 6) – Instead of surrendering before the police, the deceased... Here it is not explicitly known which mention corresponds to the police in the previous sentence.</p>
        <p>7. Conclusion and Future Work
In this paper, we discussed several NLP techniques for identifying evidence sentences, representing them in the semantically rich Evidence Structure, and retrieving relevant prior cases by exploiting it. The proposed techniques are weakly supervised as they do not rely on any manually annotated training data, except for the human expertise in designing the linguistic rules. Keeping in mind the importance of witness testimonies in addition to evidences, we also extracted and represented the witness testimonies using the same Evidence Structure. For the application of prior case retrieval, we evaluated our proposed technique along with several competent baselines, on a dataset of 10 diverse queries. We demonstrated that our technique performs comparably for most of the queries and is the best considering the overall performance across all 10 queries. The results highlight the contribution of evidence and testimony information in improving prior case retrieval performance.</p>
        <p>In future, we plan to apply advanced representation learning techniques for learning a dense or embedded representation of an entire Evidence Structure Instance. Also, we plan to automatically determine the best suited retrieval technique (BM25, Sentence-BERT or SemMatch) for any query based on its nature. We also plan to explore an ensemble of multiple retrieval techniques for improving prior case retrieval performance further.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1"><label>[1]</label><mixed-citation>K. Ghosh, S. Pawar, G. Palshikar, P. Bhattacharyya, V. Varma, Retrieval of prior court cases using witness testimonies, JURIX (2020).</mixed-citation></ref>
      <ref id="ref2"><label>[2]</label><mixed-citation>M. Palmer, D. Gildea, P. Kingsbury, The proposition bank: An annotated corpus of semantic roles, Computational Linguistics 31 (2005) 71–106.</mixed-citation></ref>
      <ref id="ref3"><label>[3]</label><mixed-citation>P. Shi, J. Lin, Simple BERT models for relation extraction and semantic role labeling, CoRR abs/1904.05255 (2019). URL: http://arxiv.org/abs/1904.05255. arXiv:1904.05255.</mixed-citation></ref>
      <ref id="ref4"><label>[4]</label><mixed-citation>S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.</mixed-citation></ref>
      <ref id="ref5"><label>[5]</label><mixed-citation>M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength Natural Language Processing in Python, 2020. URL: https://doi.org/10.5281/zenodo.1212303. doi:10.5281/zenodo.1212303.</mixed-citation></ref>
      <ref id="ref6"><label>[6]</label><mixed-citation>G. A. Miller, WordNet: a lexical database for English, Communications of the ACM 38 (1995) 39–41.</mixed-citation></ref>
      <ref id="ref7"><label>[7]</label><mixed-citation>J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.</mixed-citation></ref>
      <ref id="ref8"><label>[8]</label><mixed-citation>N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3973–3983.</mixed-citation></ref>
      <ref id="ref9"><label>[9]</label><mixed-citation>P. Bellot, A. Doucet, S. Geva, S. Gurajada, J. Kamps, G. Kazai, M. Koolen, A. Mishra, V. Moriceau, J. Mothe, M. Preminger, E. SanJuan, R. Schenkel, X. Tannier, M. Theobald, M. Trappett, Q. Wang, Overview of INEX 2013, in: P. Forner, H. Müller, R. Paredes, P. Rosso, B. Stein (Eds.), Information Access Evaluation. Multilinguality, Multimodality, and Visualization - 4th International Conference of the CLEF Initiative, CLEF 2013, Valencia, Spain, September 23-26, 2013. Proceedings, volume 8138 of Lecture Notes in Computer Science, Springer, 2013, pp. 269–281. URL: https://doi.org/10.1007/978-3-642-40802-1_27. doi:10.1007/978-3-642-40802-1_27.</mixed-citation></ref>
      <ref id="ref10"><label>[10]</label><mixed-citation>M.-A. Cartright, H. A. Feild, J. Allan, Evidence finding using a collection of books, in: Proceedings of the 4th ACM Workshop on Online Books, Complementary Social Media and Crowdsourcing, 2011, pp. 11–18.</mixed-citation></ref>
      <ref id="ref11"><label>[11]</label><mixed-citation>R. Rinott, L. Dankin, C. Alzate, M. M. Khapra, E. Aharoni, N. Slonim, Show me your evidence - an automatic method for context dependent evidence detection, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 440–450.</mixed-citation></ref>
      <ref id="ref12"><label>[12]</label><mixed-citation>D. Ji, P. Tao, H. Fei, Y. Ren, An end-to-end joint model for evidence information extraction from court record document, Information Processing &amp; Management 57 (2020) 102305.</mixed-citation></ref>
      <ref id="ref13"><label>[13]</label><mixed-citation>T. Gomes, M. Ladeira, A new conceptual framework for enhancing legal information retrieval at the Brazilian Superior Court of Justice, in: Proceedings of the 12th International Conference on Management of Digital EcoSystems, 2020, pp. 26–29.</mixed-citation></ref>
      <ref id="ref14"><label>[14]</label><mixed-citation>J. Landthaler, B. Waltl, P. Holl, F. Matthes, Extending full text search for legal document collections using word embeddings, in: JURIX, 2016, pp. 73–82.</mixed-citation></ref>
      <ref id="ref15"><label>[15]</label><mixed-citation>A. Trotman, A. Puurula, B. Burgess, Improvements to BM25 and language models examined, in: Proceedings of the 2014 Australasian Document Computing Symposium, 2014, pp. 58–65.</mixed-citation></ref>
      <ref id="ref16"><label>[16]</label><mixed-citation>J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).</mixed-citation></ref>
      <ref id="ref17"><label>[17]</label><mixed-citation>C. Manning, P. Raghavan, H. Schutze, Introduction to information retrieval, Natural Language Engineering 16 (2010) 100–103.</mixed-citation></ref>
    </ref-list>
  </back>
</article>