=Paper=
{{Paper
|id=Vol-2888/paper1
|storemode=property
|title=Prior Case Retrieval using Evidence Extraction from Court Judgements
|pdfUrl=https://ceur-ws.org/Vol-2888/paper1.pdf
|volume=Vol-2888
|authors=Basit Ali,Ravina More,Sachin Pawar,Girish K. Palshikar
|dblpUrl=https://dblp.org/rec/conf/icail/AliMPP21
}}
==Prior Case Retrieval using Evidence Extraction from Court Judgements==
Basit Ali, Ravina More, Sachin Pawar and Girish K. Palshikar
TCS Research, Pune, India.
Abstract
One of the key constituents of court case descriptions is the description of evidences and the observations made about them. Along with witness testimonies,
evidence plays a significant role in the final decision of the case. We propose a weakly supervised technique to automatically
identify sentences containing evidences. We represent the information related to evidences in these sentences in a semantically
rich structure – Evidence Structure defined as an Evidence Information Model. We show that witness testimony information
can also be represented using the same model. We demonstrate the effectiveness of our Evidence Information Model for the
prior case retrieval application by proposing a matching algorithm for computing semantic similarity between a query and a
sentence in a court case description. To the best of our knowledge, this is the first paper to apply NLP techniques for the
extraction of evidence information from court judgements and use it for retrieving relevant prior court cases.
Keywords
Evidence Extraction, Evidence Information Model, Natural Language Processing, Prior Case Retrieval
ASAIL 2021: The Fifth Workshop on Automated Semantic Analysis of Information in Legal Text, June 25, 2021, São Paulo, Brazil.
ali.basit@tcs.com (B. Ali); ravina.m@tcs.com (R. More); sachin7.p@tcs.com (S. Pawar); gk.palshikar@tcs.com (G. K. Palshikar)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Evidences - typically based on documents (e.g., letter, receipt, report, agreements, affidavits) and physical objects (e.g., knife, guns, photos, phone call data records) - are often used by lawyers in their arguments during a court case. The observations made through these evidences may have a significant effect on the judges' final decision. In order to develop a deeper understanding of past court cases, it is valuable to identify the various Evidences discussed in these cases and the observations which are made about them or through them. Such information about evidences has several applications such as understanding and representing legal arguments, determining strengths and weaknesses of those arguments, identifying relevant past cases in which similar evidences were discussed, etc.

In this paper, we discuss Natural Language Processing (NLP) based techniques for extracting information regarding Evidences mentioned in court judgement documents. We propose to represent this information in a rich semantic structure – an Evidence Structure defined as per an Evidence Information Model. Along with Evidences, we also identify and represent Witness Testimonies using the same Information Model. Initially, we discuss a two-step approach for identifying evidence and testimony sentences. In the first step, linguistic rules are used to determine whether a sentence contains any evidence or testimony information. Here, we use the rules proposed in Ghosh et al. [1] for identification of witness testimonies and design new rules for identification of evidence sentences. In the second step, we train a Weakly Supervised Sentence Classifier whose training data is automatically created using the sentences identified by the linguistic rules. It is a multi-label classifier which predicts whether any sentence contains an Evidence or a Witness Testimony or both. Once all the Evidence and Testimony sentences are identified from the corpus of court judgements, we propose a Semantic Role Labelling (SRL) [2] based technique to automatically instantiate Evidence Structures for these sentences.

To demonstrate the effectiveness of the proposed Evidence Structure, we discuss its use in the prior case retrieval application. We propose a matching algorithm for computing semantic similarity between a query and a sentence in a court judgement document. This algorithm makes use of the proposed Evidence Structure in which both the query and the sentence are represented, resulting in a semantically sound similarity score between them.

Previously, Ghosh et al. [1] identified witness testimonies from court case documents and used them for retrieving relevant prior cases. We argue that considering only witness testimonies leads to loss of key information regarding evidences mentioned in a case. Hence, we identify and use information about the various evidences mentioned in the case documents, leading to much better prior case retrieval performance as demonstrated in the experiments section. Moreover, Ghosh et al. [1] use a much more limited semantic structure to represent information regarding events mentioned in witness testimonies. This structure does not capture important semantic information such as whether an event is negated, what the causes behind the event are, the manner in which the event takes place, etc. We propose a richer semantic structure addressing these limitations and design a suitable semantic matching algorithm for that structure. To the best of our knowledge, this is the first paper to apply NLP techniques for the extraction of evidence information from court judgements and demonstrate its use for retrieving relevant prior court cases.

2. Evidence Information Model

The purpose of the Evidence Information Model is to define a suitable structure to represent evidence information in court judgements. In this section, we describe Semantic Role Labelling in brief and how it is used to define our proposed Evidence Structure.

2.1. Background: Semantic Role Labelling
Semantic Role Labelling (SRL) is a technique in Natural Language Processing that identifies verbs/predicates in a sentence, finds phrases connected to every predicate and assigns an appropriate semantic role to every phrase. By doing so, SRL helps machines to understand the roles of important words within a sentence. Following are some key semantic roles identified for a verb/predicate (often corresponding to an action or event) by SRL techniques:
ARG0: proto-agent, or someone who performs the action denoted by the verb
ARG1: proto-patient, or someone on whom the action is performed
ARGM-TMP: the time when the event took place
ARGM-CAU: the cause of the action
ARGM-PRP: the purpose of the action
ARGM-LOC: the location where the event took place
ARGM-MNR: the manner in which the action took place
ARGM-NEG: the word indicating that the action did not take place
Consider the following example sentence:
On August 25, 1965, the bank dishonoured the cheque due to insufficient balance.
The various semantic roles for the verb dishonoured are annotated as follows:
[ARGM-TMP: On August 25, 1965], [ARG0: the bank] [V: dishonoured] [ARG1: the cheque] [ARGM-CAU: due to insufficient balance].
We use the predicates and corresponding arguments obtained from the pre-trained AllenSRL model [3] to instantiate our Evidence Structure for the queries and candidate sentences.
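As a concrete illustration, the following is a minimal sketch of obtaining such frames from a pre-trained AllenNLP BERT-based SRL predictor [3]; the model archive URL shown is the one distributed by AllenNLP and may change across releases, so treat it as an assumption rather than part of our method:

from allennlp.predictors.predictor import Predictor

# Pre-trained BERT-based SRL model distributed by AllenNLP (URL is an assumption).
predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/"
    "structured-prediction-srl-bert.2020.12.15.tar.gz")

output = predictor.predict(
    sentence="On August 25, 1965, the bank dishonoured the cheque "
             "due to insufficient balance.")

# Each frame carries the predicate, a human-readable bracketing, and one BIO
# tag per token (e.g. B-ARGM-TMP, B-ARG0, B-V, B-ARG1, B-ARGM-CAU).
for frame in output["verbs"]:
    print(frame["verb"], "->", frame["description"])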
2.2. Evidence Structure

The Evidence Information Model represents every Evidence Sentence giving information about one or more Evidence Objects in an Evidence Structure. We define an Evidence Object as one of the objects presented by the counsels to the judge along with the information and findings about the crime. It is thus a physical entity that can furnish some degree of support, contradiction or opposition to some legal arguments. Some examples of Evidence Objects are:
• Documents (autopsy report, post-mortem report, affidavit, letter, cheque, agreement, petition, FIR, signature)
• Material objects (gun, bullet, clothes, kerosene can)
• Substances (poison, alcohol, kerosene)
In Indian court case documents, such Evidence Objects are also represented in the judgement document as Exhibit A, Ex. 2, Evidence 23 and so on.
On these lines, we define an Evidence Sentence as any sentence containing one or more Evidence Objects relevant to the current case but not consisting of
• any witness testimony which is not verifiable
• legal argumentation
• a reference to some prior case or some Act or Section
• directions or instructions given by the court or judge.
We now present a formal definition of the Evidence Structure. For every evidence present in an Evidence Sentence, the structure consists of an optional Observation Frame and a mandatory Evidence Frame. The Observation Frame represents the source of the information and the agent disclosing it. This information is optional as it may or may not be explicitly stated in a sentence. It consists of the following arguments:
• ObserverVerb or OV: the verb indicating the observation/discovery/disclosure (e.g., found, revealed, stated)
• ObserverAgent or A0: the source disclosing the information (e.g., person, agency, authority)
• EvidenceObject or EO: the Evidence Object in focus (e.g., post-mortem report, FIR, letter)
The Evidence Frame captures details about the evidence itself through the following arguments (a minimal code encoding follows the list):
• EvidenceVerb or EV: the main verb of any action, event or fact mentioned in a sentence or revealed by the Evidence Object (e.g., killed, forged, escaped)
• Agent or A0: someone who initiates the action indicated by the EvidenceVerb (e.g., the accused, Ram, ABC Pvt. Ltd.)
• Patient or A1: someone who undergoes the action indicated by the EvidenceVerb (e.g., the deceased, a cheque of Rs. 3,200, his wife)
• Location or LOC: location where the action took place (e.g., in the bedroom, at the bank, in Malaysia)
• Time or TMP: timestamp of the action (e.g., about 12 hours back, in the morning, on Monday)
• Cause or CAU: cause of the action (e.g., due to dowry, as a result of the CBI enquiry, out of sheer spite)
• Manner or MNR: manner in which the action took place (e.g., as per the challan, fraudulently, wilfully)
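For concreteness, here is a minimal sketch of the Evidence Structure encoded as Python dataclasses. The field names mirror the frame arguments defined above; the encoding itself is an illustrative assumption, not the authors' exact implementation:

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ObservationFrame:
    OV: str                      # ObserverVerb, e.g. "revealed"
    A0: Optional[str] = None     # ObserverAgent, e.g. "The Magistrate"
    EO: Optional[str] = None     # EvidenceObject, e.g. "the post-mortem report"
    NEG: bool = False            # whether the observation is negated

@dataclass
class EvidenceFrame:
    EV: str                                              # EvidenceVerb, e.g. "forged"
    args: Dict[str, str] = field(default_factory=dict)   # A0, A1, LOC, TMP, CAU, MNR, ...
    NEG: bool = False                                    # whether the action is negated

@dataclass
class EvidenceStructure:
    OF: Optional[ObservationFrame]   # optional Observation Frame
    EF: EvidenceFrame                # mandatory Evidence Frame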
Table 1
Example Evidence sentences with their Evidence Structure Instances

The bank dishonoured the cheque due to insufficient balance.
• EF = [EV = dishonoured, A0 = The bank, A1 = the cheque, CAU = due to insufficient balance]

The report revealed that organo-phosphorus compound was found in the stomach, small intestines, large intestines, liver, spleen, kidney and brain of the deceased.
• OF = [OV = revealed, EO = The report]
  EF = [EV = found, LOC = in the stomach, small intestines, large intestines, liver, spleen, kidney and brain of the deceased]

The Magistrate found prima facie evidence that the appellant had fraudulently used in the Civil Suit forged cheque and committed him to the Sessions for trial.
• OF = [OV = found, OA = The Magistrate, EO = prima facie evidence]
  EF = [EV = used, A0 = the appellant, A1 = forged cheque, LOC = in the Civil Suit]

The prosecution case was that though the rough cash book showed that on September 29, 1950 a sum of Rs. 21,133 was sent to the Treasury by appellant Gupta, the Treasury figures in the challan showed that on that day only a sum of Rs. 1,133 was deposited into the Treasury and thus a sum of Rs. 20,000 was dishonestly misappropriated.
• OF = [OV = showed, EO = the rough cash book]
  EF = [EV = sent, A0 = by appellant Gupta, A1 = a sum of Rs. 21,133, A2 = to the Treasury, TMP = on September 29, 1950]
• OF = [OV = showed, EO = the Treasury figures in the challan]
  EF = [EV = deposited, A0 = by appellant Gupta, A1 = only a sum of Rs. 1,133, A2 = into the Treasury, TMP = on that day]
• OF = [OV = showed, EO = the Treasury figures in the challan]
  EF = [EV = misappropriated, A1 = a sum of Rs. 20,000, MNR = dishonestly]
Table 1 shows examples of some Evidence Sentences along with the corresponding Evidence Structure Instances. In some cases, the Observation Frame may be empty due to the absence of an ObservationVerb. In such cases, the EvidenceObject may be present as a part of any argument in the Evidence Frame, e.g., the cheque is present as A1 in the Evidence Frame of the first sentence in Table 1.

Information about named entities and their types present in various arguments of an Observation or Evidence Frame is important. Hence, the Observation Frame and Evidence Frame are also enriched by annotating entities such as PERSON, ORGANISATION, GEO-POLITICAL ENTITY, LOCATION, PRODUCT, EVENT, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL, CARDINAL, WEAPON, SUBSTANCE, DOCUMENT, ARTIFACT, WORK_OF_ART, WITNESS, BODY_PART, and VEHICLE present in the fields.

Witness Information Model: Information in witness testimonies can also be represented using the same Evidence Structure. The statement verbs used in witness testimony sentences (e.g., stated, said) are treated similar to observation verbs and represented using Observation Frames. Similarly, other action/event verbs mentioned in witness testimony sentences are represented using Evidence Frames. Table 2 shows examples of some Witness Sentences along with the corresponding Evidence Structure Instances. The advantage of representing information about evidences and witness testimonies in the same structure is that we can make use of both these sources of information seamlessly for prior case retrieval.

3. Methodology

In this section, we describe our overall methodology which consists of two phases. In the first phase, we identify Evidence and Testimony Sentences using linguistic rules and a weakly supervised sentence classifier. In the second phase, we instantiate the Evidence Structures for these identified sentences. For all our experiments, we use a corpus of 30,032 Indian Supreme Court judgements ranging from the year 1952 to 2012.

3.1. Identification of Evidence and Testimony Sentences

We identify Evidence and Testimony sentences using a two-step approach. In the first step, we use linguistic rules to obtain Evidence and Testimony sentences. In the second step, we use these sentences to train a sentence classifier.
Table 2
Example Witness Testimony sentences with their Evidence Structure Instances

He has categorically stated that by reason of enmity, A1 and A2 together have murdered his brother-in-law.
• OF = [OV = stated, A0 = He]
  EF = [EV = murdered, A0 = A1 and A2 together, A1 = his brother-in-law, CAU = by reason of enmity]

Shri Dholey (PW-6) reiterated about the dacoity and claimed that a pistol was brandished on him by one of the accused persons.
• OF = [OV = claimed, A0 = Shri Dholey (PW-6)]
  EF = [EV = brandished, A0 = by one of the accused persons, A1 = a pistol, CAU = on him]

Though he stated in the post-mortem report that death would have occurred about 12 hours back, he clarified that there was possibility of injuries being received at about 9 A.M.
• OF = [OV = stated, A0 = he, EO = the post-mortem report]
  EF = [EV = occurred, A1 = death, TMP = about 12 hours back]
• EF = [OV = clarified, A0 = he, A1 = that there was possibility of injuries being received at about 9 A.M. Deceased Sarit Khanna was aged about 27 years]

He admitted, however, that Shri Buch had met him in connection with the covenant, but he denied that he had received any letter Exhibit P-9 from Shri Buch or the lists Exhibits P-10 to P-12 regarding his private and State properties, were a part thereof.
• OF = [OV = admitted, A0 = He]
  EF = [EV = met, A0 = Shri Buch, A1 = him, TMP = in connection with the covenant]
• OF = [OV = denied, A0 = He]
  EF = [EV = received, A0 = he, A1 = any letter Exhibit P-9, A2 = from Shri Buch]
Step I: Linguistic Rules based Approach: As there are no publicly annotated datasets for identification of Evidence and Testimony sentences, we rely on linguistic rules to identify these sentences with high precision as our first step. The linguistic rules for identifying Evidence sentences are described in detail in Table 3 (a minimal implementation sketch follows the table). These rules identified 62,310 sentences as Evidences from our corpus. As there is no annotated dataset, in order to estimate the precision of the linguistic rules we use a random sampling strategy. We selected a set of 100 random sentences identified as Evidence by the linguistic rules, and got them verified by a human expert. The precision turned out to be 85%. Similarly, we use the linguistic rules proposed in Ghosh et al. [1] for identifying Testimony and non-Testimony sentences, where the reported precision is around 85%. These rules identified 36,473 sentences as Testimony and 14,234 sentences as non-Testimony from the same corpus.

Step II: Weakly Supervised Sentence Classification: We observed that although the linguistic rules identify Evidence and Testimony sentences with high precision, they may miss some sentences which should have been identified as Evidence or Testimony (see examples in Table 4). Hence, we train a supervised sentence classifier to improve the overall recall of identification of Evidence and Testimony sentences. The classifier used is a BiLSTM-based [4] multi-label sentence classifier whose architecture is depicted in Figure 1 (a code sketch is given below). This classifier is weakly supervised since its training data is automatically created using the sentences identified by the linguistic rules as follows:
• The classifier has two outputs - i) the first output predicts a binary label indicating whether the sentence contains Evidence or not, and ii) the second output predicts a binary label indicating whether the sentence contains Testimony or not.
• 1,824 sentences are labelled as both Evidence and Testimony. These sentences are identified as Evidence as well as Testimony by both the sets of linguistic rules.
• 60,486 sentences are labelled as Evidence and non-Testimony. These sentences are identified as Evidence by the rules but not as Testimony.
• 34,649 sentences are labelled as non-Evidence and Testimony. These sentences are identified as Testimony by the rules but not as Evidence.
• 14,234 sentences are labelled as non-Evidence and non-Testimony. These sentences are identified as non-Testimony by the rules and not identified as Evidence.
After this classifier is trained, we use it to classify all the remaining sentences in the corpus. These sentences are neither identified as Evidence by the Evidence rules nor as Testimony/non-Testimony by the Testimony rules. Using the prediction confidence, we selected the top 10,000 sentences classified as Evidence and the top 5,000 sentences classified as Testimony. Table 4 shows some examples of sentences identified as Evidence by the classifier but not by the linguistic rules. To estimate the precision, we again employed the random sampling strategy. We selected 100 random sentences each from these high confidence Evidence and Testimony sentences, and a human expert verified them. A precision of 72% is observed for Evidence sentences and 68% for Testimony sentences. The precision of the sentence classifier is lower as compared to the rules because it is applied on a more difficult set of sentences for which the linguistic rules fail to identify any label. At the end of this two-step process (linguistic rules followed by the sentence classifier), we have 112,401 sentences identified either as Evidence or as Testimony.

Figure 1: Architecture of the BiLSTM-based multi-label sentence classifier (T: Testimony, NT: Non-Testimony, E: Evidence, NE: Non-Evidence)
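The following is a minimal PyTorch sketch of a BiLSTM multi-label classifier of the kind shown in Figure 1: one shared encoder with two sigmoid outputs (Evidence vs. not, Testimony vs. not). The embedding size, hidden size, pooling and vocabulary handling are illustrative assumptions, not the authors' exact configuration:

import torch
import torch.nn as nn

class BiLSTMMultiLabel(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)   # two logits: [Evidence, Testimony]

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))   # (batch, seq, 2*hidden)
        pooled, _ = h.max(dim=1)                # max-pool over tokens
        return self.out(pooled)

# Weak supervision: labels come from the linguistic rules, e.g. a sentence
# matched by the Evidence rules but not the Testimony rules gets [1, 0].
model = BiLSTMMultiLabel(vocab_size=50_000)
loss_fn = nn.BCEWithLogitsLoss()
logits = model(torch.randint(1, 50_000, (4, 40)))   # 4 sentences, 40 tokens each
loss = loss_fn(logits, torch.tensor([[1., 0.], [1., 1.], [0., 1.], [0., 0.]]))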
Table 3
Linguistic Rules for identifying Evidence Sentences

Any sentence S should satisfy the following conditions in order to be identified as an Evidence Sentence:
E-R1: S should contain at least one Evidence Object as defined in Section 2.2. The list of words corresponding to evidence objects is created automatically by using the WordNet hypernym structure. We create a list of all words for which the following WordNet synsets are ancestors in the hypernym tree – artifact (e.g., gun, clothes), document (e.g., report, letter), substance (e.g., kerosene, blood). This list is looked up to identify evidence objects in a sentence.
E-R2: S should contain at least one action verb from a pre-defined set of verbs like tamper, kill, sustain, forge OR S should contain at least one observation verb from a pre-defined set of verbs like report, show, find. Both the pre-defined sets of verbs are prepared by observing multiple example sentences containing evidence objects.
E-R3: In the dependency tree of S, the evidence object (identified by E-R1) should occur within the subtree rooted at the action or observation verb (identified by E-R2) AND there should not be any other verb (except auxiliary verbs like has, been, was, were, is) occurring between the two. This ensures that the evidence object always lies within the verb phrase headed by the action or observation verb.
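A minimal sketch of rules E-R1 to E-R3, assuming spaCy for dependency parsing and NLTK's WordNet for the evidence-object lexicon; the synset names, the (deliberately partial) verb lists, and the single-token matching of lexicon entries are illustrative simplifications of the authors' setup:

import spacy
from nltk.corpus import wordnet as wn

nlp = spacy.load("en_core_web_sm")

def hyponym_lemmas(root_synset_name):
    # Collect all lemma names in the hyponym closure of a WordNet synset.
    root = wn.synset(root_synset_name)
    lemmas = set()
    for syn in root.closure(lambda s: s.hyponyms()):
        lemmas.update(l.name().replace("_", " ") for l in syn.lemmas())
    return lemmas

# E-R1: evidence-object lexicon from WordNet hypernym trees (synset names assumed).
EVIDENCE_OBJECTS = (hyponym_lemmas("artifact.n.01")
                    | hyponym_lemmas("document.n.01")
                    | hyponym_lemmas("substance.n.01"))
ACTION_VERBS = {"tamper", "kill", "sustain", "forge"}    # E-R2 (partial list)
OBSERVATION_VERBS = {"report", "show", "find"}           # E-R2 (partial list)
AUX = {"has", "been", "was", "were", "is"}

def is_evidence_sentence(sentence):
    doc = nlp(sentence)
    for tok in doc:
        if tok.pos_ == "VERB" and tok.lemma_ in ACTION_VERBS | OBSERVATION_VERBS:
            # E-R3: an evidence object inside the verb's dependency subtree,
            # with no intervening non-auxiliary verb between the two.
            for sub in tok.subtree:
                if sub.lemma_ in EVIDENCE_OBJECTS:       # E-R1
                    between = doc[min(tok.i, sub.i) + 1 : max(tok.i, sub.i)]
                    if not any(t.pos_ == "VERB" and t.text.lower() not in AUX
                               for t in between):
                        return True
    return False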
Table 4
Example of Evidence Sentences Identified by the Classifier but not by the linguistic rules
S1: Raju PW2 took Preeti into the bath room at the instance of Accused No.1 who cut a length of wire of washing
machine and used it to choke her to death, who however, survived.
S2: Raju PW2 took Satyabhamabai Sutar in the kitchen where the accused No.1 had already reached and was washing
the blood stained knife.
S3: Hemlata was also killed by inflicting knife injuries.
S4: Accused No.2 and Raju PW2 took the child into the room where Meerabai was lying dead in the pool of blood.
S5: Accused No.2 gave her blows by putting his knees on her stomach and when she was immobilised this way , the
Accused No.1 gave her knife blows on her neck with the result she also died.
S6: Almirahs found in the flat were emptied to the extent the accused could put articles and other cash and
valuables in the air-bag obtained from the said flat.
S7: Blood stained clothes of Accused No.2 were put in the air-bag along with stolen articles.
3.2. Evidence Structure Instances

In this phase, we discuss the technique of instantiating Evidence Structures for sentences identified as Evidence or Testimony in the previous phase. We used Semantic Role Labelling [3] to identify and fill the arguments of the Observation Frame and the Evidence Frame in the Evidence Structure Instance for every candidate sentence. This is demonstrated in Algorithm 1. We identify Observation Frames using Observation Cue Verbs. For each of these Observation Frames we identify the corresponding Evidence Objects and Evidence Frames. For identifying Evidence Objects, we first use Named Entity Recognition [5] and WordNet based Entity Identification [6] to identify the named entities in the sentence and annotate them in the extracted Frames. The Evidence Objects in a phrase are then obtained by selecting named entities annotated as one of the following types - ARTIFACT, VEHICLE, WEAPON, DOCUMENT, WORK_OF_ART, SUBSTANCE. This corresponds to the get_evidence_object function used in Algorithm 1 (a minimal sketch follows this section). Observation Frames that do not contain a corresponding Evidence Frame are redesigned as stand-alone Evidence Frames. We finally combine the Evidence Frame and the Observation Frame into an Evidence Structure Instance.

We measured the accuracy of 260 Evidence Structure Instances obtained from 100 random Evidence and Testimony sentences. The accuracy of the Observation Frame extraction is 86% and that of the Evidence Frame extraction is 88%. We observed that most of the incorrect extractions were due to parsing errors in the SRL model.
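The following is a minimal sketch of the get_evidence_object helper of Algorithm 1, assuming that entities arrive pre-annotated as (text, type) pairs from the NER and WordNet-based identification steps [5, 6]:

# Evidence-object entity types, as listed in Section 3.2.
EVIDENCE_TYPES = {"ARTIFACT", "VEHICLE", "WEAPON", "DOCUMENT", "WORK_OF_ART", "SUBSTANCE"}

def get_evidence_object(entities):
    # entities: list of (surface_text, entity_type) found in one SRL argument.
    return {text for text, etype in entities if etype in EVIDENCE_TYPES}

# e.g. get_evidence_object([("the post-mortem report", "DOCUMENT"),
#                           ("the deceased", "PERSON")])
# -> {"the post-mortem report"}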
4. Prior Case Retrieval

In order to demonstrate the effectiveness of the proposed Evidence Structure, we apply it for the task of prior case retrieval. This task is to create a relevance-based ranked list of court judgements (documents) in our corpus for a query. In order to retrieve prior cases for a query, we represent the query using an Evidence Structure Instance (EvStruct_Q). We then compute the similarity of the query instance EvStruct_Q against each document instance EvStruct_D obtained from every Evidence or Testimony sentence in the corpus. Algorithm 2 shows the steps for computing similarity. We refer to this algorithm as SemMatch because of its semantic matching ability. We use cosine similarity between the phrase embeddings of corresponding arguments of the Evidence Structure Instances to compute similarity. For obtaining the phrase embedding of any phrase (referred to as PhraseVec in Algorithm 2), we take the average of the GloVe word embeddings [7] of the words in that phrase, excluding stop words (a sketch of these helpers is given below). We compute the similarity scores within corresponding arguments of both the frames. These scores across different arguments are combined to get a final similarity score between EvStruct_Q and EvStruct_D. We multiply the final similarity score by a Sentence-BERT [8] based similarity score between the query and the sentence containing EvStruct_D. This is necessary because errors in the automated SRL tool may lead to imperfect Evidence Structure Instances in some cases. A sentence similarity score which is not dependent on any such structure within the sentences provides a complementary view of capturing sentence similarity. Finally, the overall relevance score of the query with a document is the maximum score corresponding to any Evidence Structure Instance EvStruct_D obtained from the document. Table 5 shows a running example of how a similarity score is computed between an Evidence Structure Instance (EvStruct_Q) from a query and an Evidence Structure Instance (EvStruct_D) from a document in the corpus.
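The following is a minimal sketch of the PhraseVec and CosineSim helpers used by SemMatch: a phrase embedding is the stop-word-filtered average of GloVe vectors [7]. The GloVe file name, dimensionality and stop-word list here are illustrative assumptions:

import numpy as np

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "by", "was", "were"}

def load_glove(path="glove.6B.100d.txt"):
    # Assumes a locally available GloVe text file: one word and its vector per line.
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *vals = line.rstrip().split(" ")
            vecs[word] = np.asarray(vals, dtype=np.float32)
    return vecs

GLOVE = load_glove()

def phrase_vec(phrase):
    # Average of GloVe vectors of the non-stop-word tokens in the phrase.
    words = [w for w in phrase.lower().split() if w not in STOP_WORDS and w in GLOVE]
    if not words:
        return np.zeros(100, dtype=np.float32)
    return np.mean([GLOVE[w] for w in words], axis=0)

def cosine_sim(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# e.g. cosine_sim(phrase_vec("some poisonous compounds"),
#                 phrase_vec("a heavy concentration of arsenic"))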
5. Related Work

While the task of evidence extraction from legal documents is related to several information retrieval and NLP tasks, there are no established baselines for the task. Bellot et al. [9] and Cartright et al. [10] have worked on Evidence Retrieval, which identifies whole documents that contain an evidence. On the other hand, Rinott et al. [11] use Context Dependent Evidence Detection to find evidence information present in a sentence at the phrase level. As compared to this, we identify both Evidence and Testimony sentences, represent them in a rich structure and also use that for prior case retrieval. This is a challenging task due to the inherently complex nature of legal texts and the finer granularity of matching involved.

Ji et al. [12] propose an Evidence Information Extraction system which captures the evidence production paragraph, evidence cross-examination paragraph, evidence provider, evidence name, evidence content, cross-examination party and cross-examination opinion relating to an evidence presented in the court. While this technique may suit Chinese court records well, as they follow a relatively structured representation, it does not suit the Indian court records that contain descriptive and varied formats of the court proceedings. Gomes and Ladeira [13] and Landthaler et al. [14] perform full text search for legal document collections by obtaining word2vec word embeddings and then taking their average for computing similarity. However, computing the average of the embeddings gives a lossy representation where the relative order of the words is lost. In contrast, we represent the sentences using the Evidence Structure Instances, where the structure itself takes care of the relative ordering. Gomes and Ladeira [13] demonstrate BM25 and TF-IDF for prior case retrieval. In our results section, we demonstrate the comparatively poor performance of BM25 and TF-IDF in handling corner cases.

6. Experimental Evaluation

In this section, we discuss our experiments including the dataset, baseline techniques, evaluation metrics and analysis of results.

6.1. Dataset

We use the Indian Supreme Court judgements from the years 1952 to 2012, freely available at http://liiofindia.org/in/cases/cen/INSC/. There are 30,032 court judgements (documents) containing 4,111,091 sentences, where the average sentence length is 31 words with a standard deviation of 24.
input: s (sentence), SRL_P (set of semantic frames in s as per any semantic role labeller; each frame P consists of a predicate P.V and corresponding arguments P.ARG0, P.ARG1, P.ARG2, P.ARGM-LOC, etc.)
output: EvStructs = Evidence Structure Instances of the input sentence, each consisting of an ObservationFrame (OF) and an EvidenceFrame (EF)
parameter: OBS_VERBS = {accept, add, admit, agree, allege, allow, alter, apprise, assert, brief, build, challenge, claim, clarify, complain, confirm, corroborate, decline, demand, deny, depose, describe, disclose, dismiss, examine, exhibit, find, include, indicate, inform, mention, note, notice, observe, obtain, occur, point, prepare, present, receive, recover, refuse, reject, remember, report, reveal, say, show, state, submit, suggest, tell, withdraw}, NEG_WORDS = {no, not, neither, nor, never}

EvStructs := ∅
OFs := ∅
// Obtain Observation Frames in the sentence s
foreach P ∈ SRL_P such that P.V ∈ OBS_VERBS do
    OF := Create empty Observation Frame
    OF.V := P.V
    OF.NEG := P.ARGM-NEG
    OF.A0 := P.ARG0
    OF.A1 := P.ARG1
    // If any of the arguments of the predicate starts with a negative word, then we negate the verb
    if OF.A0 or OF.A1 starts with any word from NEG_WORDS then
        OF.NEG := True
    OF.EO := get_evidence_object(P.ARG0) ∪ get_evidence_object(P.ARGM-LOC)
    OFs := OFs ∪ {OF}
// Obtain corresponding Evidence Frames for every Observation Frame
foreach OF ∈ OFs do
    FoundEF := False
    foreach P ∈ SRL_P such that P.V occurs within the span of OF.A1 do
        if P.V is a copula verb and any of P.ARG0 or P.ARG1 does not exist then
            continue
        EF := Create empty Evidence Frame
        EF.V := P.V
        EF.NEG := P.ARGM-NEG
        // If any of the arguments of the predicate starts with a negative word, then we negate the verb
        if OF.A0 or OF.A1 starts with any word from NEG_WORDS then
            EF.NEG := True
        foreach argument ARG ∈ P.arguments do
            EF.ARG := P.ARG
        delete(OF.A1)
        EvStruct := {(OF, EF)}
        EvStructs := EvStructs ∪ EvStruct
        FoundEF := True
    // If no Evidence Frame exists for an Observation Frame, transfer the Observation Frame to the Evidence Frame
    if FoundEF == False then
        EF := Create empty Evidence Frame
        EF.V := OF.V
        P := P' ∈ SRL_P such that P'.V = OF.V
        foreach argument ARG ∈ P.arguments do
            EF.ARG := P.ARG   // add all the required arguments to the Evidence Frame
        clear(OF)
        OF.EO := get_evidence_object(P.ARG0) ∪ get_evidence_object(P.ARGM-LOC)
        EvStruct := {(OF, EF)}
        EvStructs := EvStructs ∪ EvStruct
return(EvStructs)

Algorithm 1: get_evidence_structure_instances: Algorithm for instantiating the Evidence Structure for a sentence
input: EvStruct_Q: Evidence Structure Instance from a query sentence Q
       EvStruct_D: Evidence Structure Instance from a sentence D in the corpus
output: Similarity score between EvStruct_Q and EvStruct_D

// Checking for negation
if EvStruct_Q.OF.NEG ≠ EvStruct_D.OF.NEG then return 0
if EvStruct_Q.EF.NEG ≠ EvStruct_D.EF.NEG then return 0
// Computing similarity between main predicates, using cosine similarity of their word embeddings
sim_E := CosineSim(WordVec(EvStruct_Q.EF.V), WordVec(EvStruct_D.EF.V))
// Computing similarity between corresponding Evidence Objects, using cosine similarity of their phrase embeddings
sim_EO := CosineSim(PhraseVec(EvStruct_Q.OF.EO), PhraseVec(EvStruct_D.OF.EO))
// Computing similarity between other arguments, using cosine similarity of their phrase embeddings
num_args := 0
sim_args := 0
foreach arg ∈ (EvStruct_Q.EF.arguments − {V}) do
    if EvStruct_Q.EF.arg exists then
        sim_args := sim_args + CosineSim(PhraseVec(EvStruct_Q.EF.arg), PhraseVec(EvStruct_D.EF.arg))
        num_args := num_args + 1
sim_args := sim_args / num_args
// Computing overall similarity
sim_final := sim_E × sim_args × sim_EO
// The overall similarity is multiplied by the Sentence-BERT based sentence similarity between Q and D
sim_final := sim_final × CosineSim(SentVec(Q), SentVec(D))
return sim_final

Algorithm 2: SemMatch: Algorithm for computing similarity between EvStruct_Q and EvStruct_D
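For readers who prefer code, the following is a minimal Python rendering of the core of Algorithm 2, reusing the phrase_vec/cosine_sim helpers sketched in Section 4 and the dataclasses sketched in Section 2.2; the handling of missing frames and arguments is an assumption where the pseudocode leaves it implicit:

def sem_match(q, d, sbert_sim):
    # q, d: EvidenceStructure instances; sbert_sim: Sentence-BERT cosine
    # similarity between the query sentence and the document sentence.
    if q.OF is not None and d.OF is not None and q.OF.NEG != d.OF.NEG:
        return 0.0                                    # observation negation mismatch
    if q.EF.NEG != d.EF.NEG:
        return 0.0                                    # evidence negation mismatch
    sim_e = cosine_sim(phrase_vec(q.EF.EV), phrase_vec(d.EF.EV))
    sim_eo = 1.0
    if q.OF is not None and d.OF is not None and q.OF.EO and d.OF.EO:
        sim_eo = cosine_sim(phrase_vec(q.OF.EO), phrase_vec(d.OF.EO))
    sim_args, num_args = 0.0, 0
    for arg, q_phrase in q.EF.args.items():           # every argument present in the query
        d_phrase = d.EF.args.get(arg, "")             # empty if missing in the document frame
        sim_args += cosine_sim(phrase_vec(q_phrase), phrase_vec(d_phrase))
        num_args += 1
    sim_args = sim_args / num_args if num_args else 0.0
    return sim_e * sim_args * sim_eo * sbert_sim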
Table 5
Example of the proposed SemMatch algorithm in action

Query: The autopsy report reveals that some poisonous compounds are found in the stomach of the deceased.
EvStruct_Q: OF = [OV = reveals, EO = The autopsy report]; EF = [EV = found, A1 = some poisonous compounds, LOC = in the stomach of the deceased]
Sentence: The report of the Chemical Examiner showed that a heavy concentration of arsenic was found in the viscera.
EvStruct_D: OF = [OV = showed, EO = The report of the Chemical Examiner]; EF = [EV = found, A1 = a heavy concentration of arsenic, LOC = in the viscera]

• Similarity between main predicates, their arguments and evidence objects
sim_E := CosineSim(WordVec(found), WordVec(found)) = 1.0
sim_A1 := CosineSim(PhraseVec(some poisonous compounds), PhraseVec(a heavy concentration of arsenic)) = 0.5469
sim_LOC := CosineSim(PhraseVec(in the stomach of the deceased), PhraseVec(in the viscera)) = 0.3173
sim_args := (sim_A1 + sim_LOC) / 2.0 = 0.4321
sim_EO := CosineSim(PhraseVec(The autopsy report), PhraseVec(The report of the Chemical Examiner)) = 0.8641
• Final similarity
sim_final := sim_E × sim_args × sim_EO × sim_SBERT = 1.0 × 0.4321 × 0.8641 × 0.607 = 0.2266 (ranked within the top 10 relevant documents)
Table 6
Evaluation of various techniques for the task of prior case retrieval. All entries are of the form (R-Prec; Avg. Precision). (Note: Our proposed approach SemMatch is referred to as SM. Underlines indicate the best performing results for each query across multiple techniques.)

What are the cases where…
Q1: blood stains were found on clothes of the deceased.
Q2: the deceased had attacked some person with sticks.
Q3: the police has murdered the deceased.
Q4: some evidence shows that the exhibited gun was not used.
Q5: the autopsy report reveals that some poisonous compounds are found in the stomach of the deceased.
Q6: the deceased is attacked with a knife.
Q7: a letter by the deceased reveal that dowry was demanded.
Q8: a cheque was dishonoured due to insufficient funds.
Q9: bribe was demanded by police.
Q10: a signature was forged on an affidavit.

Query  BM25_all    BM25_T      BM25_E      BM25_TE     SB_T        SB_E        SB_TE       SM_T        SM_E        SM_TE
Q1     0.24; 0.26  0.06; 0.02  0.59; 0.49  0.59; 0.52  0.00; 0.01  0.24; 0.15  0.18; 0.14  0.00; 0.01  0.24; 0.16  0.24; 0.14
Q2     0.25; 0.43  0.00; 0.05  0.00; 0.04  0.00; 0.06  0.00; 0.01  0.00; 0.00  0.00; 0.00  0.25; 0.14  0.25; 0.25  0.50; 0.30
Q3     0.00; 0.01  0.00; 0.03  0.33; 0.33  0.33; 0.35  0.33; 0.12  0.00; 0.00  0.00; 0.09  0.33; 0.12  0.00; 0.00  0.33; 0.12
Q4     0.17; 0.06  0.00; 0.01  0.00; 0.02  0.00; 0.04  0.00; 0.01  0.42; 0.25  0.42; 0.22  0.08; 0.04  0.25; 0.27  0.33; 0.29
Q5     0.30; 0.43  0.10; 0.05  0.40; 0.35  0.40; 0.37  0.20; 0.15  0.70; 0.80  0.70; 0.80  0.00; 0.02  0.40; 0.40  0.40; 0.40
Q6     0.31; 0.42  0.33; 0.28  0.38; 0.35  0.46; 0.52  0.23; 0.14  0.33; 0.38  0.36; 0.40  0.20; 0.18  0.28; 0.27  0.41; 0.42
Q7     0.25; 0.35  0.00; 0.08  0.50; 0.54  0.50; 0.33  0.00; 0.04  0.00; 0.12  0.00; 0.09  0.25; 0.06  0.00; 0.00  0.25; 0.06
Q8     0.48; 0.46  0.01; 0.09  0.67; 0.71  0.71; 0.73  0.05; 0.02  0.62; 0.67  0.62; 0.67  0.00; 0.00  0.57; 0.63  0.57; 0.64
Q9     0.20; 0.23  0.20; 0.17  0.20; 0.21  0.40; 0.31  0.40; 0.39  0.20; 0.21  0.50; 0.51  0.40; 0.41  0.10; 0.12  0.50; 0.48
Q10    0.50; 0.52  0.00; 0.11  0.25; 0.16  0.25; 0.21  0.00; 0.01  0.00; 0.04  0.00; 0.03  0.25; 0.13  0.50; 0.61  0.50; 0.61
Avg    0.27; 0.32  0.08; 0.09  0.33; 0.32  0.36; 0.34  0.12; 0.09  0.25; 0.26  0.28; 0.30  0.18; 0.11  0.26; 0.27  0.40; 0.35
6.2. Baselines

For the task of prior case retrieval, we implement two baseline techniques:
• BM25: This is a popular TF-IDF based relevance computation technique. We use the BM25+ variant as described in Trotman et al. [15] (implementation: https://pypi.org/project/rank-bm25/). This technique uses a bag-of-words approach that ignores the sentence structure (a usage sketch follows this list). We use 4 settings considering different sentences in each document:
• BM25_all: All sentences
• BM25_TE: Only Testimony or Evidence sentences
• BM25_T: Only Testimony sentences
• BM25_E: Only Evidence sentences
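A minimal usage sketch of this baseline with the rank-bm25 package's BM25Plus class (the BM25+ variant of Trotman et al. [15]); the whitespace tokenisation here is an illustrative simplification:

from rank_bm25 import BM25Plus

docs = [
    "the bank dishonoured the cheque due to insufficient balance",
    "blood stained clothes of accused were recovered from the flat",
]
bm25 = BM25Plus([d.split() for d in docs])
query = "a cheque was dishonoured due to insufficient funds".split()
scores = bm25.get_scores(query)                      # one relevance score per document
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])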
• Sentence-BERT [8]: This technique is based on Siamese-BERT networks to obtain more meaningful sentence embeddings as compared to vanilla BERT [16]. We used the pre-trained model bert-base-nli-stsb-mean-tokens to obtain sentence embeddings. Following Ghosh et al. [1], we use the pre-trained model as it is and did not fine-tune it further. This is because such fine-tuning needs annotated sentence pairs with labels indicating whether the sentences in the pair are semantically similar or not. Such an annotated dataset is expensive to create, and our aim is to avoid any dependence on manually annotated training data. Similar to Ghosh et al. [1], we used the sentence embeddings obtained by Sentence-BERT to compute cosine similarity between a query sentence and a candidate sentence in a document. The overall similarity of a document with a query is the maximum cosine similarity obtained for any of its sentences with the query sentence (a usage sketch follows this list). We use 3 settings considering different sentences in each document:
• SB_TE: Only Testimony or Evidence sentences
• SB_T: Only Testimony sentences
• SB_E: Only Evidence sentences
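A minimal usage sketch of this baseline with the sentence-transformers package and the pre-trained bert-base-nli-stsb-mean-tokens model [8]; note that util.cos_sim is named util.pytorch_cos_sim in older releases of the package:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bert-base-nli-stsb-mean-tokens")
doc_sentences = [
    "Hemlata was also killed by inflicting knife injuries.",
    "Blood stained clothes of Accused No.2 were put in the air-bag.",
]
query_emb = model.encode("the deceased is attacked with a knife", convert_to_tensor=True)
sent_embs = model.encode(doc_sentences, convert_to_tensor=True)
# Document score = best cosine similarity over its Evidence/Testimony sentences.
doc_score = util.cos_sim(query_emb, sent_embs).max().item()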
6.3. Evaluation

All the baseline techniques and our proposed technique are evaluated using a set of queries and certain evaluation metrics to compare the ranked lists produced by each of these techniques.
Queries: We chose 10 queries (shown in Table 6) which represent cases and evidence objects of diverse nature (domestic violence, financial fraud etc.).
Ground Truth: We created a set of gold-standard relevant documents for each query using the standard pooling technique [17]. We ran the following techniques to produce a ranked list of documents for each query – BM25_all, BM25_TE, SB_TE, and our proposed technique SemMatch_TE. We chose the top 10 documents from the ranked list produced by each technique. Human experts verified the relevance of each document for the query. Finally, after discarding all the irrelevant documents, we got a set of gold-standard relevant documents for each query. (This dataset can be obtained from the authors on request.)
Metrics: We used R-Precision and Average Precision as our evaluation metrics [17] (a small computation sketch is given below).
1. R-Precision (R-Prec): This is the precision among the top R documents of the ranked list, where R is the number of relevant documents for the query.
2. Average Precision (AP): This captures the joint effect of Precision and Recall. It computes the precision at each rank of the ranked list at which a relevant document is retrieved, and then takes the mean of these precision values.
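A minimal sketch of the two metrics under their standard definitions [17]:

def r_precision(ranked_ids, relevant_ids):
    # Precision among the top R results, where R = number of relevant documents.
    R = len(relevant_ids)
    return sum(1 for d in ranked_ids[:R] if d in relevant_ids) / R

def average_precision(ranked_ids, relevant_ids):
    # Mean of the precision values at each rank where a relevant document appears.
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)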
6.4. Results

Table 6 shows comparative evaluation results for various baselines and our proposed technique. The average performance of BM25_TE is better than BM25_all, indicating that considering only Evidence and Testimony sentences for representing any document results in better prior case retrieval performance. The other two baselines, SB (Sentence-BERT) and SM (our proposed technique SemMatch), also consider only Evidence and Testimony sentences rather than all the sentences in a document. All the baselines which consider only Testimony sentences perform poorly as compared to the corresponding techniques using both Testimony and Evidence sentences. This highlights the importance of evidence information as compared to using only witness testimony information for prior case retrieval, as done in Ghosh et al. [1].

Considering the average performance across all the 10 queries, our proposed technique SM_TE is the best performing technique in terms of both R-Prec and AP. The performance of SM_TE is also the most consistent across the diverse queries. It achieves a minimum R-Prec of 0.24 (for Q1), whereas other baselines like BM25_all, BM25_TE and SB_TE have a minimum R-Prec of 0 for some queries. As described in Algorithm 2, SM uses Sentence-BERT based similarity within sentences for producing an enhanced matching score. We experimented with a variant of SM which does not rely on Sentence-BERT based similarity. This variant resulted in an average R-Prec of 0.36 and MAP of 0.30 across all the 10 queries. Although this is lower than the SM_TE performance, the R-Prec is still comparable with BM25_TE (avg R-Prec of 0.36) and better than that of SB_TE (avg R-Prec of 0.28).

For some queries, it is important to have some semantic understanding at the sentence level. For example, for Q4, which contains a "negation", SB and SM can capture the query's meaning in a better way. SM handles such negations in a more principled manner as the Evidence Structure Instance captures negation as one of its arguments.

For SM, the maximum matching score achieved for any Evidence Structure Instance in a document is considered as the overall matching score with the whole document. In contrast, BM25 based techniques directly compute the matching score for the whole document as they do not rely on sentence structure. This is one limitation of SM which we plan to address as future work. However, as SM computes matching scores for individual Evidence Structure Instances, it is able to provide a better interpretation for each relevant document in terms of the actual sentences which provided the maximum matching score.

Analysis of errors: We analyzed cases where SM_TE assigned a lower score to a relevant document or a higher score to a non-relevant document. We discovered 3 main reasons - missing or incorrect arguments within Evidence Structure Instances, misleadingly high similarity between argument phrases, and presence of co-references. Consider the following sentence for which SM_TE incorrectly assigns a high score for query Q5 (see Table 6) – The police report also reveals that three pieces of pellets were found by the doctor in the body of deceased Monu. Here, except for the A1 argument (some poisonous compounds vs three pieces of pellets) in the Evidence Structure Instances, the other arguments are similar in meaning. We get a cosine similarity of 0.36 between some poisonous compounds and three pieces of pellets, which is misleading. It is not too low as compared to another case where there are semantically similar argument phrases (e.g., the cosine similarity between some poisonous compounds and a heavy concentration of arsenic is just 0.55, as shown in Table 5). As we are not resolving co-references, we are missing a few relevant documents. E.g., SM_TE does not assign a high score to the following document for query Q3 (see Table 6) – Instead of surrendering before the police, the deceased had attempted to kill the police. In retaliation, he was shot by them. This is because them in the Evidence Structure Instance for shot is not explicitly known to correspond to the police in the previous sentence.

7. Conclusion and Future Work

In this paper, we discussed several NLP techniques for identifying evidence sentences, representing them in the semantically rich Evidence Structure and retrieving relevant prior cases by exploiting it. The proposed techniques are weakly supervised as they do not rely on any manually annotated training data, except for the human expertise in designing the linguistic rules. Keeping in mind the importance of witness testimonies in addition to evidences, we also extracted and represented the witness testimonies using the same Evidence Structure. For the application of prior case retrieval, we evaluated our proposed technique along with several competent baselines, on a dataset of 10 diverse queries. We demonstrated that our technique performs comparably for most of the queries and is the best considering the overall performance across all 10 queries. The results highlight the contribution of evidence and testimony information in improving prior case retrieval performance.

In future, we plan to apply advanced representation learning techniques for learning a dense, embedded representation of an entire Evidence Structure Instance. Also, we plan to automatically determine the best suited retrieval technique (BM25, Sentence-BERT or SemMatch) for any query based on its nature. We also plan to explore ensembles of multiple retrieval techniques for improving prior case retrieval performance further.
References

[1] K. Ghosh, S. Pawar, G. Palshikar, P. Bhattacharyya, V. Varma, Retrieval of prior court cases using witness testimonies, in: JURIX, 2020.
[2] M. Palmer, D. Gildea, P. Kingsbury, The proposition bank: An annotated corpus of semantic roles, Computational Linguistics 31 (2005) 71–106.
[3] P. Shi, J. Lin, Simple BERT models for relation extraction and semantic role labeling, CoRR abs/1904.05255 (2019). URL: http://arxiv.org/abs/1904.05255. arXiv:1904.05255.
[4] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[5] M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength Natural Language Processing in Python, 2020. URL: https://doi.org/10.5281/zenodo.1212303. doi:10.5281/zenodo.1212303.
[6] G. A. Miller, WordNet: a lexical database for English, Communications of the ACM 38 (1995) 39–41.
[7] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[8] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3973–3983.
[9] P. Bellot, A. Doucet, S. Geva, S. Gurajada, J. Kamps, G. Kazai, M. Koolen, A. Mishra, V. Moriceau, J. Mothe, M. Preminger, E. SanJuan, R. Schenkel, X. Tannier, M. Theobald, M. Trappett, Q. Wang, Overview of INEX 2013, in: P. Forner, H. Müller, R. Paredes, P. Rosso, B. Stein (Eds.), Information Access Evaluation. Multilinguality, Multimodality, and Visualization - 4th International Conference of the CLEF Initiative, CLEF 2013, Valencia, Spain, September 23-26, 2013, Proceedings, volume 8138 of Lecture Notes in Computer Science, Springer, 2013, pp. 269–281. URL: https://doi.org/10.1007/978-3-642-40802-1_27. doi:10.1007/978-3-642-40802-1_27.
[10] M.-A. Cartright, H. A. Feild, J. Allan, Evidence finding using a collection of books, in: Proceedings of the 4th ACM Workshop on Online Books, Complementary Social Media and Crowdsourcing, 2011, pp. 11–18.
[11] R. Rinott, L. Dankin, C. Alzate, M. M. Khapra, E. Aharoni, N. Slonim, Show me your evidence - an automatic method for context dependent evidence detection, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 440–450.
[12] D. Ji, P. Tao, H. Fei, Y. Ren, An end-to-end joint model for evidence information extraction from court record document, Information Processing & Management 57 (2020) 102305.
[13] T. Gomes, M. Ladeira, A new conceptual framework for enhancing legal information retrieval at the Brazilian Superior Court of Justice, in: Proceedings of the 12th International Conference on Management of Digital EcoSystems, 2020, pp. 26–29.
[14] J. Landthaler, B. Waltl, P. Holl, F. Matthes, Extending full text search for legal document collections using word embeddings, in: JURIX, 2016, pp. 73–82.
[15] A. Trotman, A. Puurula, B. Burgess, Improvements to BM25 and language models examined, in: Proceedings of the 2014 Australasian Document Computing Symposium, 2014, pp. 58–65.
[16] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[17] C. Manning, P. Raghavan, H. Schütze, Introduction to information retrieval, Natural Language Engineering 16 (2010) 100–103.