On the importance of legal catchphrases in precedence retrieval

Edwin Thuma
Lecturer, Department of Computer Science, University of Botswana, Gaborone, Botswana
thumae@mopipi.ub.bw

Nkwebi P. Motlogelwa
Lecturer, Department of Computer Science, University of Botswana, Gaborone, Botswana
motlogel@mopipi.ub.bw

ABSTRACT
This paper presents our working notes for FIRE 2017, Information Retrieval from Legal Documents - Task 2 (Precedence Retrieval). Common Law Systems around the world recognize the importance of precedence in Law. In making decisions, Judges are obliged to consult prior cases that have already been decided to ensure that there is no divergence in the treatment of similar situations across different cases. Our approach was to investigate the effectiveness of using legal catchphrases in precedence retrieval. To improve retrieval performance, we incorporated term dependency in our retrieval. In addition, we investigated the effect of deploying query expansion on retrieval performance. Our results show an improvement in retrieval performance when we incorporate term dependence in scoring and ranking prior cases. However, we see a degradation in retrieval performance when we deploy query expansion.

KEYWORDS
Precedent retrieval, term dependency, query expansion, legal catchphrases

1 INTRODUCTION
Common Law Systems around the world recognize the importance of precedence in Law. In making decisions, Judges are obliged to align their decisions with relevant prior cases. Thus, when lawyers prepare for cases, they research prior cases extensively. In addition, Judges also consult prior cases that have already been decided to ensure that a similar situation is treated similarly in every case [3]. This can be overwhelming due to the enormous number of prior cases and the length of each. Task 2 of the Information Retrieval from Legal Documents track (precedence retrieval) explores techniques and tools that could ease this task [3]. In general, precedence retrieval returns a ranked list of prior cases that are related to a given current case.

In this work we investigate the importance of legal catchphrases as queries in precedent retrieval. These legal catchphrases are extracted from current cases. To achieve this, we used the training set of documents provided for Task 1 (catchphrase extraction), in which case documents have corresponding gold-standard catchphrases. We used the Term Frequency-Inverse Document Frequency (TF-IDF) term weighting model to identify similarity between documents in the training set and current cases. Queries were formulated using legal catchphrases from the most relevant documents in the training set.

For retrieval, we deployed the parameter-free DPH term weighting model to score and rank prior cases. Moreover, we investigate whether taking the dependence of query terms into consideration when ranking and scoring prior cases could improve retrieval performance. Previous work has shown that incorporating term dependency in scoring and ranking documents can significantly improve retrieval performance [4]. In addition, we deployed query expansion, where the original queries are reformulated by adding new terms, to investigate its impact on retrieval performance. Previous research has shown that query expansion can improve retrieval effectiveness [1].

This paper is structured as follows. Section 2 provides background on the algorithms used. Section 3 describes the experimental setup. In Section 4, we describe the methodologies used for the three runs submitted by team UB_Botswana_Legal for Task 2. Section 5 presents results and discussion.

2 BACKGROUND
In this section, we present a brief but essential background on the different algorithms used in our experimental investigation and evaluation. We start by describing the TF-IDF term weighting model in Section 2.1. We then describe the DPH term weighting model in Section 2.2. Lastly, we describe the Bose-Einstein 1 (Bo1) model for query expansion in Section 2.3.

2.1 TF-IDF term weighting model
In our experimental setup, we used TF-IDF [5] to score and rank documents. Generally, TF-IDF calculates the weight of each term t as the product of its term frequency (tf) weight in document d and its inverse document frequency (idf_t):

\[
\mathrm{score}_{\mathrm{TF\text{-}IDF}}(d, Q) = \sum_{t \in Q} \bigl(1 + \log(tf)\bigr) \cdot \log\frac{N}{df_t} \qquad (1)
\]

Where:
• tf is the term frequency of term t in document d.
• df_t is the document frequency of term t, i.e. the number of documents in the collection in which term t occurs.
• idf_t = log(N / df_t) is the inverse document frequency of term t in a collection of N documents.
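For illustration, the following is a minimal Python sketch of the scoring rule in equation (1). It is not the implementation used in our experiments (scoring was done by the TF-IDF model in Terrier); the function name tfidf_score, the toy documents and the catchphrase query are purely illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_score(query_terms, doc_terms, doc_freq, num_docs):
    """Score a document for a query with equation (1):
    sum over query terms of (1 + log(tf)) * log(N / df_t)."""
    tf = Counter(doc_terms)                      # term frequencies in the document
    score = 0.0
    for t in set(query_terms):
        if tf[t] == 0 or doc_freq.get(t, 0) == 0:
            continue                             # term absent from document or collection
        score += (1 + math.log(tf[t])) * math.log(num_docs / doc_freq[t])
    return score

# Toy usage: rank two "prior case" documents against one catchphrase query.
docs = [["anticipatory", "bail", "bail", "arrest"], ["contract", "breach", "damages"]]
df = Counter(t for d in docs for t in set(d))    # document frequency of each term
query = ["anticipatory", "bail"]
ranking = sorted(range(len(docs)), key=lambda i: -tfidf_score(query, docs[i], df, len(docs)))
print(ranking)                                   # the first document should rank first
```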
2.2 DPH Term Weighting Model
Our baseline system used the parameter-free DPH term weighting model from the Divergence from Randomness (DFR) framework [2]. The DPH term weighting model calculates the score of a document d for a given query Q as follows:

\[
\mathrm{score}_{\mathrm{DPH}}(d, Q) = \sum_{t \in Q} qtf \cdot norm \cdot \left( tf \cdot \log\Bigl(\bigl(tf \cdot \tfrac{avg\_l}{l}\bigr) \cdot \tfrac{N}{tfc}\Bigr) + 0.5 \cdot \log\bigl(2 \cdot \pi \cdot tf \cdot (1 - t_{MLE})\bigr) \right) \qquad (2)
\]

where qtf, tf and tfc are the frequencies of the term t in the query Q, in the document d and in the collection C, respectively. N is the number of documents in the collection C, avg_l is the average length of documents in the collection C, and l is the length of the document d. Furthermore, t_{MLE} = tf / l and norm = (1 - t_{MLE})^2 / (tf + 1). An illustrative sketch of this formula is given at the end of this section.

2.3 Bose-Einstein 1 (Bo1) model for Query Expansion
In our experimental investigation and evaluation, we used the Terrier-4.0 Divergence from Randomness (DFR) Bose-Einstein 1 (Bo1) model to select the most informative terms from the topmost documents after a first-pass document ranking. The DFR Bo1 model calculates the information content of a term t in the top-ranked documents as follows [1]:

\[
w(t) = tf_x \cdot \log_2\frac{1 + P_n(t)}{P_n(t)} + \log_2\bigl(1 + P_n(t)\bigr) \qquad (3)
\]

\[
P_n(t) = \frac{tfc}{N} \qquad (4)
\]

where P_n(t) is the probability of t in the whole collection, tf_x is the frequency of the query term in the top x ranked documents, tfc is the frequency of the term t in the collection, and N is the number of documents in the collection.
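For concreteness, a minimal Python sketch of the per-term DPH score in equation (2) from Section 2.2 follows. It only illustrates the formula; in our experiments the DPH implementation shipped with Terrier was used, and the base of the logarithm as well as the function and argument names (dph_score, coll_tf, and the toy numbers) are assumptions made for the sketch.

```python
import math

def dph_score(query_tf, doc_tf, doc_len, avg_len, coll_tf, num_docs):
    """Illustrative per-term DPH score following equation (2).
    query_tf: frequency of the term in the query (qtf)
    doc_tf:   frequency of the term in the document (tf)
    coll_tf:  frequency of the term in the whole collection (tfc)."""
    if doc_tf == 0:
        return 0.0
    t_mle = doc_tf / doc_len                      # maximum-likelihood estimate tf / l
    norm = (1.0 - t_mle) ** 2 / (doc_tf + 1.0)    # DPH normalisation factor
    info = doc_tf * math.log((doc_tf * avg_len / doc_len) * (num_docs / coll_tf)) \
           + 0.5 * math.log(2.0 * math.pi * doc_tf * (1.0 - t_mle))
    return query_tf * norm * info

# Toy usage: one query term occurring 3 times in a 100-token prior case.
print(dph_score(query_tf=1, doc_tf=3, doc_len=100, avg_len=120, coll_tf=40, num_docs=2000))
```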
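A corresponding minimal Python sketch of the Bo1 weighting in equations (3) and (4) is given below. It is illustrative only, since our runs used the Bo1 query expansion built into Terrier; the helper name bo1_expansion_terms, the use of a Counter for the collection statistics and the toy collection are assumptions.

```python
import math
from collections import Counter

def bo1_expansion_terms(top_docs, coll_tf, num_docs, n_terms=10):
    """Illustrative Bo1 weighting (equations (3) and (4)): rank candidate
    expansion terms by their information content in the top-ranked documents."""
    tf_x = Counter(t for d in top_docs for t in d)    # term frequency in the top x docs
    weights = {}
    for t, f in tf_x.items():
        if coll_tf[t] == 0:
            continue                                   # term unseen in the collection
        p_n = coll_tf[t] / num_docs                    # P_n(t) = tfc / N
        weights[t] = f * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return sorted(weights, key=weights.get, reverse=True)[:n_terms]

# Toy usage: pick the 2 most informative terms from 2 pseudo-relevant cases.
top_docs = [["bail", "anticipatory", "arrest"], ["bail", "custody"]]
coll_tf = Counter({"bail": 50, "anticipatory": 5, "arrest": 80, "custody": 30})
print(bo1_expansion_terms(top_docs, coll_tf, num_docs=2000, n_terms=2))
```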
3 EXPERIMENTAL SETUP

3.1 Document Collection
In this work we use the document collection provided by the Information Retrieval from Legal Documents track organizers. It comprised 200 documents representing current cases and 2000 documents representing prior cases [3]. For each current case, the objective is to retrieve a ranked list of relevant prior cases, together with a score for each prior case, such that the most relevant cases appear at the top of the list and the least relevant at the bottom.

3.2 Precedence Retrieval Experimental Platform
For all our experimental evaluation, we used Terrier-4.2, an open-source Information Retrieval (IR) platform. Documents were pre-processed before indexing: the text was tokenised, each token was stemmed using the full Porter stemming algorithm, and stopwords were removed using the Terrier stopword list.

4 METHODOLOGY

4.1 Query Formulation for the Different Runs
For all the runs in this task, we indexed the 100 case documents provided in Task 1, which had corresponding catchphrases, using the Terrier-4.2 IR platform. During indexing, each case document was first tokenised and stopwords were removed using the Terrier stopword list. Each token was then stemmed using the full Porter stemming algorithm.

For each current case provided in Task 2, we used the TF-IDF term weighting model in Terrier-4.2 to score and rank the indexed case documents. Each current case document was first pre-processed using the same pre-processing steps undertaken during indexing. After retrieving the top 40 case documents, we formulated a query for the current case using the gold-standard catchphrases that appear in these ranked case documents and also in the current case document used for retrieval.
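The query-formulation step just described can be sketched as follows. This is a simplified illustration rather than our actual pipeline: the TF-IDF ranking of the indexed case documents is assumed to have been produced already (by Terrier in our experiments), catchphrase matching against the current case is reduced to a plain lower-cased substring test, and the function name formulate_query, the dictionary layout of gold_catchphrases and the toy example are assumptions.

```python
def formulate_query(current_case_text, ranked_case_ids, gold_catchphrases, top_k=40):
    """Illustrative version of Section 4.1: given the indexed case documents
    ranked by TF-IDF against the current case, build a query from the
    gold-standard catchphrases of the top-ranked cases that also occur in the
    current case. Matching is a plain substring test here (an assumption)."""
    text = current_case_text.lower()
    query_terms = []
    for doc_id in ranked_case_ids[:top_k]:
        for phrase in gold_catchphrases.get(doc_id, []):
            if phrase.lower() in text:
                query_terms.extend(phrase.lower().split())
    # De-duplicate while preserving the order in which terms were collected.
    seen = set()
    return " ".join(t for t in query_terms if not (t in seen or seen.add(t)))

# Toy usage with two Task 1 training cases and their gold catchphrases.
gold = {"case_07": ["anticipatory bail", "breach of contract"], "case_12": ["habeas corpus"]}
print(formulate_query("The petitioner sought anticipatory bail after arrest",
                      ["case_07", "case_12"], gold, top_k=40))
# -> "anticipatory bail"
```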
4.2 UB_Botswana_Legal_Task2_R1
Using the formulated queries, we deployed the parameter-free DPH Divergence from Randomness term weighting model in the Terrier-4.2 IR platform as our baseline system to score and rank the prior cases.

4.3 UB_Botswana_Legal_Task2_R2
We used UB_Botswana_Legal_Task2_R1 as the baseline system. In addition, we deployed the Sequential Dependence (SD) variant of the Markov Random Fields model for term dependence. Sequential Dependence only assumes a dependence between neighbouring query terms [4, 6]. In this work, we used the default window size of 2 as provided in Terrier-4.2.

4.4 UB_Botswana_Legal_Task2_R3
We used UB_Botswana_Legal_Task2_R1 as the baseline system. In addition, we deployed simple pseudo-relevance feedback on the local collection. We used the Bo1 model for query expansion to select the 10 most informative terms from the top 3 ranked documents after the first-pass retrieval (on the local collection) [6]. We then performed a second-pass retrieval on this local collection with the new expanded query.

5 RESULTS AND DISCUSSION
This work set out to investigate the importance of legal catchphrases in precedence retrieval. The results of our submitted runs, shown in Table 1, were evaluated by the organizing committee of this task. Since most of the catchphrases were bi-grams and tri-grams, our exploitation of the Sequential Dependence variant of the Markov Random Fields model for term dependence led to improvements in retrieval performance in terms of Mean Average Precision and Precision@10. Our attempt to improve retrieval performance using query expansion resulted in a degradation in retrieval performance. We suspect this might have been due to query drift.

Table 1: FIRE 2017 UB-Botswana Legal run evaluation results for Task 2

Run ID                     | Mean Average Precision | Mean Reciprocal Rank | Precision@10 | Recall@100
UB_Botswana_Legal_Task2_R3 | 0.1671                 | 0.3478               | 0.1225       | 0.559
UB_Botswana_Legal_Task2_R1 | 0.1487                 | 0.3506               | 0.112        | 0.546
UB_Botswana_Legal_Task2_R2 | 0.1078                 | 0.3017               | 0.0785       | 0.43

REFERENCES
[1] G. Amati. 2003. Probabilistic Models for Information Retrieval based on Divergence from Randomness. PhD Thesis, University of Glasgow, UK (June 2003), 1-198.
[2] G. Amati, E. Ambrosi, M. Bianchi, C. Gaibisso, and G. Gambosi. 2007. FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track. In Proceedings of the 16th Text REtrieval Conference (TREC 2007). Gaithersburg, MD, USA, 1-10.
[3] Arpan Mandal, Kripabandhu Ghosh, Arnab Bhattacharya, Arindam Pal, and Saptarshi Ghosh. 2017. Overview of the FIRE 2017 track: Information Retrieval from Legal Documents (IRLeD). In Working Notes of FIRE 2017 - Forum for Information Retrieval Evaluation (CEUR Workshop Proceedings). CEUR-WS.org.
[4] Donald Metzler and W. Bruce Croft. 2005. A Markov Random Field Model for Term Dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '05). ACM, New York, NY, USA, 472-479. https://doi.org/10.1145/1076034.1076115
[5] Juan Ramos. 1999. Using TF-IDF to Determine Word Relevance in Document Queries. (1999).
[6] Edwin Thuma, Nkwebi Peace Motlogelwa, and Tebo Leburu-Dingalo. 2017. UB-Botswana Participation to CLEF eHealth IR Challenge 2017: Task 3 (IRTask1: Ad-hoc Search). In Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11-14, 2017. http://ceur-ws.org/Vol-1866/paper_73.pdf