<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Query Revaluation Method For Legal Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Liang Liu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lexiao Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhongyuan Han</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beihang University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Heilongjiang Institute of Technology</institution>
          ,
          <addr-line>Harbin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe in detail our method for the task of identifying relevant prior cases in Artificial Intelligence for Legal Assistance (AILA). We cast the task as a retrieval problem and adapt the BM25 retrieval model to improve its performance on this task. The improved method ranks second on both MAP and BPREF.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal Information Retrieval</kwd>
        <kwd>Language Model</kwd>
        <kwd>BM25</kwd>
        <kwd>IDF</kwd>
        <kwd>Identifying Relevant Prior Case</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>We treated the task of identifying relevant prior cases as an information retrieval task and submitted three runs based on BM25.</p>
      <p>2.1.</p>
      <p>According to our statistics, each query, which is a description of a situation in Query_doc, contains over 500 words on average, and each document, which is a prior case in Object_casedocs, contains over 3,000 words on average. For traditional retrieval, such queries are too long.</p>
      <p>Consequently, we preprocess the data to shorten the query without losing its main meaning. Following common practice, we remove all stop words and convert the text to lowercase. Finally, we use the Lucene toolkit<sup>4</sup> to index the documents.</p>
      <p>2.2. double_liu_2020_1</p>
      <p>For this submission, we chose the BM25 model and improved it by modifying its relevance calculation. The BM25 model is defined as follows:</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math>BM25(D,Q)=\sum_{i=1}^{n} IDF(q_i)\cdot\frac{TF(q_i,D)\cdot(k_1+1)}{TF(q_i,D)+k_1\cdot\left(1-b+b\cdot\frac{|D|}{avgdl}\right)}</tex-math>
      </disp-formula>
      <p>Formula (1) gives the definition of BM25, where q<sub>i</sub> is the i-th word in Q, |D| is the length of document D, avgdl is the average document length in the text collection, and k<sub>1</sub> and b are the parameters of BM25. In this task, we set k<sub>1</sub>=2.99 and b=0.65.</p>
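      <p>The scoring in formula (1) can be illustrated with a minimal Python sketch. It is not the Lucene implementation used for the runs: documents are assumed to be already tokenized lists of lowercase words, and the smoothed IDF variant and the names idf and bm25 are assumptions made for illustration, since the text does not state which IDF formula was used.</p>
      <preformat>
```python
import math
from collections import Counter

K1 = 2.99  # value reported in the text
B = 0.65   # value reported in the text


def idf(term, docs):
    """Smoothed IDF (assumed variant; the text does not specify one)."""
    df = sum(1 for d in docs if term in d)
    return math.log(1.0 + (len(docs) - df + 0.5) / (df + 0.5))


def bm25(doc, query, docs, k1=K1, b=B):
    """Score one tokenized document against a tokenized query, as in formula (1)."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    tf = Counter(doc)
    # Length normalization term: k1 * (1 - b + b * |D| / avgdl)
    norm = k1 * (1.0 - b + b * len(doc) / avgdl)
    score = 0.0
    for term in query:
        freq = tf[term]
        score += idf(term, docs) * freq * (k1 + 1.0) / (freq + norm)
    return score


# Toy collection (hypothetical data, for illustration only).
docs = [
    ["appeal", "court", "murder", "evidence"],
    ["contract", "breach", "damages", "court"],
    ["tax", "assessment", "income"],
]
query = ["murder", "evidence", "court"]
scores = [bm25(d, query, docs) for d in docs]
```
      </preformat>
      <p>In this toy collection the first document matches all three query words, the second matches only one, and the third matches none, so the scores decrease in that order.</p>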
      <p>Furthermore, we modify the relevance computation to obtain an improved BM25:</p>
      <disp-formula>
        <label>(2)</label>
        <tex-math>rel(D,Q)=BM25(D,Q)+BM25(D,Q')</tex-math>
      </disp-formula>
      <p>where Q is the query with stop words removed, and Q' is the set of keywords further extracted from Q. To build Q', we rank the words in Q by IDF and keep the top m% of them. Here m is a free parameter, and we set m=50.</p>
      <p>2.3. double_liu_2020_2</p>
      <p>Building on the former, we split the method of double_liu_2020_1 into two sub-methods, which form our double_liu_2020_2 and double_liu_2020_3 runs.</p>
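      <p>The keyword extraction and the way the three runs reuse the two terms of formula (2) can be sketched as follows. This is an illustration under stated assumptions, not the submitted system: the two BM25 terms are assumed to be added, Q' is assumed to be built from the distinct query words with the cut-off rounded up, the IDF variant is an assumed smoothed form, and extract_keywords, run_scores and the stand-in scorer overlap are hypothetical names.</p>
      <preformat>
```python
import math


def idf(term, docs):
    """Smoothed IDF over tokenized documents (assumed variant)."""
    df = sum(1 for d in docs if term in d)
    return math.log(1.0 + (len(docs) - df + 0.5) / (df + 0.5))


def extract_keywords(query, docs, m=50):
    """Build Q' from the top m% of the distinct query words, ranked by IDF."""
    unique = list(dict.fromkeys(query))  # distinct words, first-seen order
    ranked = sorted(unique, key=lambda t: idf(t, docs), reverse=True)
    keep = max(1, math.ceil(len(ranked) * m / 100))  # rounding up is an assumption
    return ranked[:keep]


def run_scores(doc, query, keywords, score):
    """Relevance used by each run; score(doc, q) can be any BM25 scorer."""
    full = score(doc, query)        # term BM25(D, Q)
    reduced = score(doc, keywords)  # term BM25(D, Q')
    return {
        "double_liu_2020_1": full + reduced,
        "double_liu_2020_2": full,
        "double_liu_2020_3": reduced,
    }


# Toy data and a stand-in scorer (both hypothetical, for illustration only).
docs = [
    ["court", "appeal", "murder", "weapon"],
    ["court", "contract", "breach"],
    ["court", "tax", "appeal"],
]
query = ["court", "appeal", "murder", "weapon"]
keywords = extract_keywords(query, docs, m=50)


def overlap(doc, q):
    """Count query-word occurrences in the document (placeholder for BM25)."""
    return float(sum(doc.count(t) for t in q))


scores = run_scores(docs[0], query, keywords, overlap)
```
      </preformat>
      <p>With m=50, the two rarest of the four query words are kept, so words that occur in every document, such as "court" here, drop out of Q'.</p>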
      <p>In the double_liu_2020_2 submission, we use only the first term of formula (2):</p>
      <disp-formula>
        <label>(3)</label>
        <tex-math>rel(D,Q)=BM25(D,Q)</tex-math>
      </disp-formula>
      <p>All other settings follow double_liu_2020_1.</p>
      <p>2.4. double_liu_2020_3</p>
      <p>For this submission, we use only the second term of formula (2):</p>
      <disp-formula>
        <label>(4)</label>
        <tex-math>rel(D,Q)=BM25(D,Q')</tex-math>
      </disp-formula>
      <p>
        All other settings also follow double_liu_2020_1.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2.5. Other methods</title>
      <p>We also tried several other methods, but their results were not satisfactory.</p>
    </sec>
    <sec id="sec-4">
      <title>2.5.1 Cosine Similarity</title>
      <p>This method ranks documents by the cosine similarity between the query and the document. We first use the bag-of-words model to build word vectors for the query and the document, then compute their cosine similarity and rank the documents by it. The cosine similarity is defined as follows:</p>
      <disp-formula>
        <label>(5)</label>
        <tex-math>rel(D,Q)=Cos(A,B)=\frac{A\cdot B}{|A|\times|B|}=\frac{\sum_{i=1}^{n}(A_i\times B_i)}{\sqrt{\sum_{i=1}^{n}(A_i)^2}\times\sqrt{\sum_{i=1}^{n}(B_i)^2}}</tex-math>
      </disp-formula>
      <p>where A and B are the two vectors.<fn><label>4</label><p>https://lucene.apache.org/</p></fn></p>
      <p>2.5.2 Generalized Jaccard Similarity</p>
      <p>In this method, we use the generalized Jaccard similarity as the ranking indicator:</p>
      <disp-formula>
        <label>(6)</label>
        <tex-math>rel(D,Q)=J(A,B)=\frac{\sum_{i=1}^{n}\min(A_i,B_i)}{\sum_{i=1}^{n}\max(A_i,B_i)}</tex-math>
      </disp-formula>
    </sec>
    <sec id="sec-5">
      <title>2.5.3 Cosine with Jaccard</title>
      <p>In this method, we combine the previous two methods and introduce a weighting parameter k. The formula is as follows:</p>
      <disp-formula>
        <label>(7)</label>
        <tex-math>rel(D,Q)=k\cdot Cos(A,B)+(1-k)\cdot J(A,B)</tex-math>
      </disp-formula>
      <p>where k is a free parameter, and we set k=0.3.</p>
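      <p>The three similarity-based indicators of Sections 2.5.1 to 2.5.3 can be sketched in a few lines of Python. This is an illustrative sketch only: the inputs are assumed to be tokenized word lists, the vectors are assumed to hold raw term counts, and the function names are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def bow_vectors(query, doc):
    """Build aligned bag-of-words count vectors A (query) and B (document)."""
    vocab = sorted(set(query) | set(doc))
    q_counts, d_counts = Counter(query), Counter(doc)
    a = [q_counts[t] for t in vocab]
    b = [d_counts[t] for t in vocab]
    return a, b


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generalized_jaccard(a, b):
    """Generalized Jaccard similarity: sum of minima over sum of maxima."""
    top = sum(min(x, y) for x, y in zip(a, b))
    bottom = sum(max(x, y) for x, y in zip(a, b))
    return top / bottom if bottom else 0.0


def cosine_with_jaccard(a, b, k=0.3):
    """Weighted combination of the two similarities; k=0.3 as reported in the text."""
    return k * cosine(a, b) + (1.0 - k) * generalized_jaccard(a, b)


# Toy example (hypothetical data): vocabulary order is appeal, court, tax.
a, b = bow_vectors(["court", "appeal", "appeal"], ["court", "appeal", "tax"])
cos_ab = cosine(a, b)
jac_ab = generalized_jaccard(a, b)
mix_ab = cosine_with_jaccard(a, b)
```
      </preformat>
      <p>Here A=[2, 1, 0] and B=[1, 1, 1], so the generalized Jaccard similarity is 2/4=0.5 and the cosine similarity is 3 divided by the square root of 15.</p>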
    </sec>
    <sec id="sec-6">
      <title>3. Results</title>
    </sec>
    <sec id="sec-7">
      <title>3.1. Evaluation Measures</title>
      <p>Standard information retrieval measures such as Precision, Recall, Mean Average Precision (MAP)<sup>5</sup>, Discounted Cumulative Gain (DCG), and Mean Reciprocal Rank (MRR) are used for evaluation in the task.</p>
    </sec>
    <sec id="sec-8">
      <title>3.2. Evaluation Results</title>
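      <table-wrap>
        <table>
          <thead>
            <tr><th>Run_ID</th></tr>
          </thead>
          <tbody>
            <tr><td>double_liu_2020_3</td></tr>
            <tr><td>double_liu_2020_1</td></tr>
            <tr><td>double_liu_2020_2</td></tr>
            <tr><td>Jaccard</td></tr>
            <tr><td>Cosine_Jaccard_k</td></tr>
            <tr><td>Cosine</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p><fn><label>5</label><p>https://trec.nist.gov/pubs/trec16/appendices/measures.pdf</p></fn></p>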
    </sec>
    <sec id="sec-9">
      <title>4. Conclusion</title>
      <p>In this paper, we described a method that uses an improved BM25 to identify relevant prior cases. The results suggest that extracting keywords from the query with a suitable algorithm improves retrieval performance. Compared with the other submissions to the task, our improved BM25 model ranks second on both MAP and BPREF.</p>
    </sec>
    <sec id="sec-10">
      <title>5. Acknowledgments</title>
      <p>This work is supported by the National Social Science Fund of China (No. 18BYY125).</p>
    </sec>
    <sec id="sec-11">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Mandal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattacharya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Overview of the FIRE 2017 IRLeD track: Information retrieval from legal documents</article-title>
          . In:
          <source>Proceedings of FIRE 2017 - Forum for Information Retrieval Evaluation</source>
          ,
          <year>2017</year>
          :
          <fpage>63</fpage>
          -
          <lpage>68</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><surname>Bhattacharya</surname>, <given-names>P.</given-names></string-name>
          ,
          <string-name><surname>Ghosh</surname>, <given-names>K.</given-names></string-name>
          ,
          <string-name><surname>Ghosh</surname>, <given-names>S.</given-names></string-name>
          ,
          <string-name><surname>Pal</surname>, <given-names>A.</given-names></string-name>
          ,
          <string-name><surname>Mehta</surname>, <given-names>P.</given-names></string-name>
          ,
          <string-name><surname>Bhattacharya</surname>, <given-names>A.</given-names></string-name>
          ,
          <string-name><surname>Majumder</surname>, <given-names>P.</given-names></string-name>
          :
          <article-title>Overview of the FIRE 2019 AILA track: Artificial Intelligence for Legal Assistance</article-title>
          . In:
          <source>Proceedings of FIRE 2019 - Forum for Information Retrieval Evaluation</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ning</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name><surname>Kong</surname>, <given-names>L.</given-names></string-name>
          ,
          <string-name><surname>Han</surname>, <given-names>Y.</given-names></string-name>
          ,
          <string-name><surname>Han</surname>, <given-names>Z.</given-names></string-name>
          :
          <article-title>FIRE2019@AILA: Legal information retrieval using improved BM25</article-title>
          . In:
          <source>Proceedings of FIRE 2019 - Forum for Information Retrieval Evaluation</source>
          ,
          <year>2019</year>
          :
          <fpage>40</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><surname>Robertson</surname>, <given-names>S.</given-names></string-name>
          ,
          <string-name><surname>Zaragoza</surname>, <given-names>H.</given-names></string-name>
          ,
          <string-name><surname>Taylor</surname>, <given-names>M.</given-names></string-name>
          :
          <article-title>Simple BM25 extension to multiple weighted fields</article-title>
          . In:
          <source>Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management</source>
          ,
          <year>2004</year>
          :
          <fpage>42</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><surname>Bhattacharya</surname>, <given-names>P.</given-names></string-name>
          ,
          <string-name><surname>Mehta</surname>, <given-names>P.</given-names></string-name>
          ,
          <string-name><surname>Ghosh</surname>, <given-names>K.</given-names></string-name>
          ,
          <string-name><surname>Ghosh</surname>, <given-names>S.</given-names></string-name>
          ,
          <string-name><surname>Pal</surname>, <given-names>A.</given-names></string-name>
          ,
          <string-name><surname>Bhattacharya</surname>, <given-names>A.</given-names></string-name>
          ,
          <string-name><surname>Majumder</surname>, <given-names>P.</given-names></string-name>
          :
          <article-title>Overview of the FIRE 2020 AILA track: Artificial Intelligence for Legal Assistance</article-title>
          . In:
          <source>Proceedings of FIRE 2020 - Forum for Information Retrieval Evaluation</source>
          , Hyderabad, India, December 16-20
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>