<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Language Model for Legal Retrieval and Bert-based Model for Rhetorical Role Labeling for Legal Judgments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yujie Xu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tang Li</string-name>
          <email>itangkk@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhongyuan Han</string-name>
          <email>hanzhongyuan@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Heilongjiang Institute of Technology</institution>
          ,
          <addr-line>Harbin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper introduces our solutions to the two tasks of the AILA track at FIRE 2020 (Forum for Information Retrieval Evaluation). Task 1 (statute and precedent retrieval) asks, for a given query (the description of a legal situation), to identify relevant statutes and prior cases. It comprises two subtasks: Task 1a (identifying relevant prior cases) and Task 1b (identifying relevant statutes). For both subtasks, we use a language model to score each document against the query and then rank the documents by score. Task 2 (rhetorical role labeling for legal judgments) requires classifying sentences; we treat it as a multi-class classification problem and use BERT to complete it. In the final results, the score for Task 1a is 0.125, the score for Task 1b is 0.2003, and the accuracy for Task 2 is 0.549. The results and experiments show that the language model is an effective way to complete Task 1 and that BERT performs well on Task 2.</p>
      </abstract>
      <kwd-group>
<kwd>Legal Retrieval</kwd>
        <kwd>Rhetorical Role Labeling</kwd>
        <kwd>Language Model</kwd>
<kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>With the gradual maturing of the legal system, laws and regulations have become more
detailed and standardized, and people's demand for legal aid is steadily increasing. Compared
with the low efficiency of manual legal aid, the advantages of AI-assisted legal aid, such as
high efficiency and high accuracy, are becoming increasingly apparent.</p>
<p>To this end, FIRE 2020 proposed a task named AILA 2020 (Artificial Intelligence for
Legal Assistance) to advance AI-based legal aid. For the two subtasks in Task 1, the organizers
provided 10 short descriptions of legal situations, 3,000 judgments delivered by the
Supreme Court of India, and 197 statutes (sections of Acts) from Indian law; the goal is to
retrieve the most relevant case documents or statutes for a given query. For Task 2, they
provided 8,096 labeled sentences as training data and 1,905 sentences as test data. Each
training sentence is assigned one of the following seven semantic segments / rhetorical roles:
Fact, Ruling by Lower Court, Argument, Statute, Precedent, Ratio of the Decision, and
Ruling by Present Court. We are required to classify the 1,905 test sentences into these seven categories.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Methods for Task1a</title>
<p>Fig. 1 describes our method of solving Task 1a with the Two-Stage Language Model[<xref ref-type="bibr" rid="ref1">1</xref>].</p>
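<p>As a concrete illustration, a minimal sketch of query-likelihood scoring under the Two-Stage Language Model (Dirichlet prior smoothing of the document model, followed by interpolation with the collection model) might look as follows; the toy corpus and function names are our own illustration, not the system's actual code:</p>

```python
import math
from collections import Counter

def two_stage_score(query, doc, collection, mu=2500, lam=0.8):
    """Score a query (list of tokens) against a document with two-stage smoothing:
    Dirichlet-smoothed document model interpolated with the collection model."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for w in query:
        p_c = coll_tf[w] / len(collection)         # collection (background) probability
        if p_c == 0:
            continue                               # skip terms unseen in the collection
        p_dir = (doc_tf[w] + mu * p_c) / (len(doc) + mu)  # stage 1: Dirichlet smoothing
        p = (1 - lam) * p_dir + lam * p_c          # stage 2: interpolation
        score += math.log(p)
    return score
```

<p>Documents are then ranked by this score in descending order.</p>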
    </sec>
    <sec id="sec-4">
      <title>2.2 Methods for Task1b</title>
      <p>
        For Task 1b, we not only use the method of Task 1a but also the Jelinek-Mercer
language model[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], applying Eq. (2) to calculate the similarity between the query and each
document. Before retrieval, we also process the given data with word-based n-grams and
character-based n-grams. We find that character-based n-grams perform much better than
word-based n-grams, and that n-grams of orders 2-7 achieve the best result.
      </p>
      <p>p(w |ˆD )   D pML (w |ˆD )  (1   D ) p(w |ˆC )
（2）</p>
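<p>Eq. (2) interpolates the maximum-likelihood document model with the collection model. A small sketch of this scoring over character n-grams of orders 2-7 (the function names and toy texts are our own illustration):</p>

```python
import math
from collections import Counter

def char_ngrams(text, n_min=2, n_max=7):
    """Split text into overlapping character n-grams of orders n_min..n_max."""
    text = text.replace(" ", "")
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams

def jm_score(query_grams, doc_grams, coll_grams, lam=0.8):
    """Query log-likelihood under Jelinek-Mercer smoothing, Eq. (2):
    p(w|D) = lam * p_ML(w|D) + (1 - lam) * p(w|C)."""
    doc_tf, coll_tf = Counter(doc_grams), Counter(coll_grams)
    score = 0.0
    for g in query_grams:
        p_ml = doc_tf[g] / len(doc_grams)
        p_c = coll_tf[g] / len(coll_grams)
        p = lam * p_ml + (1 - lam) * p_c
        if p > 0:                  # ignore grams unseen in the whole collection
            score += math.log(p)
    return score
```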
    </sec>
    <sec id="sec-5">
      <title>2.3 Methods for Task2</title>
    </sec>
    <sec id="sec-6">
      <title>3. Experimental Setting</title>
<p>Our experiments use the Lemur toolkit (http://www.lemurproject.org/) and bert4keras (https://github.com/bojone/bert4keras).</p>
      <p>For Task 2, we treat the problem as multi-class classification. We use a Logistic
Regression model and a lighter version of BERT implemented with bert4keras. The BERT weights
are initialized from uncased_L-12_H-768_A-12. The 8,096 training sentences, without any
preprocessing, are used to fine-tune the BERT model with the parameters (max_len = 124,
batch_size = 24, units = 7, epochs = 2).</p>
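<p>As an illustration of this setup, the label encoding and fixed-length input preparation for the seven rhetorical roles could be sketched as follows; the whitespace tokenizer and toy vocabulary here are stand-ins for BERT's WordPiece tokenization in bert4keras:</p>

```python
# Hypothetical sketch of the Task 2 input pipeline; the real system
# tokenizes with BERT's WordPiece vocabulary via bert4keras.
ROLES = ["Fact", "Ruling by Lower Court", "Argument", "Statute",
         "Precedent", "Ratio of the Decision", "Ruling by Present Court"]
LABEL2ID = {role: i for i, role in enumerate(ROLES)}  # units = 7 output classes

MAX_LEN = 124  # max_len used when fine-tuning

def encode(sentence, label, vocab):
    """Map a sentence to fixed-length token ids plus its class id."""
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in sentence.lower().split()]
    ids = ids[:MAX_LEN]                               # truncate long sentences
    ids += [vocab["[PAD]"]] * (MAX_LEN - len(ids))    # pad short ones
    return ids, LABEL2ID[label]
```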
    </sec>
    <sec id="sec-7">
      <title>3.1 Parameter Selection</title>
      <p>For Task 1a, we tried different values of μ and λ in the Two-Stage Language Model to observe
their effects. Fig. 2 shows the results for different λ when μ = 1500 and μ = 2500.</p>
      <p>For Task 1b, we likewise tried different values of μ and λ of the Two-Stage Language Model.
Fig. 3 shows the results for different λ when μ = 1500 and μ = 2500.</p>
      <p>In Task 1b, we also tried different n-gram preprocessing schemes to observe their effects. The
experimental results are shown in Table 1.</p>
      <p>In conclusion, μ = 2500 and λ = 0.8 achieve the better results. For the Task 1b preprocessing,
character-level 2+3+4+5+6+7 grams achieve higher accuracy than the other settings.</p>
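<p>The parameter sweep described above can be sketched as a simple grid search; the evaluation callback here is a stand-in for the track's relevance scoring:</p>

```python
import itertools

def grid_search(score_fn, mus=(1500, 2500), lams=(0.2, 0.4, 0.6, 0.8)):
    """Try every (mu, lam) pair and return the best-scoring combination."""
    best_score, best_params = float("-inf"), None
    for mu, lam in itertools.product(mus, lams):
        s = score_fn(mu, lam)
        if s > best_score:
            best_score, best_params = s, (mu, lam)
    return best_params, best_score
```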
    </sec>
    <sec id="sec-8">
      <title>3.2 Experimental Results</title>
      <p>
        For Task 1, we submitted three groups of results. Table 2 and Table 3 show the experimental
results on the test data we submitted[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>For Task 2, we submitted two sets of results. Table 4 shows the experimental results of the test
data we submitted.</p>
    </sec>
    <sec id="sec-9">
      <title>4. Conclusions</title>
      <p>This paper introduces the methods we used in the FIRE 2020 AILA evaluation. Compared with
other submissions, our methods show many deficiencies. For the task of identifying relevant prior
cases, the final evaluation results show that BM25 and TF-IDF outperform our method, while for
the multi-class classification task, BERT shows good results.</p>
    </sec>
    <sec id="sec-10">
      <title>5. Acknowledgements</title>
      <p>This work is supported by National Social Science Fund of China (No.18BYY125).</p>
    </sec>
    <sec id="sec-11">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>ChengXiang</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>John</given-names>
            <surname>Lafferty</surname>
          </string-name>
          , "
          <article-title>Two-Stage Language Models for Information Retrieval</article-title>
          ".
          <source>Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Guodong</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bin</given-names>
            <surname>Wang</surname>
          </string-name>
          , "
          <article-title>GJM-2: A Special Case of General Jelinek-Mercer Smoothing Method for Language Modeling Approach to Ad Hoc IR</article-title>
          ". Information Retrieval Technology,
          <source>Second Asia Information Retrieval Symposium, AIRS 2005</source>
          , Jeju Island, Korea, October 13-15,
          <year>2005</year>
          , Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Paheli</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          , Parth Mehta, Kripabandhu Ghosh, Saptarshi Ghosh, Arindam Pal, Arnab Bhattacharya, Prasenjit Majumder, "
          <article-title>Overview of the FIRE 2020 AILA track: Artificial Intelligence for Legal Assistance</article-title>
          ".
          <source>Proceedings of FIRE 2020 - Forum for Information Retrieval Evaluation</source>
          , Hyderabad, India, December
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>