<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Catch Phrase Extraction From Legal Documents Using Deep Neural Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sourav Das</string-name>
          <email>sourav.dd2015@cs.iiests.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ranojoy Barua</string-name>
          <email>baruaranojoy1@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Engineering Science and Technology</institution>
          ,
          <addr-line>Shibpur, Howrah, West Bengal</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper addresses the task of finding and extracting important key phrases (catchphrases) from a document, from which the document can be summarized. This matters because it reduces the time needed to summarize documents. The work uses a deep neural network to train a model that recognizes such key phrases based on various calculated parameters.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>The legal system depends on the citation of previous cases, which
allows better judgment, but with a huge number of cases to study,
the search for suitable cases becomes difficult. This problem can be
divided into two parts: first, key phrase extraction, and second,
finding suitable matches based on the key phrases found in the
document. The main motive of this paper is to find an efficient way
to perform key phrase extraction. In this approach, a deep neural network
provides an elegant way to extract catchphrases, which can then
be used as a reference while searching for similar previous
cases. The features used include grammar, Tf-idf, position in the
document, etc. The most important key words in the document are thus
extracted and can then be used further as required. This
approach can be used to minimize the required human effort. The
words are further divided based on their weights to determine their
importance in the document.</p>
    </sec>
    <sec id="sec-2">
      <title>2 METHOD</title>
      <sec>
        <title>2.1 Data</title>
        <p>All files are legal documents recorded by the Supreme Court of
India. A total of 400 documents are used in the experiment. Of these,
100 have gold standard catchphrases (catchphrases assigned by humans)
and are used for training; the other 300 are
used to generate output.</p>
      </sec>
      <sec id="sec-2-1">
        <title>2.2 Procedure</title>
        <p>In this experiment, a set of potentially meaningful phrases is created
for each file, and these phrases are then classified using a deep neural network.
The steps involved are:
1. Preprocessing
2. Creating potentially meaningful phrases based on the common grammar
of phrases
3. Feature selection
4. Labeling the vectors
5. Classification
6. Training the model</p>
        <p>Weight = (number of occurrences of unique POS) / (total number of
phrases in UGS) * 100</p>
        <p>4. Find whether the phrase exactly matches any phrase from the super
gold standard. If an exact match is found, find the number of times the exact
match occurs.
5. Find the number of times the unique words of the phrase
match a word in the super gold standard file, and also keep
track of how many individual words of the phrase found a match
in the super gold standard.</p>
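        <p>The weight formula and the two matching features above can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the implementation used in the experiment: the function names, the whitespace tokenization, the lowercasing, and the toy data are hypothetical. UGS is not defined in the text, so the sketch assumes it is a list holding the POS pattern of each gold standard phrase, and it assumes the super gold standard is a plain list of phrases.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def pos_weight(pattern, gold_patterns):
    """Weight = occurrences of a POS pattern / total gold phrases * 100.

    Assumption: gold_patterns holds one POS pattern per gold phrase.
    """
    if not gold_patterns:
        return 0.0
    return Counter(gold_patterns)[pattern] / len(gold_patterns) * 100


def exact_match_count(phrase, gold_phrases):
    """Feature 4: number of times the phrase occurs verbatim in the gold list."""
    target = phrase.lower()
    return sum(1 for g in gold_phrases if g.lower() == target)


def word_match_counts(phrase, gold_phrases):
    """Feature 5: (total hits of the phrase's unique words among gold words,
    number of unique phrase words that matched at least once)."""
    gold_words = Counter(w for g in gold_phrases for w in g.lower().split())
    unique_words = set(phrase.lower().split())
    total_hits = sum(gold_words[w] for w in unique_words)
    words_matched = sum(1 for w in unique_words if gold_words[w] > 0)
    return total_hits, words_matched


# Hypothetical toy data, for illustration only.
gold = ["income tax", "breach of contract", "income tax act"]
gold_pos = ["NN NN", "NN IN NN", "NN NN NN"]

weight = pos_weight("NN NN", gold_pos)
exact = exact_match_count("Income Tax", gold)
hits, matched = word_match_counts("income tax appeal", gold)
```
]]></preformat>
        <p>In this toy example the candidate "income tax appeal" has no exact match, but two of its three unique words occur in the gold phrases, which is the kind of partial evidence that feature 5 records.</p>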
        <p>All these features are now combined to create a large feature vector.
Note: not all of the features mentioned are used, because that would lead to a
very large feature vector and some features subsume others, so
only a subset of these features is used as the final feature vector.
We intend to apply supervised learning, but at this point we have a
feature vector without any label, so we need to assign labels before
supervised learning can be applied. We label the data with two classes: phrases
eligible to be a catchphrase and phrases not eligible to be a catchphrase.
The criteria for labeling are:
1. The phrase should have a Tf-idf value greater than 0.0.
2. The phrase should contain one part of speech belonging to the super
gold standard.
3. The phrase should have at least one word matching the super
gold standard.
4. The phrase may or may not have an exact match with a phrase in the super gold
standard.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Conditions for labeling</title>
        <p>1. If all conditions are satisfied, the phrase is labeled as valid.
2. If only condition 4 is satisfied, the phrase is labeled as valid.
3. If all conditions other than condition 4 are satisfied, the phrase is labeled as valid.
4. Otherwise, the phrase is labeled as not valid.</p>
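        <p>The labeling rules can be written as a small decision function. The sketch below is one possible reading, not a confirmed implementation: it assumes that "condition 4 is satisfied" means the phrase has at least one exact match in the super gold standard, and that an exact match alone is enough for a valid label. The function name and its parameters are hypothetical.</p>
        <preformat><![CDATA[
```python
def label_phrase(tfidf, shares_pos, word_matches, exact_matches):
    """Return 1 (eligible for catchphrase) or 0 (not eligible).

    tfidf         -- Tf-idf value of the phrase
    shares_pos    -- True if the phrase holds a POS from the gold standard
    word_matches  -- number of phrase words found in the gold standard
    exact_matches -- number of exact phrase matches in the gold standard
    """
    c1 = tfidf > 0.0          # criterion 1: positive Tf-idf
    c2 = bool(shares_pos)     # criterion 2: POS belongs to the gold standard
    c3 = word_matches >= 1    # criterion 3: at least one matching word
    c4 = exact_matches >= 1   # criterion 4: exact phrase match (assumed reading)
    # Valid if there is an exact match, or if criteria 1 to 3 all hold.
    return 1 if c4 or (c1 and c2 and c3) else 0


label_a = label_phrase(0.3, True, 2, 0)    # criteria 1-3 hold, no exact match
label_b = label_phrase(0.0, False, 0, 1)   # only the exact match holds
label_c = label_phrase(0.0, True, 2, 0)    # Tf-idf is zero, no exact match
```
]]></preformat>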
      </sec>
      <sec id="sec-2-3">
        <title>2.2.5 Classification</title>
        <p>For classification we use a deep neural network that is
three layers deep. It has two internal (hidden) layers, each
with 28 nodes, and an output layer with 2 nodes.</p>
        <p>Architecture of each layer:
Output = input . (weight) + bias
A sigmoid function is then applied to squash all the values into the range between
0 and 1.</p>
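        <p>The per-layer computation described above, a weighted sum plus bias followed by a sigmoid, can be sketched in plain Python. This is an illustration only: the function names and the tiny 2-by-2 example are hypothetical, and the experiment itself would use 28 nodes per internal layer.</p>
        <preformat><![CDATA[
```python
import math


def sigmoid(x):
    """Numerically stable logistic function; output lies in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense_sigmoid(inputs, weights, biases):
    """Compute sigmoid(input . weight + bias) for one layer.

    inputs  -- list of n floats
    weights -- n rows, each a list of m floats (one column per node)
    biases  -- list of m floats
    """
    outputs = []
    for j, bias in enumerate(biases):
        z = sum(x * row[j] for x, row in zip(inputs, weights)) + bias
        outputs.append(sigmoid(z))
    return outputs


# Hypothetical toy layer with 2 inputs and 2 nodes.
x = [1.0, 2.0]
W = [[0.5, -1.0],
     [0.25, 0.0]]
b = [0.0, 1.0]
layer_out = dense_sigmoid(x, W, b)
```
]]></preformat>
        <p>Stacking two such layers and feeding the result into a 2-node output layer gives the three-layer shape described in this section.</p>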
        <p>The model is trained for 200 epochs. During training, a gradient
descent optimizer is used to optimize the result. A softmax layer is
applied to the output generated by the output layer to obtain the
final result.</p>
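        <p>To make the softmax step and the gradient descent loop concrete, the following sketch trains only a single 2-node softmax layer on toy data for 200 epochs. It is deliberately simplified and is not the network used in the experiment: the two internal layers are omitted, and the learning rate, the cross-entropy loss, the zero initialization, and the data are assumptions chosen for illustration.</p>
        <preformat><![CDATA[
```python
import math


def softmax(logits):
    """Turn raw output-layer values into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, W, b):
    """Softmax over (input . weight + bias) for one output layer."""
    return softmax([sum(xi * W[i][k] for i, xi in enumerate(x)) + b[k]
                    for k in range(len(b))])


def mean_loss(X, y, W, b):
    """Mean cross-entropy loss (an assumed choice of loss)."""
    return -sum(math.log(forward(x, W, b)[t]) for x, t in zip(X, y)) / len(X)


def train(X, y, n_classes=2, epochs=200, lr=0.5):
    """Full-batch gradient descent on a single softmax layer."""
    n_features = len(X[0])
    W = [[0.0] * n_classes for _ in range(n_features)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_classes for _ in range(n_features)]
        gb = [0.0] * n_classes
        for x, t in zip(X, y):
            p = forward(x, W, b)
            for k in range(n_classes):
                g = p[k] - (1.0 if k == t else 0.0)  # d(loss)/d(logit k)
                gb[k] += g / len(X)
                for i, xi in enumerate(x):
                    gW[i][k] += xi * g / len(X)
        # Plain gradient descent update.
        for k in range(n_classes):
            b[k] -= lr * gb[k]
            for i in range(n_features):
                W[i][k] -= lr * gW[i][k]
    return W, b


# Hypothetical toy data: two features, two classes.
X = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]]
y = [0, 1, 0, 1]
initial_loss = mean_loss(X, y, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
W, b = train(X, y)
final_loss = mean_loss(X, y, W, b)
predictions = [max(range(2), key=lambda k: forward(x, W, b)[k]) for x in X]
```
]]></preformat>
        <p>The predicted class is the index of the larger softmax probability, which corresponds to the two labels "eligible" and "not eligible".</p>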
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 RESULT</title>
      <p>The accuracy of the model is calculated by dividing the 100 available
samples into a 70-30 split: 70 are used for training and the rest for testing.
The accuracy ranges from 76 to 82 percent.</p>
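      <p>A 70-30 split and the accuracy computation can be sketched as follows. The text does not say how the split was drawn, so the random shuffle, the seed, and the function names here are assumptions for illustration.</p>
      <preformat><![CDATA[
```python
import random


def split_70_30(samples, seed=0):
    """Shuffle the samples and return (70% for training, 30% for testing)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed: random split, fixed seed
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# 100 placeholder sample identifiers standing in for the labeled documents.
train_set, test_set = split_70_30(range(100))
example_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```
]]></preformat>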
      <p>The final evaluation produces the following results:
1. Mean R-Precision: 0.0262223166667
2. Mean Precision at 10: 0.0246666666667
3. Mean Precision at 20: 0.0208333333333
4. Mean Recall at 100: 0.0868031271116
5. Mean Average Precision: 0.0618723522608
6. Overall Recall: 0.160995639731
The results could be improved in several ways: increasing
the number of epochs, adding more features, combining the results of
multiple runs, and using the Adam optimizer instead of the gradient descent
optimizer.</p>
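      <p>For readers unfamiliar with the ranking measures listed above, the sketch below computes them for a single document using the standard information retrieval definitions. These may differ in detail from the evaluation script that produced the reported numbers; the function names and the toy phrases are hypothetical. The "Mean" variants are the average of each per-document value over all documents.</p>
      <preformat><![CDATA[
```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k extracted phrases that are gold catchphrases."""
    return sum(1 for p in ranked[:k] if p in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the gold catchphrases found in the top-k extracted phrases."""
    if not relevant:
        return 0.0
    return sum(1 for p in ranked[:k] if p in relevant) / len(relevant)


def r_precision(ranked, relevant):
    """Precision at R, where R is the number of gold catchphrases."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a gold phrase appears.

    Assumes the ranked list contains no duplicate phrases.
    """
    hits, total = 0, 0.0
    for rank, phrase in enumerate(ranked, start=1):
        if phrase in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical ranked output and gold standard for one document.
ranked = ["income tax", "high court", "appeal", "limitation"]
relevant = {"income tax", "appeal", "writ petition"}

p_at_2 = precision_at_k(ranked, relevant, 2)
r_at_4 = recall_at_k(ranked, relevant, 4)
r_prec = r_precision(ranked, relevant)
avg_prec = average_precision(ranked, relevant)
```
]]></preformat>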
    </sec>
    <sec id="sec-4">
      <title>4 CONCLUSION</title>
      <p>In this work we have developed a framework in which, if the network
is trained using previous cases, it will produce catchphrases,
which in turn will help to find precedents much faster than a human
can.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mandal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pal</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          .
          <article-title>Overview of the FIRE 2017 track: Information Retrieval from Legal Documents (IRLeD)</article-title>
          .
          <source>In Working notes of FIRE 2017 - Forum for Information Retrieval Evaluation, Bangalore, India, December 8-10, 2017, CEUR Workshop Proceedings</source>
          . CEUR-WS.org,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Steven</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Edward</given-names>
            <surname>Loper</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ewan</given-names>
            <surname>Klein</surname>
          </string-name>
          (
          <year>2009</year>
          ).
          <source>Natural Language Processing with Python</source>
          .
          <publisher-name>O'Reilly Media Inc</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Martín</given-names>
            <surname>Abadi</surname>
          </string-name>
          et al.
          <source>TensorFlow: Large-scale machine learning on heterogeneous systems</source>
          ,
          <year>2015</year>
          . Software available from tensorflow.org.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>