Catch Phrase Extraction From Legal Documents Using Deep Neural Network

Sourav Das
Indian Institute of Engineering Science and Technology, Shibpur
Howrah, West Bengal
sourav.dd2015@cs.iiests.ac.in

Ranojoy Barua
Indian Institute of Engineering Science and Technology, Shibpur
Howrah, West Bengal
baruaranojoy1@gmail.com

ABSTRACT
This paper addresses finding and extracting important key phrases (catchphrases) from a document, from which the document can be summarized. This is important because it reduces the time consumed in summarizing documents. The work uses a deep neural network to train a model that recognizes such important key phrases based on various calculated parameters.

1 INTRODUCTION
The legal system depends on the citation of previous cases, which allows better judgment, but with a huge number of cases to study, the search for suitable cases becomes difficult. This problem can be divided into two parts: first, key phrase extraction, and second, finding suitable matches based on the key phrases found in the document. The main motive of this paper is to find an efficient way to extract key phrases. In this approach, a deep neural network provides an elegant way to extract catchphrases, which can then be used as references while searching for previous similar cases. The features used include grammar, Tf-idf, position in the document, etc. The most important key words in the document are thus extracted and can be used further as required. This approach can be used to minimize the required human effort. The words are further weighted to determine their importance in the document.

2 METHOD

2.1 Data
All files are legal documents recorded by the Supreme Court of India. A total of 400 documents are used during the course of the experiment, out of which 100 have gold standard catchphrases (catchphrases assigned by humans) and are used for training; the other 300 are used to generate output.

2.2 Procedure
In this experiment, for each file a set of potential meaningful phrases is created, and these phrases are then classified using a deep neural network. The steps involved are:
1. Preprocessing
2. Creating potential meaningful phrases based on common phrase grammars
3. Feature selection
4. Labeling the vectors
5. Classification
6. Training the model

2.2.1 Preprocessing
Read the whole file and remove stop words, punctuation, non-ASCII characters and numbers. Store the modified file for future use.
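The preprocessing step can be sketched as follows. This is a minimal illustration: the small stop-word list and the `preprocess` helper are assumptions, not the exact pipeline used in the experiment, which would use a full stop-word list such as the one shipped with NLTK.

```python
import re

# Illustrative stop-word list; a real run would use a complete list
# (e.g. NLTK's English stop words).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "by"}

def preprocess(text):
    """Remove stop words, punctuation, non-ASCII characters and numbers."""
    text = text.encode("ascii", errors="ignore").decode("ascii")  # drop non-ASCII
    text = re.sub(r"[0-9]", "", text)                             # drop numbers
    text = re.sub(r"[^\w\s]", " ", text)                          # drop punctuation
    tokens = [w for w in text.lower().split() if w not in STOP_WORDS]
    return " ".join(tokens)

cleaned = preprocess("The Court, in 1987, held that the appeal was barred.")
print(cleaned)  # court held that appeal was barred
```

The cleaned text is what the later phrase-generation and feature-extraction steps operate on.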
2.2.2 Phrase generation
Generate all potential meaningful phrases based on the common grammars of different phrase types.

2.2.3 Feature selection
The following features are extracted for these phrases.

Note: all the gold standard catchwords are combined and will be referred to as the super gold standard.

1. The total sum of the Tf-idf values of all the words in the phrase. Tf-idf values are calculated to find the important words in the training documents.

2. Find all the different parts of speech present in the super gold standard and create a standard vector, with respect to which the parts of speech occurring in each phrase are found; also keep track of the number of times each part of speech occurs.

3. Multiply each part of speech in the created vector by its particular weight, calculated from the super gold standard:

Weight = (number of occurrences of the unique POS) / (total number of phrases in the super gold standard) * 100

4. Find whether the phrase exactly matches any phrase from the super gold standard. If an exact match is found, count the number of times the exact match occurs.

5. Find the number of times the unique words of the phrase match a word in the super gold standard file; also keep track of how many individual words of the phrase found a match in the super gold standard.

All these features are then combined to create a large feature vector.

Note: not all the features mentioned are used, as that would lead to a very large feature vector, and some features subsume others; only a subset of these features is used as the final feature vector.
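Features 1 and 3 can be sketched as below. The toy corpus, the hand-assigned POS tags, and the helper names are assumptions for illustration only; the exact tf-idf weighting scheme used in the experiment is not specified, so a standard term-frequency times log inverse-document-frequency form is assumed.

```python
import math
from collections import Counter

def tf_idf(word, doc, docs):
    """Standard tf-idf of a word in one document of a small corpus."""
    tf = doc.count(word) / len(doc)
    df = sum(1 for d in docs if word in d)
    return tf * math.log(len(docs) / df)

# Feature 1: sum of the tf-idf values of the words in a candidate phrase.
def phrase_tfidf_sum(phrase, doc, docs):
    return sum(tf_idf(w, doc, docs) for w in phrase)

# Feature 3: weight each POS tag by its relative frequency in the
# super gold standard, following the formula
#   Weight(POS) = occurrences(POS) / total phrases in super gold standard * 100
def pos_weights(sgs_phrase_tags):
    counts = Counter(tag for tags in sgs_phrase_tags for tag in tags)
    n = len(sgs_phrase_tags)
    return {tag: c / n * 100 for tag, c in counts.items()}

# Toy data (assumed): documents as token lists, POS tags precomputed.
docs = [["writ", "petition", "civil", "appeal"],
        ["civil", "appeal", "dismissed"],
        ["writ", "jurisdiction", "upheld"]]
sgs_tags = [["NN", "NN"], ["JJ", "NN"], ["NN"]]

print(round(phrase_tfidf_sum(["writ", "petition"], docs[0], docs), 3))
print(pos_weights(sgs_tags))
```

In the real pipeline the POS tags would come from a tagger (e.g. NLTK's) rather than being hand-assigned.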
2.2.4 Labeling
We intend to apply supervised learning, but at this point we have feature vectors without any labels, so labels must be assigned before supervised learning can be applied. We label the data into two classes, i.e., phrases eligible to be catchphrases and phrases not eligible to be catchphrases.

The criteria for labeling are:
1. The phrase should have a Tf-idf value greater than 0.0.
2. The phrase should contain at least one part of speech belonging to the super gold standard.
3. The phrase should have at least one word matching the super gold standard.
4. The phrase may or may not have an exact match with a phrase in the super gold standard.

Conditions for labeling:
1. If all conditions are satisfied, the phrase is labeled valid.
2. If only condition 4 is satisfied, it is labeled valid.
3. If all conditions other than condition 4 are satisfied, it is valid.
4. Otherwise, it is labeled not valid.

2.2.5 Classification
For classification we use a deep neural network three layers deep: two internal layers, each having 28 nodes, and an output layer having 2 nodes. The architecture of each layer is

Output = input . weight + bias

A sigmoid function is then applied to squash all the values between 0 and 1. The model is trained for 200 epochs; during training, a gradient descent optimizer is used to optimize the result. A softmax layer is applied to the output generated by the output layer to obtain the final result.
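The forward pass of the network described above can be sketched as follows. The experiment itself was built on TensorFlow; this is a plain NumPy sketch, with randomly initialized weights and an assumed feature-vector size `n_features` (the paper does not state the final dimensionality).

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """One dense layer as described above: output = input . weight + bias."""
    return x @ w + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, params):
    h1 = sigmoid(layer(x, *params[0]))   # hidden layer 1: 28 nodes
    h2 = sigmoid(layer(h1, *params[1]))  # hidden layer 2: 28 nodes
    logits = layer(h2, *params[2])       # output layer: 2 nodes
    return softmax(logits)               # probabilities for the two classes

n_features = 8  # assumed size of the final feature vector
sizes = [(n_features, 28), (28, 28), (28, 2)]
params = [(rng.standard_normal(s) * 0.1, np.zeros(s[1])) for s in sizes]

probs = forward(rng.standard_normal((1, n_features)), params)
print(probs.shape)  # (1, 2)
```

Training (200 epochs of gradient descent on these weights) is omitted; the sketch only shows the layer structure.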
3 RESULT
The accuracy of the model is calculated by dividing the 100 available samples into a 70-30 split: 70 are used for training and the rest for testing, and the accuracy ranges from 76 to 82 percent.

The final results obtained from the evaluation are:
1. Mean R-precision: 0.0262223166667
2. Mean Precision at 10: 0.0246666666667
3. Mean Precision at 20: 0.0208333333333
4. Mean Recall at 100: 0.0868031271116
5. Mean Average Precision: 0.0618723522608
6. Overall Recall: 0.160995639731

Some ways in which the results could be improved are increasing the number of epochs, adding more features, combining the results of multiple runs, and using the Adam optimizer instead of the gradient descent optimizer.

4 CONCLUSION
In this work we have developed a framework in which, if the network is trained using previous cases, it produces catchphrases that in turn help find precedents much faster than a human can.

REFERENCES
[1] A. Mandal, K. Ghosh, A. Bhattacharya, A. Pal and S. Ghosh. Overview of the FIRE 2017 track: Information Retrieval from Legal Documents (IRLeD). In Working Notes of FIRE 2017 - Forum for Information Retrieval Evaluation, Bangalore, India, December 8-10, 2017, CEUR Workshop Proceedings. CEUR-WS.org, 2017.
[2] Steven Bird, Edward Loper and Ewan Klein. Natural Language Processing with Python. O'Reilly Media Inc., 2009.
[3] Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.