<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Catch Phrase Extraction From Legal Documents Using Deep Neural Network</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sourav</forename><surname>Das</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Indian Institute of Engineering Science and Technology</orgName>
								<address>
									<settlement>Shibpur, Howrah</settlement>
									<region>West Bengal</region>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Ranojoy</forename><surname>Barua</surname></persName>
							<email>baruaranojoy1@gmail.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Indian Institute of Engineering Science and Technology</orgName>
								<address>
									<settlement>Shibpur, Howrah</settlement>
									<region>West Bengal</region>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Catch Phrase Extraction From Legal Documents Using Deep Neural Network</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">92B360ADF9FCB3D392F3EF647B8C53FC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:15+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper addresses finding and extracting important key phrases (catchphrases) from a document, from which the document can then be summarized. This matters because it reduces the time needed to summarize documents. The work uses a deep neural network to train a model that recognizes such key phrases based on various computed features.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>The legal system depends on the citation of previous cases, which allows better judgment, but with a huge number of cases to study, the search for suitable cases becomes difficult. This problem can be divided into two parts: first, key phrase extraction, and second, finding suitable matches based on the key phrases found in the document. The main goal of this paper is to find an efficient way to extract key phrases. In this approach, a deep neural network provides an elegant way to extract catchphrases, which can then be used as reference points when searching for similar previous cases. The features used include grammar, Tf-idf, position in the document, etc. The most important key words in the document are thus extracted and can be used further as required, minimizing the human effort involved. The words are further ranked by their weights to determine their importance in the document.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">METHOD</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Data</head><p>All files are legal documents recorded by the Supreme Court of India. A total of 400 documents are used over the course of the experiment, of which 100 have gold standard catchphrases (catchphrases assigned by humans) and are used for training; the other 300 are used to generate output.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Procedure</head><p>In this experiment a set of potential meaningful phrases is created for each file and then classified using a deep neural network. The steps involved are: 1. Preprocessing. 2. Creating potential meaningful phrases based on common phrase grammars. 3. Feature selection. 4. Labeling the vectors. 5. Classification. 6. Training the model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1">Preprocessing</head><p>Read the whole file; remove stop words, punctuation, non-ASCII characters, and numbers. Store the modified file for future use.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2">Phrase generation</head><p>Generate all potential meaningful phrases based on the common grammars of different phrase types. A sketch of these two steps is given below.</p></div>
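<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the preprocessing and phrase-generation steps, assuming NLTK for tokenization and tagging (the paper cites NLTK but does not publish its code, and the phrase grammars are not given, so the adjective/noun-run rule below is only an illustrative stand-in):</p><code>
import re
import nltk
from nltk.corpus import stopwords

def preprocess(text):
    """Section 2.2.1: drop non-ASCII characters, numbers, punctuation and stop words."""
    text = text.encode("ascii", errors="ignore").decode()  # strip non-ASCII
    text = re.sub(r"\d+", " ", text)                       # strip numbers
    text = re.sub(r"[^\w\s]", " ", text)                   # strip punctuation
    stops = set(stopwords.words("english"))
    return [t for t in nltk.word_tokenize(text.lower()) if t not in stops]

def generate_phrases(tokens):
    """Section 2.2.2: keep contiguous runs of adjectives/nouns as candidate
    phrases (a hypothetical stand-in for the paper's unpublished grammars)."""
    tagged = nltk.pos_tag(tokens)
    phrases, run = [], []
    for word, tag in tagged:
        if tag.startswith(("JJ", "NN")):
            run.append(word)
        else:
            if len(run) > 1:  # keep multi-word runs only
                phrases.append(" ".join(run))
            run = []
    if len(run) > 1:
        phrases.append(" ".join(run))
    return phrases
</code></div>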
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.3">Feature selection</head><p>Several features are extracted for each phrase. (Note: all the gold standard catchwords are combined into a single file, referred to below as the super gold standard.)</p><p>1. The sum of the Tf-idf values of all the words in the phrase. Tf-idf is computed to identify the important words in the training documents.</p><p>2. All the distinct parts of speech present in the super gold standard are collected into a standard vector; the parts of speech occurring in each phrase are then counted against this vector, keeping track of how many times each part of speech occurs.</p><p>3. Each part-of-speech count in this vector is multiplied by a weight calculated from the super gold standard: Weight = (number of occurrences of the unique POS) / (total number of phrases in the super gold standard) × 100.</p><p>4. Whether the phrase exactly matches any phrase from the super gold standard, and, if so, how many times the exact match occurs.</p><p>5. The number of times each unique word of the phrase matches a word in the super gold standard, together with how many individual words of the phrase found a match there. All these features are then combined into one large feature vector.</p><p>Note: not all the features mentioned are used, as that would lead to a very large feature vector and some features subsume others; only a subset of these features forms the final feature vector. A sketch of this feature extraction follows.</p></div>
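<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of the feature extraction, assuming scikit-learn for Tf-idf and NLTK for part-of-speech tagging; the function name build_features and its argument layout are illustrative, not from the paper:</p><code>
from collections import Counter
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer

def build_features(phrase, documents, super_gold):
    """phrase: candidate string; documents: training corpus (list of str);
    super_gold: all gold-standard catchphrases combined (list of str)."""
    # Feature 1: summed Tf-idf of the phrase's words over the training corpus.
    # (In practice the vectorizer would be fit once, not per phrase.)
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(documents)
    scores = dict(zip(vec.get_feature_names_out(), tfidf.sum(axis=0).A1))
    tfidf_sum = sum(scores.get(w, 0.0) for w in phrase.split())

    # Features 2-3: POS counts of the phrase against the gold-standard POS vector,
    # each weighted by 100 * (occurrences of that POS) / (number of gold phrases).
    gold_pos = Counter(tag for p in super_gold for _, tag in nltk.pos_tag(p.split()))
    weights = {t: 100.0 * n / len(super_gold) for t, n in gold_pos.items()}
    phrase_pos = Counter(tag for _, tag in nltk.pos_tag(phrase.split()))
    weighted_pos = [phrase_pos.get(t, 0) * w for t, w in sorted(weights.items())]

    # Feature 4: exact-match count against the gold phrases.
    exact = sum(1 for p in super_gold if p == phrase)

    # Feature 5: word-level overlap with the gold phrases.
    gold_words = Counter(w for p in super_gold for w in p.split())
    word_hits = sum(gold_words.get(w, 0) for w in set(phrase.split()))
    words_matched = sum(1 for w in set(phrase.split()) if w in gold_words)

    return [tfidf_sum, exact, word_hits, words_matched] + weighted_pos
</code></div>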
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.4">Labeling</head><p>We intend to apply supervised learning, but at this point we have feature vectors without any labels, so labels must be assigned first. The data are labeled into two classes: eligible as a catchphrase and not eligible as a catchphrase. The criteria for labeling are: 1. The phrase should have a Tf-idf value greater than 0.0. 2. The phrase should contain at least one part of speech belonging to the super gold standard. 3. The phrase should have at least one word matching the super gold standard. 4. The phrase may or may not exactly match a phrase in the super gold standard.</p><p>The labeling rules are: 1. If all conditions are satisfied, the phrase is labeled valid. 2. If only condition 4 is satisfied, it is labeled valid. 3. If all conditions other than condition 4 are satisfied, it is valid. 4. Otherwise it is not valid. A sketch of these rules follows.</p></div>
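<div xmlns="http://www.tei-c.org/ns/1.0"><p>The labeling rules transcribe directly into code; this sketch takes the relevant feature values as scalars (the names are illustrative):</p><code>
def label(tfidf_sum, pos_hits, word_matches, exact_matches):
    c1 = tfidf_sum > 0.0        # condition 1: Tf-idf value greater than 0.0
    c2 = pos_hits > 0           # condition 2: shares a POS with the super gold standard
    c3 = word_matches > 0       # condition 3: at least one word matches
    c4 = exact_matches > 0      # condition 4: exact match with a gold phrase
    if c1 and c2 and c3:        # rules 1 and 3: conditions 1-3 suffice, with or without 4
        return "valid"
    if c4 and not (c1 or c2 or c3):  # rule 2: only condition 4 holds
        return "valid"
    return "not valid"
</code></div>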
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.5">Classification</head><p>For classification we use a deep neural network three layers deep: two internal layers with 28 nodes each and an output layer with 2 nodes. The architecture of each layer is output = input · weight + bias, after which a sigmoid function squashes the values between 0 and 1. The model is trained for 200 epochs using a gradient descent optimizer. A softmax layer is applied to the output of the final layer to obtain the result; a sketch of this network is given after the results below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">RESULT</head><p>The accuracy of the model is estimated by splitting the 100 available samples 70-30: 70 are used for training and the rest for testing, and the accuracy ranges from 76 to 82 percent.</p><p>The final evaluation produced: 1. Mean R-precision: 0.0262223166667. 2. Mean precision at 10: 0.0246666666667. 3. Mean precision at 20: 0.0208333333333. 4. Mean recall at 100: 0.0868031271116. 5. Mean average precision: 0.0618723522608. 6. Overall recall: 0.160995639731.</p><p>The results could be improved by increasing the number of epochs, adding more features, combining the results of multiple runs, or using the Adam optimizer instead of the gradient descent optimizer.</p></div>
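<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of the classifier described in Section 2.2.5, written against TensorFlow's Keras API (the paper cites TensorFlow but gives no code; the input size, loss function, and exact optimizer settings are assumptions):</p><code>
import tensorflow as tf

n_features = 40  # hypothetical feature-vector length; the paper does not state it

# Two hidden layers of 28 sigmoid units (output = input . weight + bias, then
# sigmoid), a 2-node output layer, and softmax over the two classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(28, activation="sigmoid"),
    tf.keras.layers.Dense(28, activation="sigmoid"),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Plain gradient descent, as in the paper; the text suggests Adam as a possible improvement.
model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training for 200 epochs on the labeled feature vectors
# (x_train and y_train are assumed to exist):
# model.fit(x_train, y_train, epochs=200)
</code></div>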
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">CONCLUSION</head><p>In this work we have developed a framework where if the network is trained by using previous cases then it will produce catchphrase which in turn will help to find precedent much faster than human can do.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">). The L AT E X Companion</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mandal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ghosh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhattacharya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ghosh</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Overview of the FIRE 2017 track: Information Retrieval from Legal Documents</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting><address><addrLine>Bangalore, India</addrLine></address></meeting>
		<imprint>
			<publisher>IRLeD</publisher>
			<date type="published" when="2017">December 8-10, 2017. 2017</date>
		</imprint>
	</monogr>
	<note>Working notes of FIRE 2017 Forum for Information Retrieval Evaluation</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Natural Language Processing with Python</title>
		<author>
			<persName><forename type="first">Steven</forename><surname>Bird</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edward</forename><surname>Loper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ewan</forename><surname>Klein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The L AT E X Companion</title>
				<imprint>
			<publisher>OReilly Media Inc</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">Martín</forename><surname>Abadi</surname></persName>
		</author>
		<ptr target="org" />
		<title level="m">Large-scale machine learning on heterogeneous systems</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note>TensorFlow The L AT E X Companion. Software available from tensorflow</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
