<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Machine Learning Approaches for Catchphrase Extraction in Legal Documents</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tshepho</forename><surname>Koboyatshwene</surname></persName>
							<email>tshepho.koboyatshwene@mopipi.ub.bw</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Botswana</orgName>
								<address>
									<settlement>Gaborone</settlement>
									<country key="BW">Botswana</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Moemedi</forename><surname>Lefoane</surname></persName>
							<email>moemedi.lefoane@mopipi.ub.bw</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Botswana</orgName>
								<address>
									<settlement>Gaborone</settlement>
									<country key="BW">Botswana</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lakshmi</forename><surname>Narasimhan</surname></persName>
							<email>lakshmi.narasimhan@mopipi.ub.bw</email>
							<affiliation key="aff2">
								<orgName type="institution">University of Botswana</orgName>
								<address>
									<settlement>Gaborone</settlement>
									<country key="BW">Botswana</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Machine Learning Approaches for Catchphrase Extraction in Legal Documents</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2B6FC1B3169492942652CEFEFDBA57FA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:16+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Catchphrase extraction</term>
					<term>Legal domain</term>
					<term>IRLeD</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The purpose of this research was to automatically extract catchphrases from a set of legal documents. Our focus was mainly on Machine Learning approaches: we compared an unsupervised approach against a supervised one to determine which of the two was better suited for automatic catchphrase extraction on a dataset of legal documents. Two open-source text mining tools were used, one for the unsupervised approach and another for the supervised approach, and some parameters of each tool were fine-tuned before extracting catchphrases. The training dataset was used during fine-tuning to find the optimal parameter values, which were then used to generate the final catchphrases. The results were evaluated with the most common measures in Information Extraction, including Precision and Recall, and the results of the two Machine Learning approaches were compared. In general, our results showed that the supervised approach performed substantially better than the unsupervised approach.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Automatic keyword or catchphrase extraction is an area of research that has not yet been explored extensively. Determining catchphrases manually can be time-consuming and expensive, and usually requires expertise <ref type="bibr" target="#b0">[1]</ref>; this has motivated research towards automatic keyword extraction. Different terminologies are used for the terms that represent the most relevant or useful information contained in a document, such as key phrases, key segments, key terms and keywords <ref type="bibr" target="#b0">[1]</ref>. In the FIRE 2017 Information Retrieval from Legal Documents (IRLeD) task <ref type="bibr" target="#b1">[2]</ref>, the word "catchphrase" is used instead of keyword or key phrase in the legal domain.</p><p>Keyword extraction involves automatically searching for and identifying the keywords within a document that best describe its subject <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b5">6]</ref>. Methods for automatic keyword extraction can be classified into different approaches. According to Beliga et al. and Lima et al. <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b5">6]</ref>, these include simple statistical approaches, linguistic approaches and Machine Learning approaches, among others.</p><p>As the name suggests, simple statistical approaches are very simple: they do not need any training and are language- and domain-independent. Keywords can be identified using word statistics such as word frequency, word co-occurrences, term frequency-inverse document frequency (TF-IDF) and N-gram statistics. The disadvantage of this approach is that in some domains, such as health and medicine, the most important keyword may appear only once <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b5">6]</ref>. 
Linguistic approaches look at linguistic features of words, sentences and documents, such as lexical and syntactic structure and semantic analysis <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b5">6]</ref>. Machine Learning approaches comprise both unsupervised and supervised methods; see Section 2. Other approaches combine the methods described above and may also incorporate heuristic knowledge such as the position, length or layout features of terms <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b5">6]</ref>. This paper is organized as follows: we first present related work on the supervised and unsupervised Machine Learning approaches, focusing mainly on Rapid Automatic Keyword Extraction (RAKE) <ref type="bibr" target="#b4">[5]</ref> and Multi-purpose Automatic Topic Indexing (MAUI) <ref type="bibr" target="#b3">[4]</ref>, followed by our proposed approach, including all the experimental setups performed. Thirdly, we give a brief overview of the measures used for evaluating the results. We then present and discuss the results. Lastly, we conclude and briefly discuss possible future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">RELATED WORK</head><p>According to Lima et al. and Rose et al. <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>, RAKE is an unsupervised Machine Learning approach that requires no training and works by first selecting candidate keywords. Lima et al. and Rose et al. <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref> outlined RAKE's input parameters: a stop list, a set of phrase delimiters, and a set of word delimiters. First, the document is partitioned into candidate keywords using the phrase and word delimiters. After the selection of candidate keywords, a graph of word co-occurrences is created and each candidate keyword is assigned a score. Several metrics are used to calculate the score, namely word frequency freq(w), word degree deg(w), and the ratio of word degree to word frequency, deg(w)/freq(w). Candidate keywords are then ranked starting with the highest score.</p><p>According to Medelyan <ref type="bibr" target="#b3">[4]</ref>, MAUI was built on four open-source software components: the Keyphrase Extraction Algorithm (Kea), used for phrase filtering and computing n-gram extractions; Weka, used for creating topic indexing models and applying them to new documents; Jena, used for incorporating controlled vocabularies from external sources; and Wikipedia Miner, used for accessing Wikipedia data. These four open-source components are combined with other classes into a single topic indexing algorithm that generates candidate topics, computes their features, builds the topic indexing model and applies the model to new documents <ref type="bibr" target="#b3">[4]</ref>. To create a model, a training dataset with known keyphrases is required; the only keyphrases that can then be classified are the ones already incorporated in the training data. 
Candidate phrases are selected in three steps: cleaning of the input, phrase identification, and lastly case-folding and stemming <ref type="bibr" target="#b6">[7]</ref>. MAUI has a parameter that can be varied in order to control the size of the training set: some candidate catchphrases are discarded based on their frequency of occurrence before the model is created, which reduces the size of the model <ref type="bibr" target="#b3">[4]</ref>.</p></div>
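The RAKE scoring scheme just described can be sketched in a few lines of Python. This is a simplified illustration, not the RAKE library itself: the word-delimiter regex and the stop-list handling are assumptions, and deg(w) is taken as the total length of the phrases containing w, a common reading of the co-occurrence graph degree.

```python
import re
from collections import defaultdict

def rake_scores(text, stopwords):
    """Score candidate phrases RAKE-style: partition the text into
    candidates at stopwords/punctuation, score each word by
    deg(w)/freq(w), and score each phrase by the sum of its word scores."""
    # Partition into candidate phrases (word delimiter: non-letters;
    # phrase delimiters: stopwords and punctuation).
    tokens = re.split(r"[^a-z]+", text.lower())
    phrases, current = [], []
    for tok in tokens:
        if not tok or tok in stopwords:
            if current:
                phrases.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        phrases.append(current)

    # freq(w): occurrences of w; deg(w): total length of phrases containing w.
    freq, deg = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            deg[w] += len(phrase)

    # Phrase score = sum over its words of deg(w)/freq(w); rank descending.
    return {" ".join(p): sum(deg[w] / freq[w] for w in p) for p in phrases}
```

Longer multi-word candidates accumulate higher scores because each of their words co-occurs with the others, which is why RAKE favours phrases over single terms.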
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">PROPOSED APPROACH</head><p>A keyword extraction library called RAKE <ref type="bibr" target="#b4">[5]</ref> was used for the unsupervised approach, while MAUI <ref type="bibr" target="#b3">[4]</ref> was used for the supervised approach. Both RAKE <ref type="bibr" target="#b4">[5]</ref> and MAUI <ref type="bibr" target="#b3">[4]</ref> have parameters that were fine-tuned before generating catchphrases. Our approach was to set the RAKE and MAUI parameters to different values and then use part of the training dataset, with known catchphrases, for evaluation. The results of each approach were evaluated individually in order to determine the optimal parameters for extracting catchphrases from the test data. We then generated the final catchphrases using the test data provided and the optimal parameters that yielded the best results for each approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Experimental Setup</head><head n="3.2">Dataset</head><p>For the IRLeD task, the dataset provided contained the following:</p><p>(1) Train docs -consisted of 100 case statements.</p><p>(2) Train catches -contained the gold standard catchwords for each of the 100 case statements provided in the Train docs.</p><p>(3) Test docs -contained 300 test case statements. For each of these 300 statements, a set of catchphrases was generated.</p><p>The training dataset was randomly divided into two groups of 90 and 10 documents. The 90-document group was used only for training the supervised Machine Learning approach, while the remaining 10 documents were used for testing both the unsupervised and supervised methodologies.</p></div>
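The random 90/10 split described above can be reproduced with a short script; a sketch in which the document identifiers and the fixed seed are illustrative assumptions:

```python
import random

def split_dataset(doc_ids, n_held_out=10, seed=0):
    """Randomly hold out n_held_out documents for evaluation; the rest
    are used to train the supervised model. Both the unsupervised and
    supervised runs are then scored on the held-out set."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    return ids[n_held_out:], ids[:n_held_out]

# Hypothetical identifiers for the 100 training case statements.
train_ids, held_out_ids = split_dataset([f"doc{i:03d}" for i in range(1, 101)])
```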
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Experiment 1 -RAKE parameter tuning on training dataset</head><p>RAKE has the following parameters, which were fine-tuned across different experiments in order to find the values that yielded the best performance on the training set provided.</p><p>Table <ref type="table" target="#tab_0">1</ref> provides more details on the parameters experimented with, as well as the performance results.</p><p>(1) The minimum number of characters per word can be varied in order to select keywords with a certain number of characters, represented as No of Char/word in Table <ref type="table" target="#tab_0">1</ref>.</p><p>(2) The maximum number of words per phrase can be varied, represented as No of words/phrase in Table <ref type="table" target="#tab_0">1</ref>.</p><p>(3) The minimum number of times a keyword must appear in a given text can be set, represented as keyword frequency in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
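The three parameters above can be viewed as a post-filter over the candidate keywords. A minimal sketch, in which the candidate list is an assumed input and the defaults correspond to the 3 3 1 setting from Table 1:

```python
from collections import Counter

def filter_candidates(candidates, min_chars=3, max_words=3, min_freq=1):
    """Keep only candidates where every word has at least min_chars
    characters, the phrase contains at most max_words words, and the
    phrase occurs at least min_freq times in the document."""
    counts = Counter(candidates)
    return [
        phrase
        for phrase, n in counts.items()
        if n >= min_freq
        and len(phrase.split()) <= max_words
        and all(len(w) >= min_chars for w in phrase.split())
    ]
```

Raising `min_freq` shrinks the output quickly, which matches the sharp drop in recall observed in Table 1 as the keyword frequency threshold grows.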
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Experiment 2 -MAUI parameter tuning on training dataset</head><p>As in Section 3.3, parameter tuning experiments were performed in order to find the optimal parameters for MAUI. The only parameter tuned for MAUI was the frequency-of-occurrence threshold below which candidate keywords are discarded.</p><p>By default, MAUI discards any candidate phrase that appears fewer than two times. See Table <ref type="table" target="#tab_1">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Final Run 1: Using RAKE</head><p>RAKE was used to generate catchphrases for the test documents provided, with the parameters tuned to 3 3 1: each word had at least 3 characters, each phrase had at most 3 words, and each keyword appeared in the text at least once. UBIRLeD_1 -catchphrases were generated for each document together with the corresponding score for each catchphrase.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6">Final Run 2: Using MAUI</head><p>For the supervised Machine Learning approach (MAUI), a classifier was trained on all the training documents provided, together with their known catchphrases. No candidates were discarded prior to training the model. The trained model was then used to generate catchphrases for the test documents. UBIRLeD_2: 150 catchphrases were generated for each test document, with the highest-ranked catchphrases appearing first.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">EVALUATION</head><p>Several measures were used to evaluate the results of the two approaches. In these experiments we looked at Recall, Precision and Mean Average Precision, among others.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Recall Measure</head><p>According to Manning et al. <ref type="bibr" target="#b2">[3]</ref>, Recall is defined as the fraction of relevant documents that are retrieved. In this task, we were interested in the fraction of relevant catchphrases retrieved in each document. The formula for Recall is given in Figure <ref type="figure">1</ref>, where tp represents true positives (relevant catchphrases that were retrieved) and fn represents false negatives (relevant catchphrases that were not retrieved).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Recall = tp / (tp + fn)</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1: Recall equation as described by Manning et al [3]</head><p>Recall@K would be the proportion of relevant catchphrases that have been retrieved in the top-K. </p></div>
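Recall@K as just described can be computed directly; a sketch assuming a ranked catchphrase list and a gold-standard set per document:

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the gold-standard catchphrases found in the top-k of
    the ranked list, i.e. tp / (tp + fn) restricted to the top k."""
    tp = len(set(ranked[:k]) & set(relevant))
    return tp / len(relevant)
```

Mean Recall@K over a collection is then just the mean of this value across all test documents.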
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Precision Measure</head><p>Precision is described as the fraction of retrieved documents that are relevant, according to Manning et al. <ref type="bibr" target="#b2">[3]</ref>. In this task, precision is the fraction of retrieved catchphrases that are relevant. The formula for Precision is given in Figure <ref type="figure" target="#fig_0">2</ref>, where tp represents true positives and fp represents false positives, i.e. non-relevant catchphrases that have been retrieved as relevant. Precision@K is the proportion of the top-K catchphrases that are relevant, and Mean Precision@K is the mean of the Precision@K values over all test documents in the collection.</p><formula xml:id="formula_0">Precision = tp / (tp + fp)</formula><p>We followed the ideas of Manning et al. <ref type="bibr" target="#b2">[3]</ref> when computing Mean R precision. Computing Mean R precision requires knowledge of all catchphrases that are relevant to each test document, where R is the total number of expected relevant catchphrases for a particular test document. R is then used as the cutoff for calculating precision; at the Rth position, precision is equal to recall. Suppose that R relevant catchphrases were expected for test document Td1, and only r relevant catchphrases were retrieved within the top R positions. We would then calculate the precision of the top R catchphrases retrieved using the formula given in Figure <ref type="figure" target="#fig_1">3</ref>. The Mean R precision is the mean of the R precision values over all the test documents (queries).</p></div>
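Precision@K and R precision can be sketched the same way; the helper names are illustrative, and `relevant` is the gold-standard set whose size supplies the cutoff R:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked catchphrases that are relevant,
    i.e. tp / (tp + fp) over the top k."""
    return sum(1 for c in ranked[:k] if c in relevant) / k

def r_precision(ranked, relevant):
    """Precision at cutoff R = len(relevant). With r relevant items
    retrieved in the top R positions this gives r/R, which at this
    cutoff equals recall."""
    return precision_at_k(ranked, relevant, len(relevant))
```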
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Mean Average Precision Measure</head><p>The Mean Average Precision (MAP) value is defined as "the arithmetic mean of average precision values for individual information needs" by Manning et al. <ref type="bibr" target="#b2">[3]</ref>. The formula for MAP is given in Figure <ref type="figure" target="#fig_2">4</ref>, where MAP(Q) is the mean of average precision across the whole collection of queries, the queries here being the test documents, and Precision(R_jk) is the precision of the ranked retrieved catchphrases from the top result down to position k for test document j. For each of the test documents, a set of ranked catchphrases was produced, which was then used to compute precision and average precision (AP). Average precision is the mean of the precision scores taken after each relevant catchphrase is retrieved.</p><formula xml:id="formula_1">RPrecision = r / R</formula><formula xml:id="formula_2">MAP(Q) = (1/|Q|) ∑_{j=1}^{|Q|} (1/m_j) ∑_{k=1}^{m_j} Precision(R_jk)</formula></div>
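The MAP computation described above can be sketched as follows; a minimal illustration in which `runs` pairs each test document's ranked output with its gold-standard set:

```python
def average_precision(ranked, relevant):
    """Mean of the precision values taken at each rank where a relevant
    catchphrase appears, normalised by the number of relevant
    catchphrases m_j for the document."""
    hits, total = 0, 0.0
    for k, phrase in enumerate(ranked, start=1):
        if phrase in relevant:
            hits += 1
            total += hits / k  # Precision(R_jk) at this relevant hit
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP(Q): arithmetic mean of average precision over all test
    documents; runs is a list of (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```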
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">RESULTS</head><p>Consider the results displayed in Table <ref type="table" target="#tab_2">3</ref>. The UBIRLed_1 and UBIRLed_2 rows contain the performance measures obtained with the catchphrases generated by RAKE and MAUI respectively, as described in Sections 3.5 and 3.6. Using the performance measures stated in Section 4, we observed that MAUI, the supervised approach, performed substantially better than RAKE, the unsupervised approach. Comparing the results on Mean Precision@10, the proportion of relevant catchphrases in the top 10 was much higher for MAUI: 0.254 against 0.013 for RAKE. On Mean Recall@100, MAUI again outperformed RAKE by retrieving more relevant catchphrases in the top 100. For MAP, the assumption was that we were interested in finding more relevant catchphrases for each test document, so we computed the mean of the average precision values over all test documents; the MAP value obtained for MAUI was higher than that computed from the RAKE results. The Mean R precision value for MAUI likewise showed a far higher proportion of relevant retrieved catchphrases, where the cutoff equalled the number of relevant catchphrases expected for each document in the test dataset. On overall recall, RAKE performed better, although that was the only measure on which it outperformed MAUI. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">CONCLUSION AND FUTURE WORK</head><p>In this paper we proposed and compared two Machine Learning approaches, RAKE and MAUI, for the unsupervised and supervised settings respectively. In the proposed approach, fine-tuning parameters before generating candidate catchphrases yielded the optimal parameters for each method used.</p><p>With the optimal parameters used for generating the final catchphrases, MAUI achieved higher overall performance than RAKE, and the difference was observed on most measures. RAKE achieved the highest overall recall, but its precision was very low compared to MAUI's. We strongly believe that the legal domain is an area that still requires a great deal of work on Information Extraction.</p><p>For future work, we plan to experiment with different supervised Machine Learning techniques and evaluate their performance.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Precision equation as described by Manning et al [3]</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: R Precision equation as described by Manning et al [3]</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: MAP equation as given by Manning et al [3]</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 : Results for RAKE parameter tuning</head><label>1</label><figDesc></figDesc><table><row><cell></cell><cell cols="2">RAKE Experiments For Parameter Tuning</cell><cell></cell><cell></cell></row><row><cell cols="5">Test Number No of Char/word No of words/phrase keyword frequency Recall</cell></row><row><cell>1</cell><cell>5</cell><cell>3</cell><cell>4</cell><cell>5.65</cell></row><row><cell>2</cell><cell>3</cell><cell>3</cell><cell>1</cell><cell>25.78</cell></row><row><cell>3</cell><cell>3</cell><cell>3</cell><cell>2</cell><cell>19.64</cell></row><row><cell>4</cell><cell>3</cell><cell>3</cell><cell>3</cell><cell>13.04</cell></row><row><cell>5</cell><cell>3</cell><cell>3</cell><cell>4</cell><cell>8.62</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 : Results for MAUI parameter tuning</head><label>2</label><figDesc></figDesc><table><row><cell cols="2">MAUI Experiments For Parameter Tuning</cell><cell></cell></row><row><cell cols="3">Test Number Frequency of phrases to keep Recall</cell></row><row><cell>1</cell><cell>1</cell><cell>68.27</cell></row><row><cell>2</cell><cell>2</cell><cell>48.62</cell></row><row><cell>3</cell><cell>3</cell><cell>31.24</cell></row><row><cell>4</cell><cell>10</cell><cell>6.03</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 : Final Results for RAKE and MAUI using Test documents</head><label>3</label><figDesc></figDesc><table><row><cell></cell><cell></cell><cell cols="2">RAKE and MAUI Results</cell><cell></cell><cell></cell></row><row><cell cols="4">Evaluation Metrics Mean R precision Mean Precision@10 Mean Recall@100</cell><cell>MAP</cell><cell>Overall Recall</cell></row><row><cell>UBIRLed_1</cell><cell>0.02316392684</cell><cell>0.01366666667</cell><cell>0.1723154757</cell><cell cols="2">0.04634794783 0.4992190452</cell></row><row><cell>UBIRLed_2</cell><cell>0.1901020309</cell><cell>0.2543333333</cell><cell>0.3050612978</cell><cell>0.3703664676</cell><cell>0.3259790763</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An Overview of Graph-Based Keyword Extraction Methods and Approaches</title>
		<author>
			<persName><forename type="first">Slobodan</forename><surname>Beliga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ana</forename><surname>Meštrović</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sanda</forename><surname>Martinčić-Ipšić</surname></persName>
		</author>
		<ptr target="http://hrcak.srce.hr/140857" />
	</analytic>
	<monogr>
		<title level="j">Journal of information and organizational sciences</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="20" />
			<date type="published" when="2015-06">June 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the FIRE 2017 track: Information Retrieval from Legal Documents (IRLeD)</title>
		<author>
			<persName><forename type="first">Arpan</forename><surname>Mandal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kripabandhu</forename><surname>Ghosh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arnab</forename><surname>Bhattacharya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arindam</forename><surname>Pal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Saptarshi</forename><surname>Ghosh</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Working notes of FIRE 2017 -Forum for Information Retrieval Evaluation</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Prabhakar</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hinrich</forename><surname>Schütze</surname></persName>
		</author>
		<ptr target="http://nlp.stanford.edu/IR-book/information-retrieval-book.html" />
		<title level="m">Introduction to Information Retrieval</title>
				<meeting><address><addrLine>Cambridge, UK</addrLine></address></meeting>
		<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">Olena</forename><surname>Medelyan</surname></persName>
		</author>
		<ptr target="http://cds.cern.ch/record/1198029" />
		<title level="m">Human-competitive automatic topic indexing</title>
				<imprint>
			<date type="published" when="2009-07">July 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Automatic Keyword Extraction from Individual Documents</title>
		<author>
			<persName><forename type="first">Stuart</forename><surname>Rose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dave</forename><surname>Engel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nick</forename><surname>Cramer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wendy</forename><surname>Cowley</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1" to="20" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">KEYWORD EXTRACTION: A COMPARATIVE STUDY USING GRAPH BASED MODEL AND RAKE</title>
		<author>
			<persName><forename type="first">Lima</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Karthik</surname></persName>
		</author>
		<idno type="DOI">10.21474/IJAR01/3616</idno>
		<ptr target="https://doi.org/10.21474/IJAR01/3616" />
	</analytic>
	<monogr>
		<title level="j">Int. J. of Adv. Res</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="1133" to="1137" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">Ian</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gordon</forename><forename type="middle">W</forename><surname>Paynter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eibe</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carl</forename><surname>Gutwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Craig</forename><forename type="middle">G</forename><surname>Nevill-Manning</surname></persName>
		</author>
		<idno type="arXiv">arXiv:cs/9902007</idno>
		<ptr target="http://arxiv.org/abs/cs/9902007" />
		<title level="m">KEA: Practical Automatic Keyphrase Extraction</title>
				<imprint>
			<date type="published" when="1999-02-05">5 Feb. 1999</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
