<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Keyphrase Generation Technique Based upon Keyphrase Extraction and Reasoning on Loosely Structured Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dario De Nart</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlo Tasso</string-name>
          <email>carlo.tasso@uniud.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence Lab, Department of Mathematics and Computer Science, University of Udine</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Associating meaningful keyphrases with documents and web pages is an activity that can greatly increase the accuracy of Information Retrieval and Personalization systems, but the growing amount of text data available is too large for extensive manual annotation. On the other hand, automatic keyphrase generation, a complex task involving Natural Language Processing and Knowledge Engineering, can significantly support this activity. Several different strategies have been proposed over the years, but most of them require extensive training data, which are not always available, suffer from high ambiguity and differences in writing style, are highly domain-specific, and often rely on well-structured knowledge that is very hard to acquire and encode. In order to overcome these limitations, we propose in this paper an innovative unsupervised and domain-independent approach that combines keyphrase extraction and keyphrase inference based on loosely structured, collaborative knowledge such as Wikipedia, Wordnik, and Urban Dictionary. This choice introduces a higher level of abstraction in the generated KPs that allows us to determine whether two texts deal with similar topics even if they do not share a word.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <label>1</label>
      <title>Introduction</title>
      <p>
        Due to the constant growth of the amount of text data available on the Web
and in digital libraries, the demand for automatic summarization and real-time
information filtering has rapidly increased. However, such systems need
metadata that can precisely and compactly represent the content of the document.
As broadly discussed in literature and proven by web usage analysis [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], it is
particularly convenient for such metadata to come in the form of KeyPhrases (KPs),
since they can be very expressive (much more than single keywords), fairly
straightforward in their meaning, and have a high cognitive plausibility,
because humans tend to think in terms of KPs rather than single keywords. In
the rest of this paper we will refer to KP generation as the process of
associating a meaningful set of KPs to a given text, regardless of their origin, while we
will call KP extraction the act of selecting a set of KPs from the text and KP
inference the act of associating to the text a set of KPs that may not be found
inside it. KP generation is a trivial and intuitive task for humans, since anyone
can tell at least the main topics of a given text, or decide whether it belongs to
a certain domain (news item, scientific literature, narrative, etc.) or not, but
it can be extremely hard for a machine since most of the documents available
lack any kind of semantic hint.
      </p>
      <p>Over the years several authors have addressed this issue, proposing different
approaches to both KP extraction and inference but, in our opinion, each
of them has severe practical limitations that prevent the large-scale adoption of
automatic KP generation in Information Retrieval, Social Tagging, and Adaptive
Personalization. These limitations are the need for training data, the
impossibility of associating to a given text keyphrases that are not already included in
that text, a high domain specificity, and the need for structured, detailed, and
extensive domain knowledge coded in the form of a thesaurus or an ontology.</p>
      <p>In this paper we propose an unsupervised KP generation method that
combines KP extraction and KP inference based on ontology reasoning upon
knowledge sources that, though not formal ontologies, can be seen as loosely
structured ones, in order to associate to any given text a meaningful and detailed
set of keyphrases.</p>
      <p>The rest of the paper is organized as follows: in Section 2 we briefly introduce
some related works; in Section 3 we give an overview of the proposed system;
in Section 4 we present our keyphrase extraction technique; in Section 5 we
illustrate our keyphrase inference technique; in Section 6 we discuss some
experimental results and, finally, in Section 7 we conclude the
paper.</p>
    </sec>
    <sec id="sec-2">
      <label>2</label>
      <title>Related Work</title>
      <p>Many works over the past few years have discussed different solutions to the
problem of automatically tagging documents and Web pages, as well as the
possible applications of such technologies in the fields of Personalization and
Information Retrieval in order to significantly reduce information overload and increase
accuracy. Both keyphrase extraction and inference have been widely discussed in
the literature. Several different keyphrase extraction techniques have been proposed,
which are usually structured into two phases:
– a candidate phrase identification phase, in which all the possible phrases are
detected in the text;
– a selection phase, in which only the most significant of the above phrases are
chosen as keyphrases.</p>
      <p>
        The wide span of proposed methods can be roughly divided into two distinct
categories:
– Supervised approaches: the underlying idea of these methods is that KP
extraction can be seen as a classification problem and therefore solved with a
sufficient amount of (manually annotated) training data and machine
learning algorithms [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Several authors addressed the problem in this direction
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and many systems that implement supervised approaches are available,
such as KEA [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], Extractor, and LAKE [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. All the above systems can
be extremely effective and, as long as reliable data sets are available, can be
flawlessly applied to any given domain [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, requiring training data
in order to work properly implies two major drawbacks: (i) the quality of
the extraction process relies on the quality of the training data and (ii) a model
trained on a specific domain will not fit another application domain unless
it is trained again.
– Unsupervised approaches: this second class of methods eliminates the need
for training data by selecting candidate KPs according to some ranking
strategy. Most of the proposed systems rely on the identification of noun phrases
(i.e., phrases made of just nouns) and then proceed with a further
selection based on heuristics such as the frequency of the phrase [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or upon phrase
clustering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A third approach proposed by [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] exploits a
graph-based ranking model, bearing much similarity to the well-known
PageRank algorithm, in order to select significant KPs and identify related
terms that can be summarized by a single phrase. All the above techniques
share the same advantage over the supervised strategies, namely being truly
domain independent: they rely on general principles and heuristics and
therefore need no training data. However, such generalist
approaches may not always lead to excellent results, especially when dealing
with peculiar documents whose structure does not satisfy the assumptions
that drive the KP extraction process.
      </p>
      <p>
        Hybrid approaches have been proposed as well, incorporating semi-supervised
domain knowledge in an otherwise unsupervised extraction strategy [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], but
they still remain highly domain-specific. Keyphrase extraction, however, is severely
limited by the fact that it can ultimately return only words contained in the input
document, which are highly prone to ambiguity and subject to the nuances
of different writing styles (e.g., one author may write "mining frequent patterns"
where another would write "frequent pattern mining"). Keyphrase inference
can overcome these limitations and has been widely explored in the literature as well,
ranging from systems that simply combine words appearing in the text in order
to construct rather than extract phrases [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to systems that assign keyphrases
that may be built with terms that never appear in the document. In the latter case,
KPs come from a controlled dictionary, possibly an ontology; in such a case, a
classifier is trained in order to find which entries of the exploited dictionary may
fit the text [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. If the dictionary of possible KPs is an ontology, its structure
can be exploited in order to provide additional evidence for inference [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and,
by means of ontological reasoning, evaluate relatedness between terms [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] a KP inference technique is discussed, which is based on a very specific
domain OWL ontology and which combines both KP extraction and inference,
in the context of a vast framework for personalized document annotation. KP
inference based on dictionaries, however, is strongly limited by the size, the
domain coverage, and the specificity level of the considered dictionary.
      </p>
    </sec>
    <sec id="sec-3">
      <label>3</label>
      <title>System Overview</title>
      <p>
        In order to test our approach and to support our claims we developed a new
version of the system presented in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] which introduces an original innovation, i.e.
the exploitation of a number of generalist online External Knowledge Sources,
rather than a formal ontology, in order to improve extraction quality and infer
meaningful KPs not included in the input text but preserving domain
independence.
      </p>
      <p>Figure 1 presents the overall organization of the proposed system. It
consists of the following main components:
– A KP Extraction Module (KPEM), devoted to analysing the text and extracting
meaningful KPs from it. It is supported by some linguistic resources, such
as a POS tagger (for the English language) and a stopwords database, and
it accesses some online External Knowledge Sources (EKS), mainly exploited
in order to provide support to the candidate KPs identified in the text (as
explained in the following section). The KPEM receives an
unstructured text as input and produces a ranked list of KPs as output, which is stored in
an Extracted Keyphrases Data Base (EKPDB).
– A KP Inference Module (KPIM), which works on the KP list produced
by the KPEM and is devoted to inferring new KPs, (possibly) not already
included in the input text. It relies on some ontological reasoning based on
access to the External Knowledge Sources, exploited in order to identify
concepts that are related to the concepts referred to by the KPs previously
extracted by the KPEM. Inferred KPs are stored in the Inferred KP Data
Base (IKPDB).</p>
      <p>Access to the online External Knowledge Sources is provided by a
Generalized Knowledge Gateway (GKG). Both the EKPDB and the IKPDB can be
accessed through Web Services by external applications, thus providing
an advanced KP generation service to interested Web users, who can exploit
this capability in other target applications.</p>
    </sec>
    <sec id="sec-4">
      <label>4</label>
      <title>Phrase Extraction</title>
      <p>
        KPEM is an enhanced version of DIKPE, the unsupervised, domain independent
KP extraction approach described in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In a nutshell, DIKPE
generates a large set of candidate KPs; the exploited approach then merges different
types of knowledge in order to identify meaningful concepts in a text, also trying
to model a human-like KP assignment process. In particular we use: linguistic
knowledge (POS tagging, sentence structure, punctuation); statistical
knowledge (frequency, tf/idf, ...); knowledge about the structure of a document
(position of the candidate KP in the text, title, subtitles, ...); meta-knowledge
provided by the author (HTML tags, ...); knowledge coming from online external
knowledge sources, useful for validating candidate keyphrases that have been
socially recognized, for example, in collaborative wikis (e.g., Wikipedia, Wordnik,
and other online resources).
      </p>
      <p>By means of the above knowledge sources, each candidate phrase is
characterized by a set of features, such as:
– Frequency: the frequency of the phrase in the text;
– Phrase Depth: the point of the text at which the phrase occurs for the first time;
the sooner it appears, the higher the value;
– Phrase Last Occurrence: the point of the text at which the phrase occurs for the
last time; the later it appears, the higher the value;
– Life Span: the fraction of text between the first and the last occurrence of
the phrase;
– POS value: a parameter taking into account the grammatical composition of
the phrase, excluding some patterns and assigning higher priority to other
patterns (typically, but not exclusively, it can be relevant to
consider the number of nouns in the phrase over the number of words in the
phrase);
– WikiFlag: a parameter taking into account whether the phrase is
an entry of collaborative external knowledge sources (EKS).
A weighted mean of the above features, called Keyphraseness, is then computed
and the KPs are sorted in descending Keyphraseness order. The weight of each
feature can be tuned in order to fit particular kinds of text but, usually, a
generalist preset can be used with good results. The topmost n KPs are finally
suggested.</p>
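      <p>The weighted mean described above can be sketched as follows. This is only a minimal illustration, not the DIKPE implementation: the feature names, the equal weights in DEFAULT_WEIGHTS, and the assumption that every feature value is already normalised to the range [0, 1] are made up for the example, since the actual preset is not reported here.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

# Hypothetical feature weights: the generalist preset is not published,
# so equal weights are assumed purely for illustration.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "frequency": 1.0,
    "phrase_depth": 1.0,
    "last_occurrence": 1.0,
    "life_span": 1.0,
    "pos_value": 1.0,
    "wiki_flag": 1.0,
}


def keyphraseness(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of the feature values (each assumed to lie in [0, 1])."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    # A feature missing from the candidate is treated as 0.0.
    weighted_sum = sum(weights[name] * features.get(name, 0.0) for name in weights)
    return weighted_sum / total_weight


def rank_candidates(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
) -> List[Tuple[str, float]]:
    """Return (phrase, keyphraseness) pairs sorted by descending score."""
    scored = [(phrase, keyphraseness(feats, weights)) for phrase, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```
      </preformat>
      <p>Tuning the system for a particular kind of text would then amount to passing a different weights dictionary to rank_candidates.</p>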
      <p>In this work, we extended the DIKPE system with the GKG to access EKS,
allowing access to multiple knowledge sources at the same time. We also added
a more general version of the WikiFlag feature. This feature is computed as
follows: if the phrase matches an entry in at least one of the considered knowledge
sources, then its value is set to 1; otherwise the phrase is split into single terms
and the WikiFlag value is the fraction of its terms
that have a match in at least one of the considered knowledge sources. By
doing so, a KP that does not match as a phrase, but is made up of terms that
match as single words, still gets a high score, but lower than a KP that features
a perfect match. The WikiFlag feature is processed like all the other features,
contributing to the computation of the Keyphraseness and, therefore, influencing
the ranking of the extracted KPs. The rationale of this choice is that a KP is
important insofar as it represents a meaningful concept or entity, rather than a
random combination of words, and matching a whole phrase against collaborative
human-made knowledge sources (as the EKS are) guarantees that it makes
sense, providing a strong form of human/social validation. This also reduces
the tendency of the system to return typos, document parsing errors, and other
meaningless strings as false positives.</p>
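      <p>A literal reading of the generalised WikiFlag rule can be sketched as follows. Each knowledge source is modelled as a hypothetical function that tells whether a string is an entry; real sources would be queried through the GKG, and the matching here is exact and case-sensitive only to keep the example short.</p>
      <preformat>
```python
from typing import Callable, Iterable


def wiki_flag(phrase: str, sources: Iterable[Callable[[str], bool]]) -> float:
    """Return 1.0 on a whole-phrase match, else the fraction of matching terms."""
    lookups = list(sources)

    def has_entry(text: str) -> bool:
        # True if at least one knowledge source has an entry for the text.
        return any(lookup(text) for lookup in lookups)

    if has_entry(phrase):
        return 1.0
    terms = phrase.split()
    if not terms:
        return 0.0
    matched = sum(1 for term in terms if has_entry(term))
    return matched / len(terms)


# Toy in-memory stand-ins for two knowledge sources (illustrative data only).
toy_wikipedia = {"frequent pattern mining", "pattern", "mining"}
toy_wordnik = {"frequent"}
toy_sources = [toy_wikipedia.__contains__, toy_wordnik.__contains__]

whole_match = wiki_flag("frequent pattern mining", toy_sources)
partial_match = wiki_flag("mining frequent patterns", toy_sources)
```
      </preformat>
      <p>In this toy setting the first phrase matches as a whole, while the second one only matches two of its three terms and therefore receives a lower value. Under this literal reading, a phrase whose terms all match individually would also reach 1; whether the original system penalises that case further is not stated.</p>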
      <p>Another improvement over the original DIKPE approach is that,
instead of suggesting the top n extracted KPs, the new system
evaluates the decreasing trend of Keyphraseness among the ordered KPs,
detects the first significant drop in the Keyphraseness value, and suggests all the
KPs occurring before that (dynamic) threshold. By doing so, the system suggests
a variable number of high-scoring KPs, while the previous version suggested a fixed
number of KPs, which could be either too small or too large for the given
text.</p>
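      <p>The dynamic threshold can be illustrated with the sketch below. The text does not define when a drop counts as significant, so the criterion used here (a relative fall larger than the hypothetical parameter min_relative_drop) is an assumption chosen for illustration and may differ from the one actually adopted.</p>
      <preformat>
```python
from typing import List, Tuple


def select_before_first_drop(
    ranked: List[Tuple[str, float]],
    min_relative_drop: float = 0.3,
) -> List[Tuple[str, float]]:
    """Keep the KPs ranked before the first significant fall in score.

    ranked must already be sorted by descending Keyphraseness.
    min_relative_drop is a hypothetical tuning parameter: a fall is treated
    as significant when the score loses more than this fraction of the
    previous score.
    """
    for index in range(1, len(ranked)):
        previous_score = ranked[index - 1][1]
        current_score = ranked[index][1]
        if previous_score > 0 and (previous_score - current_score) / previous_score > min_relative_drop:
            return ranked[:index]
    # No significant drop found: every KP is kept.
    return list(ranked)
```
      </preformat>
      <p>With scores such as 0.9, 0.85, 0.8, 0.4, and 0.35, the first three KPs would be suggested, because the fall from 0.8 to 0.4 is the first one exceeding the assumed threshold.</p>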
    </sec>
    <sec id="sec-5">
      <label>5</label>
      <title>Phrase Inference</title>
      <p>The KP Inference Module (KPIM), like the knowledge-based WikiFlag
feature described in the previous section, relies on a set of external knowledge
sources that are accessed via the Web. We assume that (i) there is a way to match
extracted KPs with entities described in the EKSs (e.g., querying the exploited
service using the KP as search key) and (ii) each of the EKSs considered is
organized according to some kind of hierarchy, as shown in Figure 2, even if
very weak and loosely structured, in which it is possible to associate to any entity
a set of parent entities and another set of related entities. Such sets may
be empty, since we assume neither that each entity is linked to at least another one,
nor that there is a root entity that is an ancestor of all the other entities in the
ontology.</p>
      <p>Even if such a structure is loose, assuming its existence is not trivial at all, but
an increasing number of collaborative resources allow users to classify and link
together knowledge items, generating a pseudo-ontology. Clear examples of this
tendency are Wikipedia, where almost every article contains links to other articles
and many articles are grouped into categories, and Wordnik, an online
collaborative dictionary where any word has associated sets of hypernyms, synonyms, hyponyms,
and related words. Recently several entertainment sites, like
Urban Dictionary, have also begun to provide these possibilities, making them eligible
knowledge sources for our approach. Knowledge sources may be either
generalist (like Wikipedia) or specific (like the many domain-specific wikis hosted on
wikia.com), and several different EKSs can be exploited at the same time in order
to provide better results.</p>
      <p>In the case of Wikipedia, parent entities are given by the categories, which are
thematic groups of articles (e.g., "Software Engineering" belongs to the
"Engineering Disciplines" category). An entry may belong to several categories; for
example, the entry on "The Who" belongs to the "musical quartets" category
as well as to the "English hard rock musical groups" one and the "Musical
groups established in 1964" one. Related entities, instead, can be deduced from the
links contained in the entry associated to the given entity: such links can be
very numerous and heterogeneous, but the most closely related ones are often
grouped into one or more templates, which are thematic collections of
internal Wikipedia links usually displayed at the bottom of the page, as shown in
Figure 3. For instance, in a page dedicated to a film director, it is very likely to
find a template containing links to all the movies they directed or the actors they
worked with.</p>
      <p>Wordnik, instead, provides hierarchical information explicitly by associating
to any entity lists of hypernyms (parent entities) and synonyms (related entities).</p>
      <p>The inference algorithm considers the topmost half of the extracted KPs,
which is typically still a significantly larger set than the one suggested, and, for
each KP that can be associated to an entity, retrieves from each EKS a set of
parent entities and a set of related entities. If a KP corresponds to more than one
entity on one or more EKSs, all of the retrieved entities are taken into account.
The sets associated to single KPs are then merged into a table of related entities
and a table of parent entities for the whole text. Each retrieved entity is scored
according to the sum of the Keyphraseness values of the KPs from which it has
been derived, and each table is sorted by descending score. The top entries of such
tables are suggested as meaningful KPs for the input document.</p>
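      <p>The scoring step just described can be sketched as follows. Everything beyond the description above is an assumption made for the example: each EKS is represented by a hypothetical lookup function returning parent and related entities, the topmost half is rounded up, an entity returned by several sources for the same KP is credited only once, and the entity names in TOY_EKS are toy data.</p>
      <preformat>
```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# One lookup per EKS: maps a KP to (parent entities, related entities).
# The signature is hypothetical; real sources would be queried over the Web.
Lookup = Callable[[str], Tuple[Iterable[str], Iterable[str]]]
Ranked = List[Tuple[str, float]]


def top_entries(scores: Dict[str, float], top_n: int) -> Ranked:
    """Sort entities by descending score (ties broken alphabetically)."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


def infer_keyphrases(ranked: Ranked, lookups: List[Lookup], top_n: int = 5) -> Tuple[Ranked, Ranked]:
    """Return the top parent entities and the top related entities."""
    top_half = ranked[: (len(ranked) + 1) // 2]  # rounding up is an assumption
    parent_scores: Dict[str, float] = defaultdict(float)
    related_scores: Dict[str, float] = defaultdict(float)
    for phrase, score in top_half:
        parents, related = set(), set()
        for lookup in lookups:
            found_parents, found_related = lookup(phrase)
            parents.update(found_parents)
            related.update(found_related)
        # Each entity is credited once per KP, even if several EKSs return it.
        for entity in parents:
            parent_scores[entity] += score
        for entity in related:
            related_scores[entity] += score
    return top_entries(parent_scores, top_n), top_entries(related_scores, top_n)


# Toy stand-in for an EKS (illustrative data, not real query results).
TOY_EKS = {
    "Queen": (["Musical quartets", "Monarchy"], ["Freddie Mercury"]),
    "Joy Division": (["Musical quartets", "Nazi concentration camps"], ["Ian Curtis"]),
}


def toy_lookup(phrase: str):
    return TOY_EKS.get(phrase, ((), ()))


ranked_kps = [("Queen", 0.9), ("Joy Division", 0.8), ("tour", 0.3), ("ticket", 0.2)]
parents, related = infer_keyphrases(ranked_kps, [toy_lookup])
```
      </preformat>
      <p>In this toy run the parent entity shared by both KPs accumulates the two Keyphraseness values and is ranked first, while parents reached from a single KP stay below it.</p>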
      <p>By doing so, we select only entities that are related or parent to a significant
number of high-scoring KPs, addressing the problem of polysemy among the
extracted KPs. For instance, suppose we extracted "Queen" and "Joy Division"
from the same text (Figure 4): both are polysemic phrases, since the first
may refer to the English band as well as to a monarch and the latter to the English
band or to Nazi concentration camps. However, since they appear together and
they are both part of the "musical quartets" category in Wikipedia, it can
be deduced that the text is about music rather than politics or World War II.</p>
    </sec>
    <sec id="sec-6">
      <label>6</label>
      <title>Evaluation</title>
      <p>Formative tests were performed in order to assess the accuracy of the inferred KPs
and their ability to add meaningful information to the set of extracted KPs,
regardless of the domain covered by the input text. Three data sets, dealing
with different topics, were processed, article by article, with the same feature
weights and exploiting Wikipedia and Wordnik as External Knowledge Sources.
For each article a list of extracted KPs and one of inferred KPs were generated;
then the occurrences of each KP were counted, in order to evaluate which portion
of the data set is covered by each KP. We call set coverage the fraction of the
data set labelled with a single KP. Since the topics covered by the texts included
in each data set are known a priori, we expect the system to generate KPs that
associate the majority of the texts in the data set to their specific domain topic.</p>
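      <p>Set coverage as defined above can be computed with a few lines of code. The sketch below assumes that each document is represented simply by the list of KPs generated for it; the function name and the sample data are illustrative.</p>
      <preformat>
```python
from collections import Counter
from typing import Dict, Iterable, List


def set_coverage(keyphrases_per_document: List[Iterable[str]]) -> Dict[str, float]:
    """Fraction of the documents in the data set labelled with each KP."""
    if not keyphrases_per_document:
        return {}
    counts: Counter = Counter()
    for keyphrases in keyphrases_per_document:
        counts.update(set(keyphrases))  # count each KP at most once per document
    total = len(keyphrases_per_document)
    return {phrase: count / total for phrase, count in counts.items()}


# Illustrative data: four documents with their generated KPs.
sample_documents = [["metal", "album"], ["metal"], ["band"], ["metal", "band"]]
sample_coverage = set_coverage(sample_documents)
```
      </preformat>
      <p>Here "metal" labels three of the four sample documents, so its set coverage is 0.75.</p>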
      <p>The first data set contained 113 programming tutorials, ranging from brief
introductions published on blogs and forums to extensive articles taken from
books and journals, covering both practical and theoretical aspects of
programming. A total of 776 KPs were extracted and 297 were inferred. Table 1
reports the most frequently extracted and inferred KPs. As expected,
extracted KPs are highly specific and tend to characterize a few documents in the
set (the most frequent KP covers just 13% of the data set), while inferred
ones provide a higher level of abstraction, resulting in a higher coverage over
the considered data set. However, some inferred KPs are not accurate, such as
"Botanical nomenclature", which clearly derives from the presence of terms such
as "tree", "branch", "leaf", and "forest" that are frequently used in Computer
Science, and "Aristotle", which comes from the frequent references to Logic,
which Wikipedia frequently associates with the Greek philosopher.</p>
      <p>The second data set contained 159 car reviews taken from American and British
magazines and written by professional journalists. Unlike the previous data set, in
which all the texts share a very specific language and provide technical
information, in this set different writing styles and different kinds of target audiences are
present. Some of the reviews are very specific, focusing on technical details, while
others are aimed more at entertaining than informing. Most of the
considered texts, however, stand at some point between these two ends, providing a
good deal of technical information together with an accessible and entertaining
style.</p>
      <p>Table 2 reports the most frequently extracted and inferred KPs. While
extracted KPs clearly identify the automotive domain, inferred ones do not, with
only 44% of the considered texts being covered by the "Automobile" KP and
64% being labelled with "English-language films". However, this is mostly
due to the fact that several reviews tend to stress a car's presence in popular
movies (e.g., Aston Martin in the 007 franchise or any given Japanese car in the
Fast and Furious franchise), and only 18 out of 327 (5.5%) different inferred
KPs deal with cinema and television. KPs such as "United States" and "United
Kingdom" are also frequently inferred because the reviewed cars
are mostly designed for the USA and UK markets, have been tested in those
countries, and several manufacturers are based there. As a side note,
98% of the considered texts are correctly associated with the manufacturer of the
reviewed car.</p>
      <p>The third data set contained reviews of 211 heavy metal albums
published in 2013. Reviews were written by various authors, both professionals
and non-professionals, and combine a wide spectrum of writing styles, from
utterly specific, almost scientific, to highly sarcastic, with many puns and popular
culture references.</p>
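      <table-wrap>
        <label>Table 3</label>
        <caption>
          <p>The most frequently extracted and inferred KPs for the heavy metal album reviews data set.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Extracted Keyphrase</th><th>Set coverage</th><th>Inferred Keyphrase</th></tr>
          </thead>
          <tbody>
            <tr><td>metal</td><td>0.23</td><td>Music genre</td></tr>
            <tr><td>album</td><td>0.21</td><td>Record label</td></tr>
            <tr><td>death metal</td><td>0.17</td><td>Record producer</td></tr>
            <tr><td>black metal</td><td>0.17</td><td>United States</td></tr>
            <tr><td>band</td><td>0.16</td><td>Studio album</td></tr>
            <tr><td>bands</td><td>0.08</td><td>United Kingdom</td></tr>
            <tr><td>death</td><td>0.08</td><td>Bass guitar</td></tr>
            <tr><td>old school</td><td>0.07</td><td>Single (music)</td></tr>
            <tr><td>sound</td><td>0.06</td><td>Internet Movie Database</td></tr>
            <tr><td>albums</td><td>0.05</td><td>Heavy metal music</td></tr>
            <tr><td>power metal</td><td>0.05</td><td>Allmusic</td></tr>
          </tbody>
        </table>
      </table-wrap>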
      <p>Table 3 reports the most frequently extracted and inferred KPs. All
the documents in the set were associated with the inferred KP "Music Genre"
and 97% of them with "Record Label", which clearly associates the texts
with the music domain. Evaluation and development, however, are still ongoing
and new knowledge sources, such as domain-specific wikis and Urban Dictionary,
are being considered.</p>
    </sec>
    <sec id="sec-7">
      <label>7</label>
      <title>Conclusions</title>
      <p>In this paper we proposed a truly domain-independent approach to both KP
extraction and inference, able to generate significant semantic metadata with
different layers of abstraction for any given text without the need for training. The
KP extraction part of the system provides a very fine granularity, producing
KPs that may not be found in a controlled dictionary (such as Wikipedia),
but characterize the text. Such KPs are extremely valuable for the purpose of
summarization and provide great accuracy when used as search keys. However,
they are not widely shared, which means, from an information retrieval point of view,
a very low recall. On the other hand, the KP inference part generates only KPs
taken from a controlled dictionary (the union of the considered EKS) that are
more likely to be general and, therefore, shared among a significant number of
texts.</p>
      <p>
        As shown in the previous section, our approach can annotate a set of
documents with good precision; however, a few unrelated KPs may be inferred,
mostly due to ambiguities of the text and to the generalist nature of the
exploited knowledge sources. These unrelated terms, fortunately, tend to appear
in a limited number of cases and to be clearly unrelated not only to the
majority of the generated KPs, but also to each other. In fact, our next step in
this research will be precisely to identify such false positives by means of an
estimate of the Semantic Relatedness [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] between terms in order to identify,
for each generated KP, a list of related concepts and detect concept clusters in
the document.
      </p>
      <p>
        The proposed KP generation technique can be applied both in the
Information Retrieval domain and in the Adaptive Personalization one. The previous
version of the DIKPE system has already been integrated with good results in
RES [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a personalized content-based recommender system for scientific papers
that suggests papers according to their similarity with one or more documents
marked as interesting by the user, and in the PIRATES framework [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for tag
recommendation and automatic document annotation. We expect this extended
version of the system to provide an even more accurate and complete KP
generation and, therefore, to improve the performance of these existing systems, in
this way supporting the creation of new Semantic Web Intelligence tools.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Barker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cornacchia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Using noun phrase heads to extract document keyphrases</article-title>
          .
          <source>In: Advances in Artificial Intelligence</source>
          , pp.
          <fpage>40</fpage>
          –
          <lpage>52</lpage>
          . Springer (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bracewell</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuriowa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Multilingual single document keyword extraction for information retrieval</article-title>
          .
          <source>In: Natural Language Processing and Knowledge Engineering</source>
          ,
          <year>2005</year>
          .
          <source>IEEE NLP-KE'05. Proceedings of 2005 IEEE International Conference on</source>
          . pp.
          <fpage>517</fpage>
          –
          <lpage>522</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Danilevsky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desai</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>KERT: Automatic extraction and ranking of topical keyphrases from content-representative document titles</article-title>
          .
          <source>arXiv preprint arXiv:1306.0271</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>D'Avanzo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magnini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vallin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Keyphrase extraction for summarization purposes: The LAKE system at DUC-2004</article-title>
          .
          <source>In: Proceedings of the 2004 document understanding conference</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>De Nart</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tasso</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Personalized access to scientific publications: from recommendation to explanation</article-title>
          . In:
          <source>User Modeling, Adaptation, and Personalization</source>
          , pp.
          <fpage>296</fpage>
          –
          <lpage>301</lpage>
          .
          <publisher-name>Springer</publisher-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Platt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heckerman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahami</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Inductive learning algorithms and representations for text categorization</article-title>
          .
          <source>In: Proceedings of the seventh international conference on Information and knowledge management</source>
          . pp.
          <fpage>148</fpage>
          –
          <lpage>155</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tasso</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Integrating semantic relatedness in a collaborative filtering system</article-title>
          .
          <source>In: Mensch &amp; Computer Workshopband</source>
          . pp.
          <fpage>75</fpage>
          –
          <lpage>82</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tasso</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Extracting keyphrases from web pages</article-title>
          .
          <source>In: Digital Libraries and Archives</source>
          , pp.
          <fpage>93</fpage>
          –
          <lpage>104</lpage>
          .
          <publisher-name>Springer</publisher-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Litvak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Last</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Graph-based keyword extraction for single-document summarization</article-title>
          .
          <source>In: Proceedings of the workshop on multi-source multilingual information extraction and summarization</source>
          . pp.
          <fpage>17</fpage>
          –
          <lpage>24</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Marujo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gershman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frederking</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neto</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization</article-title>
          .
          <source>arXiv preprint arXiv:1306.4886</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Medelyan</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>Thesaurus based automatic keyphrase indexing</article-title>
          .
          <source>In: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries</source>
          . pp.
          <fpage>296</fpage>
          –
          <lpage>297</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tarau</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>TextRank: Bringing order into texts</article-title>
          .
          <source>In: Proceedings of EMNLP</source>
          . vol.
          <volume>4</volume>
          .
          <publisher-loc>Barcelona, Spain</publisher-loc>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pouliquen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steinberger</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignat</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Automatic annotation of multilingual text collections with a conceptual thesaurus</article-title>
          .
          <source>arXiv preprint cs/0609059</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pudota</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dattolo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baruzzo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tasso</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Automatic keyphrase extraction and ontology mining for content-based tag recommendation</article-title>
          .
          <source>International Journal of Intelligent Systems</source>
          <volume>25</volume>
          (
          <issue>12</issue>
          ),
          <fpage>1158</fpage>
          –
          <lpage>1186</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sarkar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A hybrid approach to extract keyphrases from medical documents</article-title>
          .
          <source>arXiv preprint arXiv:1303.1441</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Silverstein</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marais</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henzinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moricz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Analysis of a very large web search engine query log</article-title>
          .
          <source>In: ACM SIGIR Forum</source>
          . vol.
          <volume>33</volume>
          , pp.
          <fpage>6</fpage>
          –
          <lpage>12</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Strube</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <article-title>WikiRelate! Computing semantic relatedness using Wikipedia</article-title>
          .
          <source>In: AAAI</source>
          . vol.
          <volume>6</volume>
          , pp.
          <fpage>1419</fpage>
          –
          <lpage>1424</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Turney</surname>
            ,
            <given-names>P.D.</given-names>
          </string-name>
          :
          <article-title>Learning to extract keyphrases from text</article-title>
          .
          <source>National Research Council. Institute for Information Technology, Technical Report ERB-1057</source>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Turney</surname>
            ,
            <given-names>P.D.</given-names>
          </string-name>
          :
          <article-title>Learning algorithms for keyphrase extraction</article-title>
          .
          <source>Information Retrieval</source>
          <volume>2</volume>
          (
          <issue>4</issue>
          ),
          <fpage>303</fpage>
          –
          <lpage>336</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paynter</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutwin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nevill-Manning</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          :
          <article-title>KEA: Practical automatic keyphrase extraction</article-title>
          .
          <source>In: Proceedings of the fourth ACM conference on Digital libraries</source>
          . pp.
          <fpage>254</fpage>
          –
          <lpage>255</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>