<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Entity Detection Based on News Cluster Structure</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleksey Alekseev</string-name>
          <email>a.a.alekseevv@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natalia Loukachevitch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Research Computing Center of Lomonosov Moscow State University</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we consider a method for the extraction of alternative names of a concept or a named entity mentioned in a news cluster. The method is based on the structural organization of news clusters and exploits comparison of various contexts of words. The word contexts are used as a basis for multiword expression extraction and main entity detection. At the end of cluster processing we obtain groups of near-synonyms, in which the main synonym of each group is determined.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity Detection</kwd>
        <kwd>Lexical Cohesion</kwd>
        <kwd>News Clusters</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>An important step in news processing is thematic clustering of news articles
describing the same event. Such news clusters are the basic units of information presentation
in news services.</p>
      <p>After a news cluster is formed, it undergoes various kinds of automatic processing:
─ Duplicates are removed from the cluster. A duplicate is a message that almost
completely repeats the content of an earlier document,
─ The cluster is assigned to a thematic category,
─ A summary of the cluster is created, usually containing sentences from different
documents of the cluster (a multi-document summary), etc.</p>
      <p>The formation of a cluster can itself be a serious problem. It is especially difficult
to form clusters correctly for complex hierarchical events that extend over time and are
geographically distributed (world championships, elections) [1], [2].</p>
      <p>A part of the problems of news cluster formation and processing is due to the fact that in
cluster documents the same concepts or entities may be named differently. Lexical
chain approaches can partly overcome this problem using thesaurus information [3],
[4]. However, it is impossible to fix in a pre-created resource all variants of entity
naming in various clusters. For example, the U.S. air base in Kyrgyzstan may be
referred to in documents of the same news cluster as Manas base, Manas airbase, Manas,
base at Manas International Airport, U.S. base, U.S. air base, etc.</p>
      <p>The problem of alternative names for named entities is partly solved by
coreference resolution techniques (Russian President Dmitry Medvedev, President
Medvedev, Dmitry Medvedev) [5], [6]. In Entity Detection and Tracking Evaluations,
mainly such entities as organizations, persons and locations are detected and provided
with coreferential relations [7]. However, the main entities of a cluster can be events, such as air
base closure and air base withdrawal. Besides, the variability of entity names in news
clusters concerns not only concrete entities but also concepts, which can likewise be
the main discussed entities, such as ecology or economic problems.</p>
      <p>News clusters as sources of various paraphrases have been studied in several works. In [8]
the authors describe a procedure of corpus construction for paraphrase extraction in
the terrorist domain. The study in [9] is devoted to the creation of a corpus of similar
sentences from news clusters as a source for further paraphrase analysis. These
studies aim to obtain general knowledge about a domain or about linguistic means of
paraphrasing, but it is also important to extract near-synonyms or coreferential
expressions of various types from a news cluster and to use them to improve the
processing of the same news cluster or of a corresponding theme.</p>
      <p>In this paper we consider a method for the extraction of the main entities of a news
cluster, including named entities, activities and concepts. The method is based on the
structural organization of news clusters and exploits comparison of various contexts
of words. The word contexts are used as a basis for multiword expression extraction
and main entity detection. At the end of cluster processing we obtain the main entities of
a news cluster and their mention expressions, presented as groups of near-synonyms
in which the main synonym of each group is determined. Such synonym groups include
both single words and multiword expressions. In this paper we study only simple
features generated from a news cluster, without additional semantic or
other types of information, as a baseline for future research. The experiments were
carried out on Russian news flows.</p>
    </sec>
    <sec id="sec-2">
      <title>Principles of Cluster Processing</title>
      <p>Processing of cluster texts is based on the structure of coherent texts, which have such
properties as topical structure and cohesion.</p>
      <p>Van Dijk [10] describes the topical structure of a text, the macrostructure, as a
hierarchical structure in a sense that the theme of a whole text can be identified and
summed up to a single proposition. The theme of the whole text can usually be
described in terms of less general themes, which in turn can be characterized in terms of
even more specific themes. Every sentence of a text corresponds to a subtheme of the
text.</p>
      <p>The macrostructure of a connected text defines its global coherence: “Without such
a global coherence, there would be no overall control upon the local connections and
continuations” [10]. Sentences must be connected appropriately according to the
given local coherence criteria, but the sequence would go simply astray without some
constraint on what it should be about globally.</p>
      <p>Cohesion, that is, the surface connectivity between text sentences, is often expressed
through anaphoric references (i.e. pronouns) or by means of lexical or semantic
repetition. Lexical cohesion is modeled on the basis of lexical chains [11].</p>
      <p>The proposition of the main theme, that is, an interaction between theme
participants, should be represented in specific text sentences, which refine and
elaborate the main theme. This means that if a text is devoted to the description of relations
between thematic elements C1…Cn, then references to these participants should
occur in different roles of the same verb in text sentences.</p>
      <p>Thus if even very semantically close entities C1 and C2 often co-occur in the same
sentences of a text, this means that the text is devoted to consideration of the relations
between these entities, and they represent different elements of the text theme [12], [13].
At the same time, if two lexical expressions C1 and C2 rarely occur in the same
sentences but occur very frequently in neighboring sentences, then we can suppose that they
are elements of lexical cohesion, and there is a semantic relation between them.</p>
      <p>A news cluster is not a coherent text but cluster documents are devoted to the same
theme. Therefore statistical features of the topical structure are considerably enhanced
in a thematic cluster, and on such a basis we try to extract unknown information from
a cluster.</p>
      <p>To check our idea that near-synonyms occur more often in neighboring
sentences than in the same sentences, we carried out the following experiment.
More than 20 large news clusters were matched with terms of the Sociopolitical
thesaurus [14], and thesaurus-based potential near-synonyms were detected. The
types of near-synonyms include (the examples are translations from Russian; in
Russian the ambiguity of the expressions is absent):
─ nouns – thesaurus synonyms (Kyrgyzstan – Kirghizia),
─ adjective – noun derivates (Kyrgyzstan – Kyrgyz),
─ hypernym and hyponym nouns (deputy – representative),
─ hypernym–hyponym noun – adjective (national – Russia),
─ part-whole relations between nouns (parliament – parliamentarian),
─ part-whole relations for adjective and noun (American – Washington).</p>
      <p>For each cluster we considered all such pairs of expressions with a frequency
filter: the frequency of each expression in a cluster had to be more than a quarter of the
number of documents in the corresponding cluster. For these pairs we computed the
ratio between their co-occurrence in the same sentence segments (Fsegm) and in neighboring
sentences (Fsent). Table 1 shows the results of our experiment.
From the table we can see that the most closely related expressions (synonyms,
derivates) are much more frequent in neighboring sentences than in the same segments of
the same sentences. Further, the greater the semantic distance between expressions, the
larger the ratio Fsegm/Fsent becomes, until it stabilizes near the value 1.5.</p>
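      <p>To make the counting concrete, the following sketch (our own illustration, not the
authors' code; the nested-list representation of a cluster and the function name are
assumptions) computes the two frequencies for a pair of words:</p>
      <preformat>
```python
def cooccurrence_counts(docs, w1, w2):
    """Count co-occurrence of w1 and w2 in the same sentence segment
    (f_segm) and in neighboring sentences (f_sent).
    docs: list of documents; a document is a list of sentences;
    a sentence is a list of segments; a segment is a list of words."""
    f_segm = 0
    f_sent = 0
    for doc in docs:
        # same-segment co-occurrence
        for sent in doc:
            for seg in sent:
                if w1 in seg and w2 in seg:
                    f_segm += 1
        # co-occurrence in neighboring sentences, in either order
        for i in range(len(doc) - 1):
            cur = [w for seg in doc[i] for w in seg]
            nxt = [w for seg in doc[i + 1] for w in seg]
            if (w1 in cur and w2 in nxt) or (w2 in cur and w1 in nxt):
                f_sent += 1
    return f_segm, f_sent
```
      </preformat>
      <p>The ratio Fsegm/Fsent is then obtained by dividing the two counts accumulated over
all documents of a cluster.</p>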
      <p>We can also see that noun-noun and noun-adjective pairs have different values of
the ratio. We suppose that in many cases adjectives are elements of noun groups,
which can play their own roles in a news cluster. Therefore the first step in the detection of
main entities should be the extraction of multiword expressions denoting the main entities of
the cluster.</p>
    </sec>
    <sec id="sec-3">
      <title>Stages of Cluster Processing</title>
      <p>Cluster processing consists of three main stages. At the first stage noun and adjective
contexts are accumulated. The second stage is devoted to multiword expression
recognition. At the third stage the search of near-synonyms is performed.</p>
      <p>In the next sections we consider the processing stages in more detail. As an example we
use a news cluster devoted to the denunciation by Kyrgyzstan of its agreement with the United
States on the U.S. air base located at the Manas International Airport (19.02.2009).
This news cluster contains 195 news documents and was assembled on the basis of the
algorithm described in [1].</p>
      <sec id="sec-3-1">
        <title>Extraction of Word Contexts</title>
        <p>Sentences are divided into segments between punctuation marks. The contexts of a word W
include nouns and adjectives situated in the same sentence segments as W. The
following types of contexts are extracted:
─ Neighboring words: adjectives or nouns situated directly to the right
or left of W (Near),
─ Across-verb words: adjectives and nouns occurring in sentence segments with a
verb, where the verb is located between W and these adjectives or nouns
(AcrossVerb),
─ Not-near words: adjectives and nouns that are not separated by a verb from W
and are not direct neighbors of W (NotNear).</p>
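        <p>The three context types can be sketched for a single sentence segment as follows (a
simplified illustration of our own; the representation of a segment as (word, pos) tuples,
with pos one of "N" for noun, "A" for adjective and "V" for verb, is an assumption):</p>
        <preformat>
```python
def word_contexts(segment, target):
    """Classify noun and adjective words around `target` within one
    sentence segment into Near, AcrossVerb and NotNear contexts."""
    matches = [i for i, (w, _) in enumerate(segment) if w == target]
    if not matches:
        return [], [], []
    idx = matches[0]
    near, across_verb, not_near = [], [], []
    for i, (w, pos) in enumerate(segment):
        if i == idx or pos not in ("N", "A"):
            continue
        lo, hi = min(i, idx), max(i, idx)
        verb_between = any(p == "V" for _, p in segment[lo + 1:hi])
        if abs(i - idx) == 1:
            near.append(w)         # direct neighbor of the target
        elif verb_between:
            across_verb.append(w)  # a verb separates it from the target
        else:
            not_near.append(w)     # same segment, not adjacent, no verb between
    return near, across_verb, not_near
```
        </preformat>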
        <p>In addition, adjectives and nouns that co-occur in neighboring sentences are
memorized (Ns). For this context extraction, only the sentence fragments from the
beginning up to a segment with a verb are taken into consideration. This allows us to extract
the most significant words from neighboring sentences.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Extraction of Multiword Expressions</title>
        <p>We consider recognition of multiword expressions as a necessary step before
near-synonym extraction. An important basis for multiword expression recognition is the
frequency of word sequences [15]. However, a news cluster is a structure in which
various word sequences are repeated many times. We suppose that the main criterion
for multiword expression extraction from clusters is a significant excess of the
co-occurrence frequency of neighboring words over their separate occurrence
frequency in sentence segments (1):</p>
        <p>Near &gt; 2 * (AcrossVerb + NotNear) (1)</p>
        <p>In addition, restrictions on the frequencies of the potential component words are
imposed.</p>
        <p>The search for candidate pairs is performed in decreasing order of the value
Near - (AcrossVerb + NotNear). If a suitable pair is found, its component words are
joined together into a single object and all contextual relationships are recalculated.
The procedure then starts again and repeats as long as at least one join is performed.</p>
        <p>As a result, such expressions as Parliament of Kyrgyzstan, the U.S. military,
denunciation of agreement with the U.S., and Kyrgyz President Kurmanbek Bakiyev
were extracted from the example cluster.</p>
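        <p>The iterative joining procedure can be sketched as follows (a minimal illustration;
the pair-statistics representation is our assumption, and unlike the full method this
sketch does not actually recalculate contextual relationships after a join):</p>
        <preformat>
```python
def join_multiwords(pair_stats):
    """Greedily join neighboring word pairs into multiword expressions.
    pair_stats maps a pair (w1, w2) to its Near, AcrossVerb and NotNear
    counts; criterion (1) is Near > 2 * (AcrossVerb + NotNear)."""
    joined = []
    while True:
        candidates = [
            (s["Near"] - (s["AcrossVerb"] + s["NotNear"]), pair)
            for pair, s in pair_stats.items()
            if s["Near"] > 2 * (s["AcrossVerb"] + s["NotNear"])
        ]
        if not candidates:
            break
        candidates.sort(reverse=True)  # best margin first
        _, best = candidates[0]
        joined.append(" ".join(best))
        # the full method recalculates all contextual relationships
        # after a join; this sketch simply removes the joined pair
        del pair_stats[best]
    return joined
```
        </preformat>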
      </sec>
      <sec id="sec-3-3">
        <title>Detection of Near-Synonyms</title>
        <p>At the third stage, the search for near-synonyms is performed. To assume a semantic
relationship between expressions U1 and U2, the following factors are exploited:
─ U1 and U2 have formal resemblance (for example, words with the same beginning),
─ U1 and U2 co-occur more often in neighboring sentences than within segments of
the same sentences; here we use the results of the experiment described in section 2,
─ U1 and U2 have similar contexts based on the Near, AcrossVerb, NotNear and Ns
features, which is determined by calculating scalar products of the corresponding
vectors (NearScalProd, AVerbScalProd, NotNearScalProd, NsentScalProd),
─ U1 and U2 should be frequent enough in the cluster to represent main entities.
Note that while the comparison of word contexts is a well-known procedure for synonym
detection and taxonomy construction [16], the generation of contexts from
neighboring sentences has not been described in the literature.</p>
        <p>Near-synonym detection consists of several steps, with a different set of criteria
applied at each step. The lookup is performed in decreasing order of frequency: for
every expression U1, all expressions U2 having a lower frequency than U1 are
considered. If all conditions are satisfied, then the less frequent expression U2 is postulated as a
synonym of the expression U1, all U2 contexts are transferred to U1, and the
expressions U1 and U2 are joined together. As a result, sets of near-synonyms
(synonym groups) are produced, i.e. linguistic expressions that are equivalent with respect
to the content of the cluster.</p>
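        <p>A minimal sketch of the context comparison (the paper does not specify the
normalization of the scalar products; we assume a cosine-normalized scalar product, which
is consistent with the 0-1 thresholds used in this section):</p>
        <preformat>
```python
import math

def scal_prod(ctx1, ctx2):
    """Normalized scalar product of two context-frequency vectors,
    each given as a word-to-frequency dict (one such vector per
    context type: Near, AcrossVerb, NotNear, Ns)."""
    num = sum(f * ctx2.get(w, 0) for w, f in ctx1.items())
    n1 = math.sqrt(sum(f * f for f in ctx1.values()))
    n2 = math.sqrt(sum(f * f for f in ctx2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return num / (n1 * n2)
```
        </preformat>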
        <p>We assume that expressions U1 and U2, when they are included in such a synonym
group, are closely related in sense, or their referents in the current cluster are closely
related to each other, so that U2 does not carry separate thematic significance with
respect to U1. For example, the words parliament and parliamentarian have a
close semantic relationship in general contexts, but they are not
synonyms. Yet within a particular cluster, e.g. one in which the decision-making process in a
parliament is discussed, these words may be classified as near-synonyms.</p>
        <p>At the first step (3.1), semantic similarity between expressions consisting of similar
words is sought, e.g. Kyrgyzstan - Kyrgyz, Parliament of Kyrgyzstan - Kyrgyz
Parliament. We used a simple similarity measure – the same beginning of words.</p>
        <p>To join words with the same beginning into a synonym group, the following
conditions are required: the co-occurrence frequency in neighboring sentences must be
significantly higher than the co-occurrence frequency within the same sentences (2, 3)
(see section 2), and both expressions must be sufficiently frequent in the cluster. The
procedure is iterative:</p>
        <p>Ns &gt; 2 * (AcrossVerb + Near + NotNear) (2)</p>
        <p>Ns &gt; 1 (3)</p>
        <p>If the expressions are rarely located in neighboring sentences (Ns &lt; 2), then the
scalar-product similarity of their contexts is required instead (4):</p>
        <p>NearScalProd + NotNearScalProd + AVerbScalProd + NsentScalProd &gt; 0.4 (4)</p>
        <p>At the second step (3.2), semantic similarity between expressions, one of which is
included in the other, is sought, for instance, Parliament - Parliament of Kyrgyzstan,
airbase - Manas airbase. The rationale for this step is that a cluster might not mention
any other parliament except the Kyrgyz Parliament, i.e. in both cases the same object
is mentioned. Similarity of Near contexts is required here (5):</p>
        <p>NearScalProd &gt; 0.1 (5)</p>
        <p>At the third step (3.3), we look for semantic similarity between expressions of
equal length that share at least one word, for example, Manas Base - Manas Airbase,
the U.S. military - the U.S. side. A high frequency of co-occurrence in neighboring
sentences is required (6, 7):</p>
        <p>Ns &gt; 2 * (AcrossVerb + Near + NotNear) (6)</p>
        <p>Ns &gt; 1 (7)</p>
        <p>Finally, at the last step (3.4), semantic similarity between arbitrary linguistic
expressions mentioned in the cluster documents is sought, e.g. USA - American,
Kyrgyzstan - Bishkek. An assumption of semantic similarity between arbitrary
expressions requires the maximum number of conditions: a high frequency of
co-occurrence in neighboring sentences (8, 9), restrictions on the occurrence
frequencies of the candidates, and context similarity:</p>
        <p>Ns &gt; 2 * (AcrossVerb + Near + NotNear) (8)</p>
        <p>Ns &gt; 0.1 * MaxAcrossVerb (9)</p>
        <p>The following synonym groups were automatically assembled for the example
cluster as a result of the described stages (the main synonym of each group,
determined automatically, is highlighted in bold):</p>
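        <p>Under our reading of conditions (2)-(4), the step 3.1 decision for a same-beginning
pair can be sketched as follows (the dictionary representation of the pair counts and the
function name are our assumptions):</p>
        <preformat>
```python
def join_same_beginning(pair_counts, scal_sum):
    """Step 3.1 decision for two same-beginning expressions.
    Conditions (2) and (3): Ns > 2 * (AcrossVerb + Near + NotNear)
    and Ns > 1; when the pair rarely co-occurs in neighboring
    sentences (Ns below 2), the summed scalar-product similarity of
    the four context vectors must exceed 0.4 instead (condition (4))."""
    ns = pair_counts["Ns"]
    same_sent = (pair_counts["AcrossVerb"] + pair_counts["Near"]
                 + pair_counts["NotNear"])
    if ns >= 2:
        return ns > 2 * same_sent
    return scal_sum > 0.4
```
        </preformat>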
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation of Method</title>
      <p>To test the introduced method we took 10 news clusters on various topics with more
than 40 documents in each cluster.</p>
      <p>Two measures of quality were tested for multiword expression extraction. First,
we evaluated the percentage of syntactically correct groups among all extracted
expressions. Second, we asked a professional linguist to select
the most significant multiword expressions (5-10) for each cluster and to arrange
them in descending order of importance.</p>
      <p>For the example cluster, the following expressions were considered significant
by the linguist:
─ Manas Airbase
─ Parliament of Kyrgyzstan
─ Manas base
─ Kyrgyz Parliament
─ Denunciation of agreement
─ Government's decision
Note that such an evaluation task differs from the evaluation of automatic keyword
extraction from texts [17], where experts are asked to identify the most important
thematic words and phrases of a text. In our case we tested specifically multiword expression
extraction. In addition, the list created by the linguist could contain semantic repetitions
(Parliament of Kyrgyzstan - Kyrgyz Parliament).</p>
      <p>In total, 364 multiword expressions were automatically extracted from the test clusters, 312
(87.9%) of which were correct syntactic groups. Weighted by phrase frequencies,
correct syntactic expressions reached 91.4% precision. The linguist chose the 70 most
important multiword expressions for the clusters, and 72.6% of them were automatically
extracted by the system.</p>
      <p>We tested the extracted synonym groups by evaluating the semantic relatedness of every
synonym in a group to its main synonym. Every occurrence of a supposed synonym
was tested. If more than half of all occurrences of such a synonym in a cluster were
related to the main synonym of the group, the synonymic relation was considered
correct.</p>
      <p>Table 2 contains information about the quality of the generated synonym groups,
calculated both in numbers of expressions and in their frequencies.</p>
      <p>To assess the contribution of co-occurrence in neighboring sentences, we conducted
detailed testing of same-beginning expression joining (step 3.1) for the example
cluster (Table 3). Table 3 shows that adding the Ns factor, as is done in step 3.1,
improves the precision and recall of near-synonym recognition. The proposed method does
not have the absolutely best F-measure value, but precision below 80% is
inadmissible for the near-synonym detection task. Therefore, the BasicLine should not be
considered the best approach.</p>
      <p>In this paper we have described two experiments on news clusters: multiword
expression extraction and detection of near-synonyms presenting the same main entity of a
news cluster. In addition to known methods of context comparison, we exploited
co-occurrence frequency in neighboring sentences for near-synonym detection. We
conducted a testing procedure for the introduced method.</p>
      <p>In future work we are going to use the extracted near-synonyms in such operations as cluster
boundary correction, automatic summarization, novelty detection, formation of
subclusters, etc. We also intend to study methods of combining automatically
extracted near-synonyms, coreference resolution methods, and thesaurus relations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dobrov</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavlov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Basic line for news clusterization methods evaluation</article-title>
          .
          <source>In: Proceedings of the 5-th Russian Conference RCDL-2010</source>
          (
          <year>2010</year>
          )
          <article-title>(in Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Allan</surname>
          </string-name>
          , J.:
          <article-title>Introduction to Topic Detection and Tracking</article-title>
          .
          <source>In: Topic detection and tracking</source>
          , Kluwer Academic Publishers Norwell, MA, USA,. pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kit</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webster</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A Query-Focused Multi-Document Summarizer Based on Lexical Chains</article-title>
          .
          <source>In: Proceedings of the Document Understanding Conference DUC2007</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dobrov</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loukachevitch</surname>
          </string-name>
          , N.:
          <source>Summarization of News Clusters Based on Thematic Representation. In: Computational Linguistics and Intelligent Technologies: Proceedings of the International Conference Dialog`2009</source>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>305</lpage>
          (
          <year>2009</year>
          )
          <article-title>(In Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Duame</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcu</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A large Scale Exploration of Global Features for a Joint Entity Detection and Tracking Model</article-title>
          .
          <source>In: Proceedings of Human Language Conference and Conference on Empirical Methods in Natural Language Processing</source>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>104</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ng</surname>
          </string-name>
          , V.:
          <article-title>Machine learning for coreference resolution: from local classification to global ranking</article-title>
          .
          <source>In: Proceedings of ACL-2005</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Doddington</surname>
            , G., Mitchell,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Przybocki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramshaw</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strassel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weishedel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>The Automatic Content Extraction (ACE): Task, Data, Evaluation</article-title>
          .
          <source>In: Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC 2004</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Barzilay</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Learning to Paraphrase: an Unsupervised Approach Using Multiple Sequence Alignment</article-title>
          .
          <source>In: Proceedings of HLT/NACCL-2003</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dolan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quirk</surname>
          </string-name>
          , Ch.,
          <string-name>
            <surname>Brockett</surname>
          </string-name>
          , Ch.:
          <article-title>Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources</article-title>
          .
          <source>In: Proceedings of COLING-2004</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dijk</surname>
          </string-name>
          , van T.:
          <article-title>Semantic Discourse Analysis</article-title>
          .
          <source>In: Teun A. van Dijk</source>
          , (Ed.),
          <source>Handbook of Discourse Analysis</source>
          , vol.
          <volume>2</volume>
          ., pp.
          <fpage>103</fpage>
          -
          <lpage>136</lpage>
          , London: Academic Press (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hirst</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>St-Onge</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Lexical Chains as representation of context for the detection and correction malapropisms</article-title>
          . In: WordNet:
          <article-title>An electronic lexical database and some of its applications / C. Fellbaum, editor</article-title>
          . Cambrige, MA: The MIT Press (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hasan</surname>
          </string-name>
          , R.:
          <article-title>Coherence and Cohesive harmony</article-title>
          . J.
          <string-name>
            <surname>Flood</surname>
          </string-name>
          , Understanding reading comprehension, Newark, DE: IRA, pp.
          <fpage>181</fpage>
          -
          <lpage>219</lpage>
          (
          <year>1984</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Loukachevitch</surname>
          </string-name>
          , N.:
          <article-title>Multigraph representation for lexical chaining</article-title>
          .
          <source>In: Proceedings of SENSE workshop</source>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>76</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Loukachevitch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dobrov</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Evaluation of Thesaurus on Sociopolitical Life as Information Retrieval Tool</article-title>
          . In: M.Gonzalez
          <string-name>
            <surname>Rodriguez</surname>
          </string-name>
          , C. Paz Suarez Araujo (Eds.),
          <source>Proceedings of Third International Conference on Language Resources and Evaluation (LREC2002)</source>
          , Vol.
          <volume>1</volume>
          , pp.
          <fpage>115</fpage>
          -
          <lpage>121</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paynter</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutwin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nevill-Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>KEA: practical automatic keyphrase extraction</article-title>
          .
          <source>In: Proceedings of the fourth ACM conference on Digital Libraries</source>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callan</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A metric-based framework for automatic taxonomy induction</article-title>
          .
          <source>In: Proceedings of ACL-2009</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. Su Nam Kim,
          <string-name>
            <surname>Medelyan</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Min-Yen</surname>
            <given-names>Kan</given-names>
          </string-name>
          , Baldwin,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>Automatic Keyphrase Extraction from Scientific Articles</article-title>
          .
          <source>In: Proceedings of the 5-th International Workshop on Semantic Evaluation, ACL -2010</source>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>26</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>