<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>INEX Tweet Contextualization Track at CLEF 2012: Query Reformulation using Terminological Patterns and Automatic Summarization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jorge Vivaldi</string-name>
          <email>jorge.vivaldi@upf.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iria da Cunha</string-name>
          <email>iria.dacunha@upf.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Pompeu Fabra, Institut Universitari de Lingüística Aplicada, Barcelona</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The INEX tweet contextualization task at CLEF 2012 consists of developing a system that, given a tweet, provides some context about its subject in order to help the reader understand it. This context should take the form of a readable summary, not exceeding 500 words, composed of passages from a provided Wikipedia corpus. Our general approach is the following: we automatically reformulate the initial tweets provided for the task, using terminological patterns to obtain a list of terms related to the main topic of each of them. Then, using these reformulated tweets, we retrieve related documents with the search engine Indri. Finally, we use REG, a graph-based automatic extractive summarization system, to summarize these documents and provide the summary associated with each tweet.</p>
      </abstract>
      <kwd-group>
        <kwd>INEX</kwd>
        <kwd>CLEF</kwd>
        <kwd>Tweets</kwd>
        <kwd>Terms</kwd>
        <kwd>Named Entities</kwd>
        <kwd>Wikipedia</kwd>
        <kwd>Automatic Summarization</kwd>
        <kwd>REG</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The INEX (Initiative for the Evaluation of XML Retrieval) tweet contextualization
task at CLEF 2012 (Conference and Labs of the Evaluation Forum)
consists of developing a system that, given a tweet, provides some
context about its subject in order to help the reader understand
it. This context should take the form of a readable summary, not exceeding
500 words, composed of passages from a provided Wikipedia corpus. As in the
Question-Answering (QA) track of INEX 2011, the task to be performed by the
participating groups is contextualizing tweets, that is, answering questions of the form
"what is this tweet about?" using a recent cleaned dump of Wikipedia. The
general process involves tweet analysis, retrieval of passages and/or XML elements,
and construction of the answer. Relevant passages are segments that contain
relevant information and as little non-relevant information
as possible (the result is specific to the question).</p>
      <p>The test data consist of about 1000 tweets in English collected from Twitter
by the organizers of the task. They were selected from informative accounts (for
example, @CNN, @TennisTweets, @PeopleMag, @science...), in order to avoid
purely personal tweets that could not be contextualized. Information such as
the user name, tags or URLs is provided. The document collection shared by all the
participants (the corpus) has been rebuilt from a dump of the English
Wikipedia from November 2011. The resulting documents consist of a title, an
abstract and sections with subtitles.</p>
      <p>
        We consider that automatic extractive summarization systems could be useful
in this QA task, taking into account that a summary can be defined as "a
condensed version of a source document having a recognizable genre and a very
specific purpose: to give the reader an exact and concise idea of the contents of
the source" [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Summaries can be divided into "extracts", if they contain the
most important sentences extracted from the original text (e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), and "abstracts", if these sentences are rewritten or paraphrased, generating
a new text (e.g., [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). Most current automatic summarization
systems are extractive.
      </p>
      <p>
        Our general approach is the following: we automatically reformulate
the initial queries provided for the task, using terminological patterns to obtain
a list of terms related to the main topic of each tweet. Then, using
these reformulated queries, we retrieve related documents with the search engine
Indri<sup>1</sup>. Finally, we use REG ([
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]), a graph-based automatic extractive summarization
system, to summarize these documents and provide the final
summary associated with each query.
      </p>
      <p>
        This approach is similar to the one used at the QA@INEX track in 2010 (see [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ])
and 2011 (see [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]), since the same summarization system is employed.
Nevertheless, in our past participations the system was semi-automatic, while in this
work the system is fully automatic, from the reformulation of the queries
using terminological patterns to the multi-document summarization of all the
retrieved documents.
      </p>
      <p>
        The evaluation of the participating systems covers two aspects:
informativeness and readability. Informativeness is evaluated automatically, using the
evaluation system FRESA (FRamework for Evaluating Summaries
Automatically) ([
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]), while readability is evaluated manually
(assessing syntactic incoherence, unresolved anaphora, redundancy, etc.).
      </p>
      <p>Following this introduction, the paper is organized as follows. Section 2
reviews term extraction and named entity extraction and describes the
summarization system REG. Section 3 explains the methodology. Section 4
presents the experimental settings and results. Finally, Section 5 presents
the conclusions.</p>
      <p><sup>1</sup> Indri is a search engine from the Lemur project, a cooperative effort between the
University of Massachusetts and Carnegie Mellon University to build language
modelling information retrieval tools: http://www.lemurproject.org/indri/</p>
    </sec>
    <sec id="sec-2">
      <title>State-of-the-art and Resources</title>
      <sec id="sec-2-1">
        <title>Term Extraction</title>
        <p>
          The notion of term that we have adopted in this work is based on the
"Communicative Theory of Terminology" [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]: a term is a lexical unit (single or multiple
words) that activates a specialized meaning in a thematically restricted domain.
Term detection implies distinguishing between domain-specific terms and
general vocabulary. Its results are useful for any NLP task with a domain-specific
component, such as ontology and (terminological) dictionary building,
text indexing, automatic translation and summarization systems, among others.
Despite this wide field of application, reliable and practical term recognition still
constitutes a bottleneck for many applications.
        </p>
        <p>
          As shown in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] among others, there are several methods to
obtain the terms from a corpus. On the one hand, there are methods based
on linguistic knowledge, like Ecode [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. On the other hand, there are methods
based on single statistical measures, such as ANA [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] or a combination of them,
such as EXTERMINATOR [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Some tools combine both linguistic knowledge
and statistically based methods, such as TermoStat [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], the algorithm shown in
[
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] or the bilingual extractors by [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] and [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. However, none of these tools uses
any kind of semantic knowledge. Notable exceptions are Metamap [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], Trucks
[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] and YATE [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], among others. Also Wikipedia must be considered, since it
is a very promising resource that is increasingly being used for both monolingual
([
          <xref ref-type="bibr" rid="ref32">32</xref>
          ], [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ]) and multilingual term extraction [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ].
        </p>
        <p>
          Most of the tools, in particular those with an important linguistic
component, take into consideration the fact that terms usually follow a small number
of POS patterns. In [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] it was shown that three patterns (noun, noun-adjective
and noun-preposition-noun) cover more than 90% of the entries found in medical
terminological dictionaries. Many of the above-mentioned tools make some use
of this fact. Nevertheless, some researchers, as in [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ], dynamically calculate the
list of patterns found in terminological resources.
        </p>
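        <p>As an illustration of how such POS patterns can be applied, the following minimal sketch scans a POS-tagged sentence for the three patterns mentioned above. The tag names (NOUN, ADJ, PREP), the function name and the sample input are assumptions made for this example; the sketch does not reproduce the tag set or the implementation of any of the cited tools.</p>
        <preformat>
```python
# Hypothetical tag names; a real tagger would use its own tag set.
PATTERNS = [
    ("NOUN", "PREP", "NOUN"),
    ("NOUN", "ADJ"),
    ("NOUN",),
]


def find_candidates(tagged):
    """Return every token sequence whose tags match one of PATTERNS.

    `tagged` is a list of (token, tag) pairs for one sentence.
    """
    tags = [tag for _, tag in tagged]
    candidates = []
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            end = start + len(pattern)
            if tuple(tags[start:end]) == pattern:
                tokens = [tok for tok, _ in tagged[start:end]]
                candidates.append(" ".join(tokens))
    return candidates


# Sample input for the noun-preposition-noun pattern.
sentence = [("loss", "NOUN"), ("of", "PREP"), ("consciousness", "NOUN")]
print(find_candidates(sentence))
```
        </preformat>
        <p>Pattern matching of this kind only proposes candidates; nested matches (here, the single nouns inside the longer sequence) would still have to be filtered or ranked by a later step.</p>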
      </sec>
      <sec id="sec-2-2">
        <title>Named Entities Extraction</title>
        <p>
          Named Entity Recognition (NER) may be defined as the task of identifying names
referring to persons, organizations and locations in free text; this task
was later extended to other entities such as dates and numeric expressions.
It was originally introduced as a set of possible types of fillers in Information
Extraction systems at the 6th Message Understanding Conference [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]. Although
the task was initially limited to identifying such expressions, it was later
extended to labeling them with an entity type ("person", "organization",
etc.). Note that an entity (such as Stanford, the university in the
U.S.) can be referenced using several surface forms (e.g., "Stanford University"
and "Stanford") and a single surface form (e.g., "Stanford") can refer to several
entities (the university but also an American financier, several places in the UK
or a financial group). See [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ] for an interesting review.
        </p>
        <p>NER has proved useful for a number of NLP tasks, such as question
answering, textual entailment and coreference resolution, among others. The recent
interest in emerging areas like bioinformatics has extended this recognition
task to proteins, drugs and chemical names. While early studies were mostly
based on handcrafted rules, most recent ones use supervised machine learning
to automatically induce rule-based systems or sequence labeling
algorithms from a collection of training examples.</p>
        <p>
          Often, corpus processing tools include some text handling facilities that perform
simple NER to facilitate later processing. Some of them are based
on language-specific peculiarities, such as initial upper case letters, together with
some heuristics for named entities placed at the beginning of a sentence. This
is the case of the tool used for this experiment (see a description in [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ]).
        </p>
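        <p>A capitalization-based heuristic of the kind described above might look like the following rough sketch. It is not the tool used in this experiment; the function name and, in particular, the rule for sentence-initial words (a lone capitalized first token is discarded) are assumptions chosen for the example.</p>
        <preformat>
```python
def capitalized_entities(tokens):
    """Group runs of capitalized tokens into candidate named entities.

    A run that consists only of the first token of the sentence is dropped,
    because sentence-initial capitalization alone is not evidence of a name.
    """
    entities = []
    run = []
    run_start = 0
    for i, tok in enumerate(tokens + [""]):  # empty sentinel flushes the last run
        if tok[:1].isupper():
            if not run:
                run_start = i
            run.append(tok)
        else:
            if run and not (run_start == 0 and len(run) == 1):
                entities.append(" ".join(run))
            run = []
    return entities


tokens = "Yesterday the Wall Street Journal interviewed students at Stanford".split()
print(capitalized_entities(tokens))
```
        </preformat>
        <p>Such a heuristic is cheap but fragile on tweets, where capitalization is often irregular (for example, text written entirely in upper case).</p>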
      </sec>
      <sec id="sec-2-3">
        <title>The REG System</title>
        <p>
          REG ([
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]) is an Enhanced Graph summarizer for extractive
summarization, using a graph approach. The strategy of this system has two main
stages: a) building an adequate representation of the document and b) assigning
a weight to each sentence of the document. In the first stage, the system builds
a vectorial representation of the document. In the second stage, the system uses
a greedy optimization algorithm. The summary is generated by concatenating
the most relevant sentences (previously scored in the optimization
stage).
        </p>
        <p>
          The REG algorithm contains three modules. The first one carries out the vectorial
transformation of the text with filtering, lemmatization/stemming and
normalization processes. The second one applies the greedy algorithm and calculates the
adjacency matrix. The score of each sentence is obtained directly from the
algorithm, and the sentences with the highest scores are selected as the most relevant.
Finally, the third module generates the summary, selecting and concatenating
the relevant sentences. The first and second modules use CORTEX [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], a
system that carries out an unsupervised extraction of the relevant sentences of a
document using several numerical measures and a decision algorithm.
        </p>
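        <p>To make the three-module structure more concrete, the following toy sketch builds a sentence graph and selects the best-connected sentences. It is not the REG algorithm: the greedy optimization and the CORTEX measures are replaced here by a simple weighted-degree score, and the stopword list, function names and sample sentences are assumptions made for illustration.</p>
        <preformat>
```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "it"}  # toy list


def sentence_words(sentence):
    """Module 1 (simplified): lowercase, tokenize and filter a sentence."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def summarize(sentences, k=2):
    """Modules 2 and 3 (simplified): score sentences on a graph, keep the top k."""
    bags = [sentence_words(s) for s in sentences]
    n = len(sentences)
    # adjacency[i][j] = number of content words shared by sentences i and j
    adjacency = [
        [len(bags[i].intersection(bags[j])) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]
    scores = [sum(row) for row in adjacency]  # weighted degree of each sentence
    # Stable sort keeps document order among ties; output is in document order.
    best = sorted(sorted(range(n), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in best]


sentences = [
    "Indri retrieves Wikipedia pages.",
    "The summarizer scores sentences from the Wikipedia pages.",
    "Readability is evaluated manually.",
    "Sentences with high scores form the summary.",
]
print(summarize(sentences, k=2))
```
        </preformat>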
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>A central assumption of this research is that named entities, as well as word
sequences that match the typical terminological patterns (see Section 2.1), are
representative of a tweet's topic. To test this assumption, we designed a
methodology to automatically retrieve from the tweets all significant sequences that
satisfy the above-mentioned criteria.</p>
      <p>The first step is to POS tag the tweets file. In order
to keep the process fully automatic, only a minimal manipulation of the tweets file
has been done: a minor modification that allows the text handling
tool to keep the tweet id connected to the tweet itself.</p>
      <p>
        The next step, the extraction of terminological patterns, has been done using an
existing module of the YATE term extraction tool [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. This information,
together with the POS-tagged tweet (used to obtain proper noun information), is used to build
the query string for Indri.
      </p>
      <p>Some care has been taken to keep track of multiword sequences, as required
by the Indri query language specification (see examples below).</p>
      <p>In order to enrich the queries, we use a local installation of a Wikipedia
dump<sup>2</sup> to expand the terms with the redirection information it contains.
In this way, a query term like "Falklands" may be looked up in Wikipedia
to find that it can also be referenced as "Falkland Islands"; therefore, the final
query term is rewritten as:</p>
      <preformat>#syn(Falklands #1(Falkland Islands))</preformat>
      <p>This strategy is also useful for finding acronym expansions, such as "USGS" and
"United States Geological Survey", which results in the following query:</p>
      <preformat>#syn("USGS" #1(United States Geological Survey))</preformat>
      <p>Moreover, it allows finding alternative spellings of the same name, as in:</p>
      <preformat>#syn(#1(Christine de Pisan) #1(Christine de Pizan))</preformat>
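      <p>The term expansion just described can be sketched as follows. This is only an illustration under stated assumptions: the function names and the in-memory redirect table are hypothetical (the actual system queries a local Wikipedia dump), and the rule that all-uppercase single words are quoted is inferred from the examples above rather than documented.</p>
      <preformat>
```python
def indri_term(term):
    """Format one term: exact phrase for multiword terms, quotes for acronyms."""
    if " " in term:
        return "#1(" + term + ")"
    if term.isupper():  # assumption: all-uppercase single words are acronyms
        return '"' + term + '"'
    return term


def expand(term, variants):
    """Wrap a term and its redirect variants in a #syn operator."""
    if not variants:
        return indri_term(term)
    parts = [indri_term(t) for t in [term] + list(variants)]
    return "#syn(" + " ".join(parts) + ")"


# Hypothetical redirect table standing in for the Wikipedia redirection data.
REDIRECTS = {
    "Falklands": ["Falkland Islands"],
    "USGS": ["United States Geological Survey"],
    "Christine de Pisan": ["Christine de Pizan"],
}

for term, variants in REDIRECTS.items():
    print(expand(term, variants))
```
      </preformat>
      <p>Terms without any redirect variant are passed through unchanged, apart from the #1 operator that keeps multiword sequences together.</p>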
      <p>The resulting query has been submitted to Indri, using the script provided by the
track organizers, to obtain the Wikipedia pages relevant to each query. The following is an
example of a full tweet:</p>
      <preformat>Increasingly, central banks, especially in emerging markets,
have been the marginal buyers of gold http://t.co/9mftD5ju
via WSJ.</preformat>
      <p>and its corresponding query string:</p>
      <preformat>#1(marginal buyers of gold),#1(emerging markets),
#1(central banks),#syn("WSJ" #1(The Wall Street Journal))</preformat>
      <p>
        The resulting set of Wikipedia pages has been split into several documents.
Each document contains the pages relevant to one query. Each such document is the
input to the REG summarization system (see Section 2.3), which builds a
summary from the significant passages.
      </p>
      <p>
        <sup>2</sup> This resource has been obtained using [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Experiments Settings and Results</title>
      <p>As mentioned in Section 3, the process is fully automatic. No human intervention
has taken place; therefore, errors in any step of the process may have a
multiplicative effect. The main sources of error are illustrated below:</p>
      <list list-type="order">
        <list-item>
          <p>Tweet itself. The tweets file (including 1000 tweets) prepared by the
organization contains several kinds of errors: misspellings, joined words, foreign-language
text, etc. Consider the following examples:</p>
          <list list-type="bullet">
            <list-item>
              <p>169657757870456833: "Lakers now 17-12 on the season &amp; 12-2 at home.
@paugasol 20pts 13rebs 4blks. Bynum 15pts 15rebs. @0goudelock 10pts,
two 3 PTers."</p>
            </list-item>
            <list-item>
              <p>169904294642978816: "@ranaoboy @Utcheychy @Jhpiego Thx for the
#wiwchat RTs! Great conversation!"</p>
            </list-item>
            <list-item>
              <p>169655717538701312: "METTA. WORLD. PEACE."</p>
            </list-item>
            <list-item>
              <p>170175722449670145: "http://t.co/amQ6IShA"</p>
            </list-item>
            <list-item>
              <p>170207412366745600: "RT @MexicanProblms: #41. When you're
eating junk food y tu mom te dice que no comas &amp;quot;chucherias.&amp;quot;
#MexicanProblems".</p>
            </list-item>
          </list>
          <p>Note that, in some cases, this results in an empty query string, or the
resulting sentence is too short, causing POS tagging errors due to lack of
context.</p>
        </list-item>
        <list-item>
          <p>POS tagging. The output of most of the tools used for tagging (TreeTagger
in this case) has some error rate. Unfortunately, the errors mentioned above, as
well as extremely short sentences, have a negative influence on the tagger's
performance.</p>
        </list-item>
        <list-item>
          <p>Wikipedia expansion. The information added through
Wikipedia expansion is not always useful. This is the case, for example, when the only added
information is a change in the letter case of the query term.</p>
        </list-item>
        <list-item>
          <p>Indri query system. As shown in [
            <xref ref-type="bibr" rid="ref41">41</xref>
            ], this retrieval system has its own limitations.</p>
        </list-item>
        <list-item>
          <p>REG summarization system. The retrieval system returns several
Wikipedia pages per query, so a multi-document
summarization system would be needed. REG, however, is a single-document summarizer,
so some redundancy may appear in the summaries.</p>
        </list-item>
      </list>
      <p>Some of the above issues may cause unusual results in the terminological
patterns extraction tool. Therefore, in such cases, the pages retrieved by Indri
may not correspond to the information available in Wikipedia about tweets'
topics.</p>
      <p>
        The evaluation of all the participating systems in the INEX tweet contextualization
task at CLEF 2012 covers two aspects: informativeness and readability.
On the one hand, as mentioned, informativeness is evaluated with the automatic
FRESA package. This evaluation framework includes document-based
summary evaluation measures based on probability distributions, specifically
the Kullback-Leibler (KL) divergence and the Jensen-Shannon (JS) divergence.
As in the ROUGE package [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ], FRESA supports different n-gram and skip
n-gram probability distributions. The FRESA environment has been used in the
evaluation of summaries produced in several European languages (English, French,
Spanish and Catalan), and it integrates filtering and lemmatization in the
treatment of summaries and documents.
      </p>
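      <p>For readers unfamiliar with these measures, the following minimal sketch computes the Jensen-Shannon divergence between the unigram distributions of a source text and a summary. It only illustrates the definitions; it is not FRESA itself, whose filtering, lemmatization, smoothing and n-gram handling are not reproduced here, and the function names and sample texts are assumptions.</p>
      <preformat>
```python
import math
from collections import Counter


def distribution(text):
    """Relative frequencies of the lowercased, whitespace-separated unigrams."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must cover the support of p."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw)


def js(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their mixture."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


source = distribution("central banks in emerging markets have been buying gold")
summary = distribution("central banks have been buying gold")
print(round(js(source, summary), 4))
```
      </preformat>
      <p>With base-2 logarithms the JS divergence lies between 0 (identical distributions) and 1 (no shared vocabulary), so a lower value indicates a summary whose word distribution is closer to that of the source.</p>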
      <p>Table 1 includes the official results of the informativeness evaluation in the
INEX tweet contextualization task at CLEF 2012. This table presents the
scores of the 33 participating runs.</p>
      <p>As shown in Table 1, our run (165) ranks 22nd.
Specifically, it obtains 0.8818 using unigrams, 0.9630 using bigrams and 0.9634 using
skip bigrams. The best run in the ranking (178) obtains 0.7734, 0.8616 and
0.8623, respectively (lower scores are better, since these measures are divergences).</p>
      <p>On the other hand, readability is evaluated manually. Evaluators are asked
to assess several aspects related to syntactic incoherence, unresolved anaphora,
redundancy, etc. The specific instructions given to evaluators are:</p>
      <list list-type="bullet">
        <list-item>
          <p>Syntax (S): "Tick the box if the passage contains a syntactic problem (bad
segmentation for example)".</p>
        </list-item>
        <list-item>
          <p>Anaphora (A): "Tick the box if the passage contains an unsolved anaphora".</p>
        </list-item>
        <list-item>
          <p>Redundancy (R): "Tick the box if the passage contains a redundant
information, i.e. an information that have already been given in a previous passage".</p>
        </list-item>
        <list-item>
          <p>Trash (T): "Tick the box if the passage does not make any sense in its context
(i.e. after reading the previous passages). These passages must then be
considered as trashed, and readability of following passages must be assessed as
if these passages were not present".</p>
        </list-item>
      </list>
      <p>The score is the average normalized number of words in valid passages, and
participants are ranked according to this score. Summary word numbers are
normalized to 500 words each.</p>
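      <p>One possible reading of this scoring rule is sketched below. The exact formula used by the organizers is not given here, so the interpretation of "normalized" as division by the 500-word limit, the capping at that limit, and all names in the code are assumptions made for illustration.</p>
      <preformat>
```python
WORD_LIMIT = 500  # maximum summary length set by the task


def summary_score(passages):
    """Fraction of the word budget covered by valid passages.

    `passages` is a list of (text, is_valid) pairs; how validity is derived
    from the S/A/R/T boxes depends on the measure and is left to the caller.
    """
    valid_words = sum(len(text.split()) for text, is_valid in passages if is_valid)
    return min(valid_words, WORD_LIMIT) / WORD_LIMIT


def run_score(summaries):
    """Average the per-summary scores over all assessed summaries of a run."""
    return sum(summary_score(s) for s in summaries) / len(summaries)


summaries = [
    [("a " * 250, True), ("b " * 250, False)],  # half of the words are valid
    [("c " * 500, True)],                       # every word is valid
]
print(run_score(summaries))
```
      </preformat>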
      <p>Table 2 includes the final results of the readability evaluation in the INEX tweet
contextualization task at CLEF 2012. Estimated average scores are available
for:</p>
      <list list-type="bullet">
        <list-item>
          <p>Relevance: proportion of text that makes sense in context.</p>
        </list-item>
        <list-item>
          <p>Syntax: proportion of text without syntax problems.</p>
        </list-item>
        <list-item>
          <p>Structure: proportion of text without broken anaphora and avoiding
redundancy.</p>
        </list-item>
      </list>
      <p>These measures were estimated on the same pool of tweets as the
informativeness evaluation previously released by the organizers.</p>
      <p>Runs that failed to provide at least 6 consistent summaries in this pool have
been set apart because the estimates were too uncertain for inclusion in the
official results. For this reason, only 27 runs are shown in Table 2.</p>
      <p>As shown in Table 2, our run (165) ranks 7th. Specifically,
it obtains 0.5936 for relevance, 0.6049 for syntax and 0.5442 for
structure. The best run in the ranking (185) obtains 0.7728, 0.7452 and 0.6446,
respectively.</p>
      <p>These results show that our system performs relatively poorly
on informativeness but considerably better on readability. This difference
between informativeness and readability is also observed in other systems (see, for
example, the best runs in the two categories, 178 and 185). In our case, we consider
that the above-mentioned errors in the tweets and the fact that the terminology
extraction is fully automatic may cause the pages retrieved by Indri to be
less relevant than expected. Nevertheless, the use of an automatic summarization
system keeps the readability of the output acceptable.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper, our strategy and results for the INEX tweet contextualization task
at CLEF 2012 are presented. The task consists of developing a system
that, given a tweet, provides some context about its subject
in order to help the reader understand it. This context should take the form
of a readable summary, not exceeding 500 words, composed of passages from
a provided Wikipedia corpus. The test data consist of about 1000 tweets in English
collected from Twitter by the organizers of the task.</p>
      <p>Our system automatically reformulates the initial tweets
provided for the task, using terminological patterns to obtain a list of terms
related to their main topic. Then, using these reformulated tweets, we retrieve
related documents with the search engine Indri. Finally, we use REG to summarize
these documents and provide the final summary associated with each tweet.</p>
      <p>The results show that, compared to the other participants, our system
performs relatively poorly on informativeness (probably due to errors
in the tweets and problems in the terminology extraction process), but considerably
better on readability (probably because it relies on a summarization
system).</p>
      <p>In the future we plan to follow several parallel lines: i) to improve term
selection and expansion in order to refine the queries and thereby improve the
relevance of the Wikipedia pages retrieved by Indri; ii) to further investigate
how relevant the retrieved Wikipedia pages actually are to the query; and iii)
to assess the actual weight of the summarization process in the full task by testing
other summarization systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Saggion</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lapalme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Generating Indicative-Informative Summaries with SumUM</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>28</volume>
          (
          <issue>4</issue>
          ).
          <fpage>497</fpage>
          -
          <lpage>526</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Edmundson</surname>
            ,
            <given-names>H. P.</given-names>
          </string-name>
          (
          <year>1969</year>
          ).
          <article-title>New Methods in Automatic Extracting</article-title>
          .
          <source>Journal of the Association for Computing Machinery</source>
          <volume>16</volume>
          .
          <fpage>264</fpage>
          -
          <lpage>285</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Nanba</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Okumura</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Producing More Readable Extracts by Revising Them</article-title>
          .
          <source>In Proceedings of the 18th Int. Conference on Computational Linguistics (COLING-2000). Saarbrücken</source>
          .
          <fpage>1071</fpage>
          -
          <lpage>1075</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gaizauskas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Herring</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Oakes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Beaulieu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Willett</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fowkes</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jonsson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Intelligent access to text: Integrating information extraction technology into text browsers</article-title>
          .
          <source>In Proceedings of the Human Language Technology Conference. San Diego</source>
          .
          <fpage>189</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rüger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Extract-based Summarization with Simplification</article-title>
          .
          <source>In Proceedings of the 2nd Document Understanding Conference at the 40th Meeting of the Association for Computational Linguistics</source>
          .
          <fpage>90</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Torres-Moreno</surname>
            ,
            <given-names>J-M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Velazquez-Morales</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Meunier</surname>
            ,
            <given-names>J. G.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Condensés de textes par des méthodes numériques</article-title>
          .
          <source>In Proceedings of the 6th Int. Conference on the Statistical Analysis of Textual Data (JADT). St. Malo</source>
          .
          <fpage>723</fpage>
          -
          <lpage>734</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
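          7.
          <string-name>
            <surname>da Cunha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Velázquez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vivaldi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>SanJuan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Torres-Moreno</surname>
            ,
            <given-names>J-M.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>A new hybrid summarizer based on Vector Space Model, Statistical Physics and Linguistics</article-title>
          .
          <source>Lecture Notes in Computer Science</source>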
          <volume>4827</volume>
          .
          <fpage>872</fpage>
          -
          <lpage>882</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ono</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sumita</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Miike</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Abstract generation based on rhetorical structure extraction</article-title>
          .
          <source>In Proceedings of the Int. Conference on Computational Linguistics. Kyoto</source>
          .
          <fpage>344</fpage>
          -
          <lpage>348</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Paice</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          (
          <year>1990</year>
          ).
          <article-title>Constructing literature abstracts by computer: Techniques and prospects</article-title>
          .
          <source>Information Processing and Management</source>
          <volume>26</volume>
          .
          <fpage>171</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Language Reuse and Regeneration: Generating Natural Language Summaries from Multiple On-Line Sources</article-title>
          .
          <source>PhD Thesis</source>
          . New York, Columbia University.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Torres-Moreno</surname>
            ,
            <given-names>J-M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ramírez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>REG : un algorithme glouton appliqué au résumé automatique de texte</article-title>
          .
          <source>Proceedings of the 10th Int. Conference on the Statistical Analysis of Textual Data</source>
          . Roma, Italia.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Torres-Moreno</surname>
            ,
            <given-names>J-M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ramírez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>da Cunha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Un résumeur à base de graphes, indépendant de la langue</article-title>
          .
          <source>In Proceedings of the Int. Workshop African HLT 2010</source>
          . Djibouti.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
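          <string-name>
            <surname>Vivaldi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>da Cunha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ramírez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (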
          <year>2011</year>
          ).
          <article-title>The REG summarization system with question reformulation at QA@INEX track 2010</article-title>
          . In Geva, S. et al. (eds.).
          <source>INEX 2010, Lecture Notes in Computer Science</source>
          <volume>6932</volume>
          .
          <fpage>295</fpage>
          -
          <lpage>302</lpage>
          . Berlin: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Vivaldi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>da Cunha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>QA@INEX Track 2011: Question Expansion and Reformulation Using the REG Summarization System</article-title>
          .
          <source>Lecture Notes in Computer Science (LNCS)</source>
          <volume>7424</volume>
          .
          <fpage>257</fpage>
          -
          <lpage>268</lpage>
          . Berlin: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Saggion</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Torres-Moreno</surname>
            ,
            <given-names>J-M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>da Cunha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>SanJuan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Velázquez-Morales</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>SanJuan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Multilingual Summarization Evaluation without Human Models</article-title>
          .
          <source>In Proceedings of the 23rd Int. Conference on Computational Linguistics (COLING 2010)</source>
          . Beijing.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Torres-Moreno</surname>
            ,
            <given-names>J-M.</given-names>
          </string-name>
          ;
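          <string-name>
            <surname>Saggion</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>da Cunha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>SanJuan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;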
          <string-name>
            <surname>Velázquez-Morales</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Summary Evaluation With and Without References</article-title>
          .
          <source>Polibits: Research journal on Computer science and computer engineering with applications</source>
          <volume>42</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Torres-Moreno</surname>
            ,
            <given-names>J-M.</given-names>
          </string-name>
          ;
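          <string-name>
            <surname>Saggion</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>da Cunha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Velázquez-Morales</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>SanJuan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>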
          (
          <year>2010</year>
          ).
          <article-title>Évaluation automatique de résumés avec et sans référence</article-title>
          .
          <source>In Proceedings of the 17e Conférence sur le Traitement Automatique des Langues Naturelles (TALN)</source>
          . Montréal: Univ. de Montréal et École Polytechnique de Montréal.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Cabré</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>La terminología. Representación y comunicación</article-title>
          .
          <source>Barcelona: IULA.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Cabré</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Estopà</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vivaldi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Automatic term detection. A review of current systems</article-title>
          .
          <source>Recent Advances in Computational Terminology</source>
          <volume>2</volume>
          .
          <fpage>53</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Pazienza</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pennacchiotti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zanzotto</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Terminology Extraction: An Analysis of Linguistic and Statistical Approaches</article-title>
          .
          <source>Studies in Fuzziness and Soft Computing</source>
          <volume>185</volume>
          .
          <fpage>255</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ahrenberg</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Term Extraction: A Review. (Unpublished draft)</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
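          <string-name>
            <surname>Alarcón</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sierra</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>C.</given-names>
          </string-name>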
          (
          <year>2008</year>
          ).
          <article-title>ECODE: A Pattern Based Approach for Definitional Knowledge Extraction</article-title>
          .
          <source>In Proceedings of the XIII EURALEX Int. Congress</source>
          . Barcelona: IULA, UPF, Documenta Universitaria.
          <fpage>923</fpage>
          -
          <lpage>928</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Enguehard</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pantera</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Automatic Natural Acquisition of a Terminology</article-title>
          .
          <source>Journal of Quantitative Linguistics</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ).
          <fpage>27</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Patry</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Langlais</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Corpus-based terminology extraction</article-title>
          .
          <source>In Proceedings of 7th Int. Conference on Terminology and Knowledge Engineering</source>
          . Copenhagen.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Drouin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Acquisition automatique des termes: l'utilisation des pivots lexicaux spécialisés</article-title>
          .
          <source>Ph.D. Thesis</source>
          . Montréal (Canada): Université de Montréal.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Frantzi</surname>
            ,
            <given-names>K. T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tsujii</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>The C-value/NC-value Method of Automatic Recognition for Multi-word Terms</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          <volume>1513</volume>
          .
          <fpage>585</fpage>
          -
          <lpage>604</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Vintar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Bilingual term recognition revisited: The bag-of-equivalents term alignment approach and its evaluation</article-title>
          .
          <source>Terminology</source>
          <volume>16</volume>
          (
          <issue>2</issue>
          ).
          <fpage>141</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Gómez Guinovart</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>A Hybrid Corpus-Based Approach to Bilingual Terminology Extraction</article-title>
          .
          In I. Moskowich and B. Crespo (eds.).
          <source>Encoding the Past, Decoding the Future: Corpora in the 21st Century</source>
          . Newcastle upon Tyne: Cambridge Scholars Publishing.
          <fpage>147</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>An overview of MetaMap: historical perspective and recent advances</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>17</volume>
          (
          <issue>3</issue>
          ).
          <fpage>229</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Term Recognition Using Combined Knowledge Sources</article-title>
          .
          <source>Ph.D. Thesis</source>
          . Manchester Metropolitan University. Manchester (UK).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Vivaldi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Extracción de candidatos a término mediante combinación de estrategias heterogéneas</article-title>
          .
          <source>Ph.D. thesis</source>
          . Universitat Politècnica de Catalunya. Barcelona (Spain).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
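          <string-name>
            <surname>Vivaldi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rodríguez</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (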
          <year>2010</year>
          ).
          <article-title>Using Wikipedia for term extraction in the biomedical domain: first experiences</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>45</volume>
          .
          <fpage>251</fpage>
          -
          <lpage>254</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Cabrera-Diego</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sierra</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vivaldi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pozzi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Using Wikipedia to Validate Term Candidates for the Mexican Basic Scientific Vocabulary</article-title>
          .
          <source>In Proceedings of LaRC 2011: First Int. Conference on Terminology, Languages, and Content Resources</source>
          . Seoul.
          <fpage>76</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Erdmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nakayama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hara</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nishio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Improving the Extraction of Bilingual Terminology from Wikipedia</article-title>
          .
          <source>ACM Transactions on Multimedia Computing, Communications and Applications</source>
          <volume>5</volume>
          (
          <issue>4</issue>
          ).
          <fpage>31:1</fpage>
          -
          <lpage>31:16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Estopa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary</article-title>
          .
          <source>Ph.D. Thesis</source>
          . Pompeu Fabra University. Barcelona (Spain).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Nazar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cabré</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Supervised Learning Algorithms Applied to Terminology Extraction</article-title>
          .
          <source>In 10th Terminology and Knowledge Engineering Conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Grishman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sundheim</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>1996</year>
          ).
          <article-title>Message Understanding Conference - 6: A Brief History</article-title>
          .
          <source>In Proceedings of the 16th Int. Conference on Computational Linguistics</source>
          .
          <fpage>466</fpage>
          -
          <lpage>471</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Nadeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sekine</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>A survey of named entity recognition and classification</article-title>
          .
          <source>Journal of Linguisticae Investigationes</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ).
          <fpage>3</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
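          39.
          <string-name>
            <surname>Martínez</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vivaldi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>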
          (
          <year>2010</year>
          ).
          <article-title>Text handling as a Web Service for the IULA processing pipeline</article-title>
          .
          <source>In Proceedings of the 7th conference on International Language Resources and Evaluation (LREC'10)</source>
          .
          <fpage>22</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <surname>Zesch</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary</article-title>
          .
          <source>In 6th LREC Conference Proceedings</source>
          .
          <fpage>1646</fpage>
          -
          <lpage>1652</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
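          <string-name>
            <surname>Strohman</surname>
            ,
            <given-names>Trevor</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Metzler</surname>
            ,
            <given-names>Donald</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Turtle</surname>
            ,
            <given-names>Howard</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>Bruce</given-names>
          </string-name>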
          (
          <year>2005</year>
          ).
          <article-title>Indri: A language-model based search engine for complex queries</article-title>
          . University of Massachusetts Amherst.
          <source>CIIR Technical Report IR-407.</source>
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C-Y.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>ROUGE: A Package for Automatic Evaluation of Summaries</article-title>
          .
          <source>In Proceedings of Text Summarization Branches Out: ACL-04 Workshop</source>
          .
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>