<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of INEX Tweet Contextualization 2014 track</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patrice Bellot</string-name>
          <email>patrice.bellot@univ-amu.fr</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veronique Moriceau</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Josiane Mothe</string-name>
          <email>josiane.mothe@irit.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric SanJuan</string-name>
          <email>eric.sanjuan@univ-avignon.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xavier Tannier</string-name>
          <email>xtannierg@limsi.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>IRIT, UMR 5505, Université de Toulouse, Institut Universitaire de Formation des Maîtres Midi-Pyrénées</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>LIA, Université d'Avignon et des Pays de Vaucluse</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LIMSI-CNRS, University Paris-Sud</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LSIS - Aix-Marseille University</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>494</fpage>
      <lpage>500</lpage>
      <abstract>
<p>Messages of at most 140 characters are rarely self-contained. Tweet Contextualization aims at automatically providing information, in the form of a summary, that explains a tweet. This requires combining multiple types of processing, from information retrieval to multi-document summarization, including entity linking. The task has been running since 2010; the 2014 edition was a slight variant of previous ones, considering more complex queries from RepLab 2013. Given a tweet and a related entity, systems had to provide some context about the subject of the tweet from the perspective of the entity, in order to help the reader understand it.</p>
      </abstract>
      <kwd-group>
        <kwd>Short text contextualization</kwd>
        <kwd>Tweet understanding</kwd>
        <kwd>Automatic summarization</kwd>
        <kwd>Question answering</kwd>
        <kwd>Focus information retrieval</kwd>
        <kwd>XML</kwd>
        <kwd>Natural language processing</kwd>
        <kwd>Wikipedia</kwd>
        <kwd>Text readability</kwd>
        <kwd>Text informativeness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>The task in 2014 is a slight variant of previous ones and it is complementary
to CLEF RepLab. Previously, given a tweet, systems had to help the user to
understand it by providing a short textual summary. This summary had to be
readable on a mobile device without having to scroll too much. In addition, the
user should not have to query any system and the system should use a resource
freely available. More speci cally, the guideline speci ed the summary should be
500 words long and built from sentences extracted from a dump of Wikipedia.</p>
      <p>In 2014, a small variant of the task was explored, considering more
complex queries from RepLab 2013, but using the same corpus. The new use
case of the task was the following: given a tweet and a related entity, the system
must provide some context about the subject of the tweet from the perspective
of the entity, in order to help the reader answer questions of the form "why
does this tweet concern the entity? Should it be an alert?".</p>
      <p>In the remainder, we give details about the setup and results of the 2014
track in English. We also give preliminary results about the pilot task in Spanish.</p>
    </sec>
    <sec id="sec-2">
      <title>Data collection</title>
      <p>The official document collection for 2014 was the same as in 2013. Between 2011
and 2013 the corpus changed every year but the use case did not. In 2014, the
same corpus was reused but the use case evolved. Since the 2014 TC topics are a
selection of tweets from RepLab 2013, it was necessary to use prior Wikipedia
dumps. Some participants also used the 2012 corpus, raising the question of
the impact of Wikipedia updates on these tasks.</p>
      <p>Let us recall that the document collection has been built from yearly
dumps of the English Wikipedia since November 2011. We released a set of tools
to convert a Wikipedia dump into a plain XML corpus allowing easy extraction
of plain-text answers. The same Perl programs, released to all participants, have
been used to remove all notes and bibliographic references, which are difficult to
handle, and to keep only non-empty Wikipedia pages (pages having at least one
section).</p>
      <p>The documents automatically generated from the Wikipedia dump
consist of a title (title), an abstract (a) and sections (s). Each section has a
subtitle (h). Abstracts and sections are made of paragraphs (p), and each paragraph
can contain entities (t) that refer to other Wikipedia pages.</p>
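      <p>As an illustration, a converted page has the following shape (a minimal
sketch: the root element name, attributes and textual content are ours; only the
element names listed above come from the corpus format):</p>
      <preformat><![CDATA[
<page>
  <title>Example article</title>
  <a>
    <p>Abstract paragraph mentioning an <t>entity</t> that
       refers to another Wikipedia page.</p>
  </a>
  <s>
    <h>Section subtitle</h>
    <p>Section paragraph, possibly with more <t>entities</t>.</p>
  </s>
</page>
]]></preformat>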
      <p>As tweets, 240 topics have been collected from the RepLab 2013 corpus. These
tweets have been selected in order to make sure that:
- they contained "informative content" (in particular, no purely personal
messages),
- the document collection from Wikipedia had related content, so that a
contextualization was possible.</p>
      <p>In order to prevent fully manual systems, or systems that were not robust
enough, from achieving the task, all tweets had to be processed by participants,
but only a random sample of them was considered for evaluation.</p>
      <p>These tweets were provided in XML and tabulated format with the following
information (an illustrative example is given after this list):
- the category (4 distinct values),
- an entity name from Wikipedia (64 distinct values),
- a manual topic label (235 distinct values).</p>
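      <p>A topic entry could thus look like the following sketch (the element names,
identifier and values are hypothetical illustrations, not the official topic
format):</p>
      <preformat><![CDATA[
<topic id="...">
  <category>Banking</category>
  <entity>Barclays</entity>
  <label>customer service complaint</label>
  <tweet>The tweet text itself...</tweet>
</topic>
]]></preformat>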
      <p>The entity name was to be used as an entry point into Wikipedia or DBpedia.
The context of the generated summaries was expected to be fully related to this
entity. By contrast, the usefulness of topic labels for this automatic task was,
and remains, an open question because of their variety.</p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>As in 2013, the entire evaluation process was carried out by the organizers.</p>
      <p>
        Tweet contextualization [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is evaluated on both informativeness and
readability. Informativeness aims at measuring how well the summary explains the
tweet, or how well it helps a user understand the tweet's content. Readability,
on the other hand, aims at measuring how clear and easy to understand the
summary is.
      </p>
      <p>
        Informativeness. Informativeness is based on the lexical overlap between a set of relevant
passages (RPs) and participant summaries, measured with the LogSim divergence introduced
in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Given an integer $\lambda &gt; 30$ and two texts $T$ and $S$, the LogSim
divergence can be restated as:
      </p>
      <p>$$LS(S \mid T) = \sum_{\omega \in \Omega_T} P(\omega \mid T) \cdot \frac{\min(f_T(\omega), f_S(\omega))}{\max(f_T(\omega), f_S(\omega))} \quad (1)$$</p>
      <p>where, for any text $Z$, $\Omega_Z$ is the set of n-grams in $Z$ and, for any
n-gram $\omega \in \Omega_Z$:</p>
      <p>$$f_Z(\omega) = \log(1 + \lambda \cdot P(\omega \mid Z)) \quad (2)$$</p>
      <p>The parameter $\lambda$ in the LS formula represents the maximal summary
length allowed, in words (500 in our case).</p>
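      <p>As a complement to the formulas, the following Python sketch (our own
illustration over unigrams, not the official evaluation toolkit) computes the
overlap defined by equations (1) and (2); note that official scores are reported
as divergences, where lower is better:</p>
      <preformat><![CDATA[
import math
from collections import Counter

def logsim_overlap(reference: str, summary: str, lam: int = 500) -> float:
    """LogSim overlap LS(S|T) over unigrams, per equations (1)-(2).

    `reference` is the pool of relevant passages T, `summary` is the
    participant summary S, and `lam` is the parameter lambda, i.e. the
    maximal allowed summary length in words.
    """
    t_counts = Counter(reference.lower().split())
    s_counts = Counter(summary.lower().split())
    t_total = sum(t_counts.values()) or 1
    s_total = sum(s_counts.values()) or 1

    score = 0.0
    for word, count in t_counts.items():
        p_t = count / t_total                  # P(w|T)
        p_s = s_counts.get(word, 0) / s_total  # P(w|S)
        f_t = math.log(1 + lam * p_t)          # equation (2)
        f_s = math.log(1 + lam * p_s)
        score += p_t * min(f_t, f_s) / max(f_t, f_s)
    return score
]]></preformat>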
      <p>
        Once the pool of RPs (t-rels) is constituted, the process is automatic and
can be applied to unofficial runs. The release of these pools is one of the main
contributions of the Tweet Contextualization tracks at INEX [
        <xref ref-type="bibr" rid="ref1 ref3 ref4">4, 3, 1</xref>
        ].
      </p>
      <p>In previous editions, t-rels were based on a pool of participant-submitted
passages. Organizers then selected among them those that were relevant. In 2013,
to build a more robust reference, two manual runs by participants were added,
using different online search engines to find relevant Wikipedia pages and
copying the relevant passages into the reference.</p>
      <p>This year, even though there were only five participants, the variety of
submitted passages was too high compared to the number of runs. One reason was
that this year's topics included more facets, and converting them into queries for
a search engine was less straightforward. As a consequence, it was not possible
to rely on a pool built from participant runs, because it would have been too
sparse and incomplete. It was finally decided to rely on a thorough manual run
by the organizers, based on the reference system that was made available to all
participants at http://qa.termwatch.es</p>
      <p>A manual query in the Indri language was set up for one topic out of five. These
queries were refined until they returned only a set of relevant passages, using
the reference system on the 2013 corpus. From these RPs we extracted two
t-rels: one merging all passages for each tweet, the other keeping only the Noun
Phrases (NPs) from the passages, to reduce the risk of introducing document
identifiers in the passages.</p>
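      <p>For illustration, such a manual query could combine the entity name with
the tweet's facets using Indri belief operators (this particular query is a made-up
example, not one of the actual reference queries):</p>
      <preformat><![CDATA[
#combine( barclays complaint #1(customer service)
          #syn( bank banking ) )
]]></preformat>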
      <p>The average length of the queries used to build the reference is 8 tokens, with a
minimum of 2 and a maximum of 14. Efficient queries are therefore much shorter
than tweets. The average number of relevant tokens in the passage-based t-rels
is 620, whereas it is only 300 in the NP-based t-rels.</p>
      <p>Readability. By contrast, readability is evaluated manually and cannot be
reproduced on unofficial runs. In this evaluation, the assessor indicates where
they miss the point of the answers because of highly incoherent grammatical
structures, unresolved anaphora, or redundant passages. Since 2012, three metrics
have been used: the Relaxed metric, counting passages where the T box has not
been checked; Syntax, counting passages where the S box was not checked
either; and the Structure (or Strict) metric, counting passages where no box
was checked at all. As in previous editions, participant runs have been ranked
according to the average, normalized number of words in valid passages.</p>
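      <p>Under these conventions, the ranking reduces to a word-count aggregation.
Below is a minimal sketch, assuming per-passage assessor flags have already been
collected and that normalization is done against the total number of submitted
words:</p>
      <preformat><![CDATA[
def readability_score(passages, metric="strict"):
    """Normalized number of words in valid passages for one run.

    `passages` is a list of (word_count, flags) pairs, where `flags`
    is the set of boxes checked by the assessor, e.g. set(), {"T"},
    or {"T", "S"}. Relaxed only rejects T; Syntax rejects T and S;
    Strict rejects any passage with at least one checked box.
    """
    rejecting = {"relaxed": {"T"}, "syntax": {"T", "S"}}.get(metric)
    total = sum(words for words, _ in passages) or 1
    if rejecting is None:  # strict: no box may be checked
        valid = sum(words for words, flags in passages if not flags)
    else:
        valid = sum(words for words, flags in passages
                    if not (flags & rejecting))
    return valid / total
]]></preformat>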
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>In 2014, four teams, with members from six countries (Canada, France,
Germany, India, Russia, Tunisia), submitted 12 runs to the Tweet Contextualization
track in the framework of the CLEF INEX lab 2014 (two other teams, from Mexico
and Spain, participated in the pilot task in Spanish, submitting three runs not
counted in these figures). The total number of submitted passages was 54,932,
with an average length of 32 tokens. The total number of tokens was 1,764,373,
with an average of 7,352 per tweet.</p>
      <p>We also generated two reference runs based on the organizers' system made
available to participants, using the 2013 and 2012 corpora respectively.</p>
      <p>When reading the scores, note that lower is better, since they are divergences.
Informativeness results based on passage t-rels are presented in Table 1, and
those based on NP t-rels in Table 3. The statistical significance of differences
between scores in Table 1 is indicated in Table 2. Readability results are presented
in Table 4.</p>
      <p>The informativeness rankings in Table 1 and in Table 3 are highly correlated;
however, the discrepancies between the two rankings show that differences between
top-ranked runs rely on tokens outside NPs, mainly verbs, since functional words
are removed in the evaluation.</p>
      <p>Table 4 reveals that the readability of the reference runs is low, even though
they are made of longer passages than average to ensure local syntactic correctness.</p>
      <p>Since the reference runs use the same system and index as the manual run
used to build the t-rels, they tend to minimize the informativeness divergence
with the reference. However, the average divergence remains high, pointing out that
selecting the right passages in the restricted context of an entity was more
difficult than in previous, more generic tasks. Considering readability, the fact that
the reference runs are ranked low confirms that finding the right compromise between
readability and informativeness remains the main difficulty of this task.</p>
      <p>This year, the best participating system for informativeness used association
rules. Since contextualization was restricted to a facet described by an entity, it
could be that association rules helped the system focus on this aspect.</p>
      <p>The best participating system for readability used an advanced
summarization system that introduced minor changes in passages to improve readability.
Changing the content of the passages was not allowed; however, this tends to show
that dealing with readability requires some rewriting. Moreover, since this year's
evaluation did not include a pool of passages from participants, systems that
provided modified passages were disadvantaged in the informativeness evaluation.</p>
      <p>An extra set of topics (only tweet texts) was released in Spanish to try a
different language and a slightly different task. Topics in Spanish are opinionated
personal tweets about music bands, cars and politics. Like the tweets in English,
they were manually selected from the CLEF RepLab 2013 test set, among those
without an external URL and with at least 15 words. Contextualization had to help
the reader understand the opinion polarity, allusions and humor.</p>
      <p>Three runs from two teams were submitted to this task. Informativeness
results, based on passage t-rels built from a manual run over a random subset of
seven topics, are presented in Table 5. The best run uses entity extraction in
tweets and complex language-model queries based on these entities. The other runs
provide plain definitions of terms occurring in tweets. The best run significantly
outperforms the two other runs.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Text contextualization can be viewed as a way to provide more information
about a text, with the objective of making it understandable and relating it to
the information that explains it. This year we experimented with a less generic
task, where only information explaining the tweet's opinion and/or its relation
to a given entity was considered relevant. Surprisingly, participant runs showed
that thorough and up-to-date information about opinions related to entities can
be extracted from the textual content of Wikipedia pages.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Bellot, P., Doucet, A., Geva, S., Gurajada, S., Kamps, J., Kazai, G., Koolen, M., Mishra, A., Moriceau, V., Mothe, J., Preminger, M., SanJuan, E., Schenkel, R., Tannier, X., Theobald, M., Trappett, M., Wang, Q.: Overview of INEX 2013. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds.) CLEF 2013. Lecture Notes in Computer Science, vol. 8138, pp. 269-281. Springer (2013)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. SanJuan, E., Bellot, P., Moriceau, V., Tannier, X.: Overview of the INEX 2010 question answering track (QA@INEX). In: Geva, S., Kamps, J., Schenkel, R., Trotman, A. (eds.) INEX 2010. Lecture Notes in Computer Science, vol. 6932, pp. 269-281. Springer (2010)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. SanJuan, E., Moriceau, V., Tannier, X., Bellot, P., Mothe, J.: Overview of the INEX 2012 tweet contextualization track. In: Forner, P., Karlgren, J., Womser-Hacker, C. (eds.) CLEF 2012 Online Working Notes/Labs/Workshop (2012)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. SanJuan, E., Moriceau, V., Tannier, X., Bellot, P., Mothe, J.: Overview of the INEX 2011 question answering track (QA@INEX). In: Geva, S., Kamps, J., Schenkel, R. (eds.) Focused Retrieval of Content and Structure. Lecture Notes in Computer Science, vol. 7424, pp. 188-206. Springer (2012)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>