<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IRIT at INEX 2013: Tweet Contextualization Track</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Liana Ermakova</string-name>
          <email>liana.ermakova.87@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Josiane Mothe</string-name>
          <email>josiane.mothe@irit.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institut de Recherche en Informatique de Toulouse 118 Route de Narbonne</institution>
          ,
          <addr-line>31062 Toulouse Cedex 9</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <abstract>
        <p>The paper presents IRIT's approach used at the INEX Tweet Contextualization Track 2013. Systems had to provide a context for a tweet. This year we further modified our approach presented at INEX 2011 and 2012, which relies on the product of scores based on hashtag processing, a TF-IDF cosine similarity measure enriched by smoothing from the local context and the document beginning, named entity recognition and part-of-speech weighting. We assumed that relevant sentences come from relevant documents; therefore, we multiply the sentence score by the document relevance. We also used generalized POS (e.g. we merge regular, superlative and comparative adverbs into a single adverb group). We introduced a sentence quality measure based on the Flesch reading ease test, lexical diversity, meaningful word ratio and punctuation ratio. Our approach was ranked first, second and third among the 24 runs submitted by all participants on different reference pools according to the informativeness evaluation. At the same time, it obtained the best readability score.</p>
      </abstract>
      <kwd-group>
        <kwd>Information retrieval</kwd>
        <kwd>tweet contextualization</kwd>
        <kwd>summarization</kwd>
        <kwd>sentence extraction</kwd>
        <kwd>readability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Twitter is an online social network and microblogging service that enables users to send and read
text messages of up to 140 characters [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In March 2013, Twitter had more than 200
million active users who wrote more than 400 million tweets every day [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However,
tweets are quite short and may contain information that is not understandable to a
user without some context. Therefore, providing a concise, coherent context seems to be
helpful. The INEX Tweet Contextualization Track aims to evaluate systems that provide
context for a tweet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The context should be a readable summary of up to 500 words
extracted from a dump of Wikipedia from November 2012. This year two
languages were used: English and Spanish. The English query set included 598 tweets,
while the Spanish subtrack was based on 354 personal tweets in Spanish.
      </p>
      <p>
        The paper presents IRIT’s approach used at the INEX Tweet Contextualization Track
2013. We consider the tweet contextualization task as multi-document extractive
summarization. This year we further modified our approach presented at INEX 2011 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and
2012 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which relies on the product of scores based on hashtag processing, a TF-IDF
cosine similarity measure enriched by smoothing from the local context and the document
beginning, named entity (NE) recognition and part-of-speech (POS) weighting. We
assumed that relevant sentences come from relevant documents; therefore, we multiply
the sentence score by the document relevance. We also used generalized POS (e.g. we merge
regular, superlative and comparative adverbs into a single adverb group). We
introduced a sentence quality measure based on the Flesch reading ease test, lexical diversity,
meaningful word ratio and punctuation ratio.
      </p>
      <p>The paper is organized as follows. First, we recall the principles of the
2011-2012 system we developed and describe the modifications we made. Then, we present
the results and discuss them. A description of future developments concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Method Description</title>
      <sec id="sec-2-1">
        <title>Preprocessing</title>
        <p>Preprocessing includes several steps.</p>
        <p>First, we process the tweets themselves, in particular special symbols such as hashtags and replies.</p>
        <p>
          The hashtag symbol # “is used to mark keywords or topics in a Tweet. It was
created organically by Twitter users as a way to categorize messages” and facilitate
search [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Hashtags are inserted before relevant keywords or phrases anywhere in
tweets. Popular hashtags often represent trending topics. Bearing this in mind, we assign
a higher weight to words occurring in hashtags. A key phrase is usually written as a
single hashtag. Thus, we split hashtags at capital letters.
        </p>
        <p>
          Moreover, important information may be found in @replies, e.g. when a user replies
to the post of a politician or another famous person. “An @reply is any update posted by
clicking the "Reply" button on a Tweet” [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Since people may use their names as
Twitter accounts, we treat them analogously to hashtags, i.e. they are split at
capital letters.
        </p>
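        <p>The splitting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the function names, the regular expression and the hashtag_boost value of 2.0 are hypothetical, and only hashtag words receive the higher weight.</p>
        <preformat>
```python
import re


def split_by_capitals(token: str) -> list[str]:
    """Split '#TweetContextualization' or '@JosianeMothe' into separate words."""
    body = token.lstrip("#@")
    # Capitalised word, lowercase run, all-caps run (acronym) or digit run.
    return re.findall(r"[A-Z][a-z]+|[a-z]+|[A-Z]+(?![a-z])|\d+", body)


def tweet_term_weights(tweet: str, hashtag_boost: float = 2.0) -> dict[str, float]:
    """Weight query terms; hashtag_boost is a hypothetical parameter."""
    weights: dict[str, float] = {}
    for token in tweet.split():
        if token.startswith("#"):
            # Words from hashtags get the (assumed) higher weight.
            for word in split_by_capitals(token):
                weights[word.lower()] = hashtag_boost
        elif token.startswith("@"):
            # Account names are split the same way but keep the default weight.
            for word in split_by_capitals(token):
                weights.setdefault(word.lower(), 1.0)
        else:
            word = token.strip(".,!?:;").lower()
            if word:
                weights.setdefault(word, 1.0)
    return weights


print(split_by_capitals("#TweetContextualization"))
print(tweet_term_weights("Great talk #TweetContextualization @JosianeMothe"))
```
        </preformat>
        <p>Acronyms such as the one in "#INEX2013" are kept as a single word in this sketch, so that the digits are separated without breaking the acronym into letters.</p>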
        <p>
          We assume that relevant sentences come from relevant documents, so we applied a
search engine to find them, using the tweet as a query. We chose the Terrier
platform [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], an open-source search engine developed by the School of Computing
Science, University of Glasgow. It implements various weighting and retrieval models
and supports stemming and blind relevance feedback. Terrier is suitable for different
languages, including English and Spanish. We chose the Porter stemmer [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for the
English subtrack and the Snowball stemmer [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for the Spanish one.
        </p>
        <p>
          The next step is to parse tweets and retrieved texts. For the English subtrack we
applied Stanford CoreNLP, which integrates tools such as a POS tagger [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], a named
entity recognizer [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], a parser and a co-reference resolution system. It uses the Penn
Treebank tag set [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. For the Spanish subtrack we integrated TreeTagger [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and
Apache OpenNLP [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. TreeTagger was used for lemmatization and POS tagging,
while sentence detection and named entity recognition were performed by OpenNLP.
        </p>
        <p>Then, we merged the annotations obtained from the parsers and the Wikipedia tagging.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Searching for Relevant Sentences</title>
        <p>We modified the extraction component developed for INEX 2011-2012. The general
idea of the 2011 approach was to compute the similarity between the query and sentences
and to retrieve the most similar passages.</p>
        <p>We model a sentence as a set of vectors. The first vector represents the tokens
occurring within the sentence (unigram representation). Tokens are associated with
lemmas. A lemma has the following features: POS, frequency and IDF. The second
vector corresponds to bigrams. In both vector representations stop-words are retained.
However, function words, such as conjunctions, prepositions and determiners, are
not taken into account in the unigram representation. NE comparison is hypothesized
to be very effective for contextualizing tweets about news. Therefore, the third vector
refers to the named entities found. Thus, the same token may appear in several
vectors.</p>
        <p>For the unigram and bigram vectors, we computed the cosine, Jaccard and Dice similarity
measures between a sentence and the target tweet. NE vectors are treated according to
formula (1), which combines three quantities: a floating-point weight parameter given by
the user (equal to 1.0 by default), the number of NEs appearing in both the query and the
sentence, and the number of NEs appearing in the query.</p>
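        <p>For illustration, the three similarity measures applied to unigram and bigram representations can be sketched as below. The sketch assumes raw term counts rather than the TF-IDF weights of the system, and all function names are illustrative; the NE formula is not covered.</p>
        <preformat>
```python
from collections import Counter
from math import sqrt


def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    """Adjacent token pairs of a sentence."""
    return list(zip(tokens, tokens[1:]))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of terms (raw counts assumed)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    """Twice the intersection size divided by the sum of both set sizes."""
    total = len(a) + len(b)
    return 2 * len(a.intersection(b)) / total if total else 0.0


tweet = ["obama", "visit", "paris"]
sentence = ["obama", "arrive", "paris", "today"]
print(cosine(Counter(tweet), Counter(sentence)))
print(jaccard(set(tweet), set(sentence)), dice(set(tweet), set(sentence)))
print(jaccard(set(bigrams(tweet)), set(bigrams(sentence))))
```
        </preformat>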
        <p>Each sentence has a set of attributes, e.g. which section it belongs to, whether it is a
title or header, whether it has personal verbs etc.</p>
        <p>We introduced an algorithm for smoothing from the local context. We assumed
that the importance of the context decreases as the distance increases. Thus, the nearest
sentences should have more effect on the sense of the target sentence than others. The
system takes into account the k neighboring sentences, with weights depending on their
remoteness from the target sentence. For sentences at a distance greater than k the
weight is zero, and all weights sum to one.</p>
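        <p>A minimal sketch of such a weighting scheme is given below. It assumes a linear decay of the weights with distance and smooths a single score per sentence; both choices, as well as the function names, are assumptions made for illustration, since the exact weighting function is not specified here.</p>
        <preformat>
```python
def context_weights(k: int) -> dict[int, float]:
    """Weights for offsets -k..k that decay linearly with distance and sum to one."""
    raw = {offset: k + 1 - abs(offset) for offset in range(-k, k + 1)}
    total = sum(raw.values())
    return {offset: value / total for offset, value in raw.items()}


def smooth_scores(scores: list[float], k: int = 1) -> list[float]:
    """Replace each sentence score by a weighted average over its k neighbours."""
    weights = context_weights(k)
    smoothed = []
    for i in range(len(scores)):
        # Neighbours outside the document are skipped; the remaining
        # weights are renormalised so that they still sum to one.
        inside = {o: w for o, w in weights.items() if i + o in range(len(scores))}
        norm = sum(inside.values())
        smoothed.append(sum(w * scores[i + o] for o, w in inside.items()) / norm)
    return smoothed


print(context_weights(1))
print(smooth_scores([0.0, 1.0, 0.0], k=1))
```
        </preformat>
        <p>Sentences farther than k positions away never enter the sum, which corresponds to a weight of zero.</p>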
        <p>Moreover, this year we added smoothing from the document beginning. Wikipedia
abstracts summarize the entire article; therefore, they can also be used for
smoothing.</p>
        <p>
          In 2013, we did not apply anaphora resolution since it did not substantially improve our
system according to the 2012 evaluation [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Nor did we use sentence reordering, as it
was not evaluated.
        </p>
        <p>We assumed that relevant sentences come from relevant documents; therefore, we
multiply the sentence score by the document relevance and/or by the inverted document rank.
We also tried generalized POS (e.g. we merge regular, superlative and
comparative adverbs into a single adverb group).</p>
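        <p>These two ideas can be sketched as follows. The adverb tags RB, RBR and RBS are the Penn Treebank tags for regular, comparative and superlative adverbs; the adjective group, the group labels and the function names are assumptions added for illustration only.</p>
        <preformat>
```python
# Mapping from Penn Treebank tags to generalized groups.
# The adverb group follows the text; the adjective group is an assumed analogue.
GENERAL_POS = {
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
}


def generalize_pos(tag: str) -> str:
    """Return the generalized group of a tag, or the tag itself if unmapped."""
    return GENERAL_POS.get(tag, tag)


def final_score(sentence_score: float, doc_relevance=None, doc_rank=None) -> float:
    """Multiply a sentence score by document relevance and/or inverted rank."""
    score = sentence_score
    if doc_relevance is not None:
        score *= doc_relevance
    if doc_rank is not None:
        score *= 1.0 / doc_rank  # ranks are assumed to start at 1
    return score


print(generalize_pos("RBS"), generalize_pos("NN"))
print(final_score(0.5, doc_relevance=0.8), final_score(0.5, doc_rank=2))
```
        </preformat>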
      </sec>
      <sec id="sec-2-3">
        <title>Improving Readability</title>
        <p>
          We introduced a sentence quality measure based on the product of the Flesch reading
ease test [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], lexical diversity, meaningful word ratio and punctuation score.
        </p>
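        <p>The Flesch Reading Ease test is a readability test designed to indicate comprehension
difficulty when reading a passage (higher scores correspond to texts that are easier to
read):</p>
        <disp-formula>
          <tex-math>\mathrm{FRE} = 206.835 - 1.015 \cdot \frac{\text{total words}}{\text{total sentences}} - 84.6 \cdot \frac{\text{total syllables}}{\text{total words}}</tex-math>
        </disp-formula>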
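        <p>The standard Flesch Reading Ease score subtracts 1.015 times the average sentence length in words and 84.6 times the average number of syllables per word from 206.835. A minimal sketch of its computation is given below; the vowel-group syllable counter and the regular-expression tokenization are rough assumptions for English, not the components used in the system.</p>
        <preformat>
```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate: count groups of consecutive vowels, at least one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher values indicate text that is easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / len(sentences) - 84.6 * syllables / len(words)


print(flesch_reading_ease("The cat sat on the mat."))
print(flesch_reading_ease("Institutional contextualization necessitates considerable interpretation."))
```
        </preformat>
        <p>With this heuristic the first example, made of short one-syllable words, scores higher than the second one.</p>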
        <p>We defined lexical diversity as the number of different lemmas used within a
sentence divided by the total number of tokens in this sentence.</p>
        <p>Analogously, the meaningful word ratio is the number of non-stop words within a
sentence divided by the total number of tokens in this sentence.</p>
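        <p>Both ratios follow directly from their definitions, as the sketch below shows. It assumes that lemmas and tokens are already available from the parsing step; the small stop-word list and the function names are purely illustrative.</p>
        <preformat>
```python
# Illustrative stop-word list; a real system would use a full list.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "to"}


def lexical_diversity(lemmas: list[str]) -> float:
    """Number of distinct lemmas divided by the number of tokens."""
    return len(set(lemmas)) / len(lemmas) if lemmas else 0.0


def meaningful_word_ratio(tokens: list[str], stop_words=STOP_WORDS) -> float:
    """Number of non-stop words divided by the number of tokens."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t.lower() not in stop_words) / len(tokens)


print(lexical_diversity(["the", "cat", "chase", "the", "mouse"]))
print(meaningful_word_ratio(["The", "cat", "chased", "the", "mouse"]))
```
        </preformat>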
        <p>The punctuation score is estimated by formula (3).</p>
        <p>To handle redundancy, each sentence was mapped to a set of nouns. These sets
were compared pairwise, and if the normalized intersection was greater than a
predefined threshold, the sentences were rejected.</p>
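        <p>The redundancy filter can be sketched as below. The sketch makes several assumptions that are not stated in the text: the intersection is normalized by the size of the smaller noun set, sentences are processed in decreasing score order so that the lower-ranked sentence of a redundant pair is rejected, and the threshold value of 0.5 is hypothetical.</p>
        <preformat>
```python
def normalized_overlap(a: set, b: set) -> float:
    """Intersection size divided by the size of the smaller set (assumed normalization)."""
    smaller = min(len(a), len(b))
    return len(a.intersection(b)) / smaller if smaller else 0.0


def filter_redundant(ranked: list[tuple[str, set]], threshold: float = 0.5) -> list[str]:
    """Keep sentences whose noun set does not overlap too much with a kept one.

    ranked holds (sentence, noun_set) pairs in decreasing score order.
    """
    kept: list[tuple[str, set]] = []
    for sentence, nouns in ranked:
        redundant = any(normalized_overlap(nouns, kept_nouns) > threshold
                        for _, kept_nouns in kept)
        if not redundant:
            kept.append((sentence, nouns))
    return [sentence for sentence, _ in kept]


ranked = [
    ("s1", {"obama", "paris", "visit"}),
    ("s2", {"obama", "paris"}),
    ("s3", {"election", "result"}),
]
print(filter_redundant(ranked))
```
        </preformat>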
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>
        Summaries in English were evaluated according to their informativeness and
readability [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Informativeness was estimated as the overlap of a summary with three pools of
relevant passages:
        <list list-type="order">
          <list-item><p>Prior set (PRIOR) of relevant pages selected by the organizers. PRIOR included 40 tweets, i.e. 380 passages or 11 523 tokens.</p></list-item>
          <list-item><p>Pool selection (POOL) of the most relevant passages from participant submissions for 45 selected tweets. POOL contained 1 760 passages, i.e. 58 035 tokens.</p></list-item>
          <list-item><p>All relevant texts (ALL) merged together with extra passages from a random pool of 10 tweets. ALL is based on 70 tweets having 2 378 relevant passages of 77 043 tokens.</p></list-item>
        </list>
      </p>
      <p>As in previous years, the lexical overlap between a summary and a pool was
estimated in three ways: unigrams, bigrams and skip bigrams, representing the
proportion of shared unigrams, bigrams and bigrams with gaps of two tokens, respectively.
The official ranking was based on the decreasing score of divergence from ALL estimated by
skip bigrams.</p>
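        <p>To make the three units concrete, the sketch below extracts unigrams, bigrams and skip bigrams and computes a plain proportion of shared units. It is an illustration only: the official measure is a divergence score, the reading of "gaps of two tokens" as "at most two tokens between the pair" is an assumption, and the function names are hypothetical.</p>
        <preformat>
```python
def ngrams(tokens: list[str], n: int) -> set:
    """Set of contiguous n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def skip_bigrams(tokens: list[str], max_gap: int = 2) -> set:
    """Ordered token pairs separated by at most max_gap intervening tokens."""
    pairs = set()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + 2 + max_gap]:
            pairs.add((left, right))
    return pairs


def shared_proportion(reference: set, summary: set) -> float:
    """Share of reference units that also occur in the summary."""
    return len(reference.intersection(summary)) / len(reference) if reference else 0.0


reference = "the track provides context to a tweet".split()
summary = "the track gives context to tweet readers".split()
print(shared_proportion(ngrams(reference, 1), ngrams(summary, 1)))
print(shared_proportion(ngrams(reference, 2), ngrams(summary, 2)))
print(shared_proportion(skip_bigrams(reference), skip_bigrams(summary)))
```
        </preformat>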
      <p>For the English subtrack we submitted 3 runs differing in sentence quality score
and smoothing.</p>
      <p>Our best run, 275, was ranked first, second and third among the 24 runs submitted by all
participants on PRIOR, POOL and ALL respectively (see Table 1; IRIT’s runs are
set off in bold). This means that our best run is composed of sentences from the most
relevant documents. Among automatic runs our method was ranked first (PRIOR
and POOL) and second (ALL), since run 256 is marked as manual. The ranking is
also sensitive not only to pool selection, but also to the choice of divergence.
According to bigrams and skip bigrams our best run is 275, while according to
unigrams the best run is 273. We can also see that runs 273 and 274 are quite close.
In run 273 each sentence is smoothed by its local context and by the first sentences of
the Wikipedia article it is taken from. Run 274 has the same parameters except
that it does not use any smoothing. So, we can conclude that smoothing improves
informativeness. In our best run, 275, the punctuation score is not taken into account, the
formula for NE comparison is slightly different and numbers are not penalized.</p>
      <p>Readability was estimated as the mean average score per summary over soundness
(no unresolved anaphora), non-redundancy and syntactical correctness among
relevant passages of the ten tweets having the largest text references. According to all
metrics except redundancy, our approach was the best among all participants (see
Table 2; IRIT’s runs are set off in bold). Runs were officially ranked according to
mean average scores. The readability evaluation also showed that run 275 is the best
by relevance, soundness and syntax. However, run 274 is much better in terms of
avoiding redundant information. Runs 273 and 274 are also close according to the readability
assessment.
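      </p>
      <p>[Table residue: column headers "Rank" and "Run"; run identifiers 256, 258, 275, 273.]</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>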
This year we further developed our approach, first introduced at INEX 2011, which
is based on hashtag processing, a TF-IDF cosine similarity measure enriched by
smoothing from the local context and the document beginning, named entity recognition and
part-of-speech weighting. We enriched our method with a sentence quality measure based
on the Flesch reading ease test, lexical diversity, meaningful word ratio and punctuation
ratio. We also used generalized POS (e.g. we merge regular, superlative and
comparative adverbs into a single adverb group). The sentence score depends on document
relevance and sentence type.</p>
      <p>We submitted 3 runs in English differing by sentence quality score and smoothing
and 1 run in Spanish.</p>
      <p>Our approach was ranked first, second and third among the 24 runs submitted by all
participants on PRIOR, POOL and ALL respectively. Among automatic runs our
method was ranked first (PRIOR and POOL) and second (ALL).</p>
      <p>Readability was estimated as mean average scores per summary over resolved
anaphora, non-redundancy and syntactical correctness among relevant passages of the
ten tweets having the largest text references. According to all metrics except
redundancy our approach was the best.</p>
      <p>In the future, we plan to automate parameter selection using machine learning methods.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Boyd</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golder</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lotan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter</article-title>
          .
          <source>Proceedings of the 2010 43rd Hawaii International Conference on System Sciences</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . IEEE Computer Society (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Celebrating #Twitter7 | Twitter Blog, https://blog.twitter.com/2013/celebratingtwitter7.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. INEX 2013 Tweet Contextualization Track, https://inex.mmci.unisaarland.de/tracks/qa/.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ermakova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>IRIT at INEX: Question Answering Task</article-title>
          .
          <source>Focused Retrieval of Content and Structure</source>
          . pp.
          <fpage>219</fpage>
          -
          <lpage>226</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ermakova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>IRIT at INEX 2012: Tweet Contextualization</article-title>
          , http://www.clef-initiative.eu/documents/71612/3e9ecc64-fae6-4af3-93fd1a6a6fabb5d6, (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Twitter Help Center |
          <article-title>What Are Hashtags ("#" Symbols)?</article-title>
          , https://support.twitter.com/articles/49309-what-are-hashtags-symbols.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Twitter Help Center |
          <article-title>What are @Replies and Mentions?</article-title>
          , https://support.twitter.com/groups/31-twitter-basics/topics/109-tweetsmessages/articles/14023-what-are-replies-and-mentions.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ounis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amati</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plachouras</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lioma</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Terrier: A High Performance and Scalable Information Retrieval Platform</article-title>
          .
          <source>Proceedings of ACM SIGIR'06 Workshop on Open Source Information Retrieval (OSIR 2006)</source>
          , Seattle, Washington, USA (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Porter</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          :
          <article-title>An algorithm for suffix stripping</article-title>
          .
          <source>Readings in Information Retrieval</source>
          . Morgan Kaufmann Publishers Inc., San Francisco (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Snowball, http://snowball.tartarus.org/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Feature-rich part-of-speech tagging with a cyclic dependency network</article-title>
          .
          <source>Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1</source>
          . pp.
          <fpage>173</fpage>
          -
          <lpage>180</lpage>
          . Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Finkel</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grenager</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Incorporating non-local information into information extraction systems by Gibbs sampling</article-title>
          .
          <source>Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics</source>
          . pp.
          <fpage>363</fpage>
          -
          <lpage>370</lpage>
          . Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Marcus</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santorini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcinkiewicz</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Building a large annotated corpus of English: the Penn Treebank</article-title>
          , (
          <year>1993</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Probabilistic Part-of-Speech Tagging Using Decision Trees</article-title>
          .
          <source>Proceedings of the International Conference on New Methods in Language Processing</source>
          , Manchester, UK (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. Apache OpenNLP - Welcome to Apache OpenNLP, http://opennlp.apache.org/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Flesch</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A new readability yardstick</article-title>
          .
          <source>Journal of Applied Psychology</source>
          .
          <volume>32</volume>
          ,
          <fpage>221</fpage>
          -
          <lpage>233</lpage>
          (
          <year>1948</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>