<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>USI Participation at SMERP 2017 Text Retrieval Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anastasia Giachanou</string-name>
          <email>anastasia.giachanou@usi.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ida Mele</string-name>
          <email>ida.mele@usi.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Crestani</string-name>
          <email>fabio.crestani@usi.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Faculty of Informatics, Università della Svizzera italiana (USI)</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
<p>This report describes the participation of the Università della Svizzera italiana (USI) at the SMERP Workshop Data Challenge Track for the text retrieval task at both Level 1 and Level 2. For this task, we propose a methodology based on query expansion and boolean expressions. For Level 1, we submitted two different methods based on query expansion, where queries were expanded using terms mined from an earthquake-related collection of tweets. In this way, we managed to extract useful expansion terms for each query. In addition to the query expansion, we tried to improve the quality of the retrieved results by incorporating Part-of-Speech tags. For Level 2, we additionally used information from the partial ground truth that was provided by the organizers in relation to our submitted runs on Level 1. The results showed that our query expansion method had the highest performance in terms of MAP and precision on both levels. In addition, we managed to achieve the second best performance on Level 1 among the submitted semi-supervised approaches in terms of the bpref metric.</p>
      </abstract>
      <kwd-group>
        <kwd>Twitter</kwd>
        <kwd>emergency situations</kwd>
        <kwd>text retrieval</kwd>
        <kwd>query expansion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
The advent of social media has changed the way in which people communicate
and exchange information during emergency situations. A large amount of
user-generated data is posted online during emergencies (e.g., earthquakes, hurricanes)
with the aim to share information or assist relief operations [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. For example,
in case of an earthquake people post information about resource-distribution
centers (i.e., where people can find shelters or pick up food), emergency call
numbers, and money-donation campaigns. However, the amount of posted data
is very large, and therefore effective methodologies are needed to help people
extract content relevant to their information needs.
      </p>
      <p>
One of the most well-known microblogging platforms used to share information on
emergencies is Twitter (http://twitter.com/). A large number of researchers have used Twitter to address
different problems that range from microblog retrieval [
        <xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>
        ] and tweet
recommendation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to sentiment analysis [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] and from irony detection [
        <xref ref-type="bibr" rid="ref12 ref4">4, 12</xref>
        ] to sentiment
      </p>
    </sec>
    <sec id="sec-2">
      <p>
        dynamics [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ]. Extracting useful and relevant information from Twitter is very
challenging since tweets are very short and contain a lot of abbreviations and
slang language. Expanding the query with more relevant terms is an effective
way to address the vocabulary mismatch problem that is mainly caused by their
short length [
        <xref ref-type="bibr" rid="ref11 ref8">8, 11</xref>
        ].
      </p>
<p>In this report, we describe our participation in the text retrieval task at the
Exploitation of Social Media for Emergency Relief and Preparedness (SMERP)
Data Challenge Track. The evaluation campaign proposed two different tasks,
text retrieval and text summarization. In this report we present our
methodologies on the text retrieval task for both Level 1 (tweets posted the first day of
the earthquake) and Level 2 (tweets posted the second day of the earthquake). To
address the text retrieval task, we propose to expand the initial query with relevant
terms and form boolean expressions for each of the provided queries.</p>
<p>For Level 1, we submitted two different methods based on query expansion,
where queries were expanded using terms mined from an earthquake-related
collection of tweets. In this way, we managed to extract useful expansion terms
for each query. The terms were then manually selected in order to create a subset
of terms to use in the query expansion. In addition to the query expansion,
we tried to improve the quality of the retrieved results by incorporating
Part-of-Speech (POS) tags. For Level 2, the organizers provided us with information
about which tweets submitted in our runs for Level 1 were actually relevant.
In other words, we were provided with a ground truth for the tweets retrieved
with our submitted runs. To this end, for Level 2 we expanded each query using
information about the relevant tweets from Level 1. We also tried to further
improve the performance using a classifier and information from POS tags.</p>
<p>The results showed that plain query expansion is more effective than
incorporating information from POS tags. We also noticed that the query expansion
method managed to obtain the highest performance in terms of MAP and
precision on both levels. These measures are two of the most well known performance
measures for evaluation of information-retrieval methods. In addition, we
managed to achieve the second best performance for the text retrieval task (Level 1)
among the submitted semi-supervised approaches in terms of the bpref metric, the
official performance measure used by the organizers for the final ranking of the
participants.</p>
      <p>This report is organized as follows. In Section 2 we present in detail our
methodology for the task of text retrieval. In Section 3 we present and discuss
the results of our experiments, whereas Section 4 concludes our participation in
the text retrieval task.</p>
      <sec id="sec-2-1">
        <title>Methodology</title>
<p>In this section we first briefly present the text retrieval task and the provided
queries/topics. Then, we present our methodology for the text retrieval task for
both Level 1 and Level 2.</p>
        <sec id="sec-2-1-1">
          <title>The Text Retrieval Task</title>
          <p>For the text retrieval task the organizers released a large collection of tweets
that were posted on Twitter during the earthquake in Italy in August 2016. The
text retrieval task was divided into two different phases/levels. For Level 1 the
organizers released a collection of 52,469 tweets that were posted on the first
day of the earthquake. For Level 2 the organizers released 19,751 tweets posted
on the second and third day. In addition, the organizers provided information on
which tweets among the ones we submitted in our runs for Level 1 were actually
relevant. This information could be used for the submissions of Level 2.</p>
<p>Besides the data, the organizers gave us four different topics representing different
information needs. The aim was to retrieve the relevant tweets for each provided
topic. A brief description of the topics is the following:
1. SMERP-T1: Identify the messages which describe the availability of some
resources.
2. SMERP-T2: Identify the messages which describe the requirement or need
of some resources.
3. SMERP-T3: Identify the messages which contain information related to
infrastructure damage, restoration, and casualties.
4. SMERP-T4: Identify the messages which describe on-ground rescue activities
of different NGOs and Government organizations.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Text Retrieval on the First Day (Level 1)</title>
<p>The task of text retrieval consists of retrieving the relevant tweets for four
different queries (topics). For Level 1, we expanded each query with terms that
were selected from another collection containing the tweets posted during the Nepal
earthquake that occurred on the 25th of April 2015. To be more specific, the
collection contained 90,000 tweets posted from the 1st to the 5th of May 2015.</p>
<p>One issue when using a collection related to a different but similar event (both
events are earthquakes, but one occurred in Nepal and the other in Italy)
is that there could be terms specific to the country (e.g., names of locations and
people). Hence, we aimed at creating a general collection about earthquakes
by using the tweets posted during the earthquake in Nepal and removing all
terms related to Nepal. To do so, we first removed URLs and some specific
characters (e.g., @, #), then we extracted the entities from the Nepal collection
(e.g., Kathmandu, Mahadevstan, Rahul Gandhi) and filtered them out. The last
step consisted in removing the retweets. At the end of this cleaning process, we
had 22,017 tweets, 198,280 tokens, and 12,379 unique tokens.</p>
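<p>The cleaning steps above can be sketched as follows (a minimal Python sketch; the entity list is a toy stand-in for the entities actually extracted from the Nepal collection, and retweets are identified here by the conventional "RT " prefix):</p>

```python
import re

# Sketch of the cleaning pipeline described above. The entity list is a toy
# stand-in for the entities actually extracted from the Nepal collection.
NEPAL_ENTITIES = {"kathmandu", "mahadevstan", "rahul gandhi"}

def clean_tweet(text):
    text = re.sub(r"https?://\S+", "", text)   # step 1: remove URLs
    text = re.sub(r"[@#]", "", text)           # step 1: strip specific characters
    text = text.lower()
    for entity in NEPAL_ENTITIES:              # step 2: filter country-specific entities
        text = text.replace(entity, "")
    return re.sub(r"\s+", " ", text).strip()

def remove_retweets(tweets):
    # step 3: drop retweets, identified here by the conventional "RT " prefix
    return [t for t in tweets if not t.startswith("RT ")]

tweets = remove_retweets([
    "RT @user: earthquake hits Kathmandu",
    "Shelter available near #Kathmandu https://t.co/x people donated food",
])
cleaned = [clean_tweet(t) for t in tweets]
```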
<p>After cleaning the Nepal collection, we obtained a collection of general
tweets about earthquakes that could be used to learn the representative
terminology used when an earthquake occurs. We will refer to this collection as Ce.
Since we did not have any training data for Level 1, we decided to follow a
semi-automatic method where useful terms for expanding the queries were extracted
as follows:
1. For each query, we retrieved tweets from Ce. These tweets were retrieved
using the terms that appear in the query's description. For the first two
queries we also included some terms related to means of transportation such
as helicopter, airplane, train, car, truck, bus, and plane.
2. Given the tweets retrieved for each query, we calculated the TF-IDF of
their single terms, bigrams, and trigrams.
3. We manually selected some verb phrases and noun phrases for each query
which were either synonyms or additional terms that complemented the
description of the query.</p>
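<p>Step 2 above can be sketched with a plain TF-IDF computation over the n-grams of the retrieved tweets (a simplified sketch with illustrative toy tweets; the actual scoring was performed over the tweets retrieved from Ce):</p>

```python
import math
from collections import Counter

# Toy sketch of step 2: score unigrams, bigrams, and trigrams of the
# retrieved tweets by TF-IDF. The tweets below are illustrative only.
def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_candidates(docs, max_n=3):
    grams_per_doc = []
    for doc in docs:
        toks = doc.lower().split()
        grams = [g for n in range(1, max_n + 1) for g in ngrams(toks, n)]
        grams_per_doc.append(grams)
    tf = Counter(g for grams in grams_per_doc for g in grams)       # raw term frequency
    df = Counter(g for grams in grams_per_doc for g in set(grams))  # document frequency
    n_docs = len(docs)
    return {g: tf[g] * math.log(n_docs / df[g]) for g in tf}

docs = ["food and water available", "water needed urgently", "food trucks arriving"]
scores = tfidf_candidates(docs)
top = sorted(scores, key=scores.get, reverse=True)
```

<p>The top-scored unigrams and phrases would then be inspected manually, as in step 3.</p>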
          <p>At this stage, we had a list of expansion terms and phrases for each query.
We submitted two runs, and for both of them our methodology was based on
the combination of query expansion (QE) and boolean conjunctions of two
different phrases (Ph1 AND Ph2). Regarding the first two queries (SMERP-T1
&amp; SMERP-T2), Ph1 consisted of two lists of candidate phrases that described
the availability (for SMERP-T1) or the requirement (for SMERP-T2) of the
resources, whereas Ph2 was the same for both queries and referred to the different
resources available/requested. For SMERP-T3, Ph1 included phrases that
described damage or restoration, whereas Ph2 referred to keywords that described
the infrastructure. Finally, for SMERP-T4 we combined keywords that showed
rescue and relief activities (Ph1) with phrases that referred to Non-Governmental
Organizations (NGOs) (Ph2). To learn the NGOs we used a method based on
Kullback-Leibler divergence that is described in Section 2.4.</p>
<p>For our first run (USI 1 1) we used boolean queries and we did not consider
the POS of the different phrases. This method was expected to retrieve a lot of
the relevant tweets but with low precision.</p>
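<p>A minimal sketch of such a boolean conjunction (Ph1 AND Ph2); the phrase lists here are illustrative placeholders, not the actual expansion terms used in the run:</p>

```python
# Minimal sketch of the boolean conjunction Ph1 AND Ph2 used in run USI 1 1.
# The phrase lists are illustrative placeholders, not the actual expansion terms.
PH1 = ["available", "providing", "offering"]       # availability phrases (assumed)
PH2 = ["water", "food", "shelter", "medicine"]     # resource phrases (assumed)

def matches(tweet, ph1, ph2):
    """A tweet is retrieved iff it contains a phrase from EACH of the two lists."""
    text = tweet.lower()
    return any(p in text for p in ph1) and any(p in text for p in ph2)

tweets = [
    "Clean water available at the main square",
    "We are praying for the victims",
    "Food needed in the camps",
]
retrieved = [t for t in tweets if matches(t, PH1, PH2)]
```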
<p>For our second run (USI 1 2) we used the POS tags and forced Ph1 to be
a verb phrase and Ph2 to be a noun phrase. The NLTK toolkit (http://www.nltk.org/) was used for
the POS tagging. However, SMERP-T1 &amp; SMERP-T2 were very similar and
required additional information to differentiate keywords that might be relevant
for both of them. For example, consider the following two tweets: "People
donated quite a bit of money to help the victims" and "Consider to help by donating
money". They have a significant overlap of keywords; however, the first tweet is
more relevant to SMERP-T1 and the second one to SMERP-T2. Therefore, for
the first two queries in USI 1 2 we additionally differentiated the queries based
on specific POS tags. We considered that only the following POS tags were
useful to show announcement or availability of resources or of donations: the past
tense (VBD), present participle (be + VBG), future tense (will + VB), present
tense (PRP + VB), or past participle (VBN). The verbs that appeared in any of
these forms were useful for SMERP-T1. For SMERP-T2 we considered that the
verbs raise, donate, and give had to be in the base form (VB), whereas for the rest
of the verbs we did not make any differentiation (they can be in any verb form).
Finally, as explained earlier, for SMERP-T3 &amp; SMERP-T4, we considered that
the keywords of Ph1 are only verbs.</p>
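<p>The verb-form constraint can be illustrated as follows (the tagged tweets are hand-labeled stand-ins for NLTK pos_tag output; the verb list and tag set are illustrative assumptions, not the exact configuration of the run):</p>

```python
# Sketch of the verb-form constraint of run USI 1 2. The tagged tweets are
# hand-labeled stand-ins for NLTK pos_tag output; verb list and tag set are
# illustrative assumptions.
AVAILABILITY_TAGS = {"VBD", "VBG", "VBN"}  # past tense, present participle, past participle

def signals_availability(tagged_tweet, verbs):
    """True if a target verb occurs in a form that announces availability."""
    return any(word.lower() in verbs and tag in AVAILABILITY_TAGS
               for word, tag in tagged_tweet)

verbs = {"donate", "donated", "donating"}
announced = [("People", "NNS"), ("donated", "VBD"), ("money", "NN")]               # past tense
requested = [("Consider", "VB"), ("to", "TO"), ("donate", "VB"), ("money", "NN")]  # base form
```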
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
<title>Text Retrieval on the Second Day (Level 2)</title>
<p>In Table 1 we report the summary of the two submitted runs for the task of
text retrieval (Level 1).</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption><p>Summary of the submitted runs for Level 1.</p></caption>
        <table>
          <thead><tr><th>Run id</th><th>Task</th><th>Description of the run</th></tr></thead>
          <tbody>
            <tr><td>USI 1 1</td><td>Retrieval</td><td>QE</td></tr>
            <tr><td>USI 1 2</td><td>Retrieval</td><td>QE + POS-on-query-terms</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>For Level 2 we applied a methodology similar to the one adopted for Level 1, with the
difference that instead of using the external collection (the one about the Nepal
earthquake filtered by Nepal's entities), we expanded the queries with terms
extracted from the relevant tweets of the first day of the SMERP collection.
Such tweets were annotated as relevant by the SMERP organizers and released
after Level 1 was completed.</p>
<p>We expected that this would improve the results of our runs because the
Nepal collection, despite our filtering based on entities specific to Nepal, may
contain country-specific terms that can be noisy. Similar to the methodology
adopted for Level 1, we decided to manually select the expansion terms. Hence,
our methods are characterized as semi-automatic.</p>
<p>We submitted three different runs. Similar to Level 1, the first run (USI 2 1)
was based on the combination of query expansion and boolean conjunctions
of two different phrases (Ph1 AND Ph2). Regarding SMERP-T1 &amp;
SMERP-T2, the first phrase (Ph1) consisted of two lists of candidate phrases related
to the availability (for SMERP-T1) or the requirement (for SMERP-T2) of the
resources, while the second phrase (Ph2) was the same for both queries and refers
to the different resources available/requested. For SMERP-T3, Ph1 included
phrases that describe damage or restoration, whereas Ph2 referred to keywords
that describe infrastructure. Concerning SMERP-T4, we used keywords related
to rescue and relief activities (Ph1) together with phrases that refer to NGOs
(Ph2).</p>
<p>In the first run (USI 2 1) we did not consider POS tags. For example, we
did not differentiate between the terms donation and donate. This approach is
similar to methodologies based on term stemming. We expected that this method
would retrieve a lot of relevant tweets, but its precision would be low.</p>
<p>As already mentioned, one of the main challenges for the text retrieval task
was that SMERP-T1 &amp; SMERP-T2 were very similar, and additional information
was required to differentiate keywords that might be relevant for both of them.
We submitted two additional runs in an attempt to address this problem. For
the second run (USI 2 2) we built a binary classifier for each of the four topics,
trained to differentiate between relevant and non-relevant tweets. We
used a Naive Bayes classifier that was trained on unigrams, bigrams, and POS
tags. Also, we used the same amount of training data for the two classes in the
training phase.</p>
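<p>A per-topic relevant/non-relevant classifier of this kind can be sketched with a minimal multinomial Naive Bayes over unigrams (the real runs also used bigram and POS-tag features; the training tweets below are illustrative):</p>

```python
import math
from collections import Counter

# Minimal multinomial Naive Bayes over unigrams, sketching the per-topic
# relevant/non-relevant classifier of run USI 2 2. The real runs also used
# bigram and POS-tag features; the training tweets here are illustrative.
class NaiveBayes:
    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.counts = {c: Counter() for c in self.priors}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc.lower().split())
        self.vocab = {w for cnt in self.counts.values() for w in cnt}
        return self

    def predict(self, doc):
        best, best_lp = None, float("-inf")
        n_docs = sum(self.priors.values())
        for c in self.priors:
            total = sum(self.counts[c].values())
            lp = math.log(self.priors[c] / n_docs)
            for w in doc.lower().split():
                # Laplace smoothing avoids zero probabilities for unseen words
                lp += math.log((self.counts[c][w] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

nb = NaiveBayes().fit(
    ["water available here", "offering food and shelter",
     "we need water urgently", "please send food"],
    ["relevant", "relevant", "nonrelevant", "nonrelevant"],
)
```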
<p>For the third run (USI 2 3), we leveraged POS tags at query time. For
SMERP-T1, we assumed that only the following POS tags were useful to show
announcement or availability of a resource or a donation: (1) the present tense
for the verbs provide, send, offer, (2) the present participle for the verbs send,
offer, gather, collect, raise, and (3) the past participle for the verbs donate, raise,
collect. For SMERP-T2, we considered that the verbs raise and donate had to be
in the base form, the verbs appeal and ask in the present participle, and the verbs
require and need in the past participle form. Finally, for the topic SMERP-T4 a list of
relevant NGOs was required. For our runs on Level 1, we had created an initial
list of NGOs using the Nepal collection. For Level 2, we used this initial list but
kept only the NGOs that also appeared in the relevant tweets for SMERP-T4
(annotated as relevant by the SMERP organizers).</p>
      <p>For the text retrieval task of Level 2, we submitted three runs. Table 2 shows
the summary of the submitted runs.</p>
<table-wrap id="tbl2">
        <label>Table 2</label>
        <caption><p>Summary of the submitted runs for Level 2.</p></caption>
        <table>
          <thead><tr><th>Run id</th><th>Task</th><th>Description of the run</th></tr></thead>
          <tbody>
            <tr><td>USI 2 1</td><td>Retrieval</td><td>QE</td></tr>
            <tr><td>USI 2 2</td><td>Retrieval</td><td>QE + classifier</td></tr>
            <tr><td>USI 2 3</td><td>Retrieval</td><td>QE + POS-on-query-terms</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        As already mentioned, regarding SMERP-T4, we additionally learned the
Non-Governmental Organizations (NGOs). To this end, we considered an initial query
that should be able to retrieve the tweets mentioning different NGOs. Such a query
is a single-term query containing the term {donate}. Then, we made the
assumption that users refer to the NGOs using their usernames (e.g., @crocerossa), so
we built a language model for the query and for the collection using the
usernames (@username) as tokens. We calculated the Kullback-Leibler divergence (KLD) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
between the query language model Q and the collection language model C. We
expect that the usernames with high divergence are good indicators of an NGO.
Formally, let w be a word that refers to a username that appears in the
collection, and Q be the model of the query q (e.g., the query {donate}), then we can
estimate the KLD of the username w as:
      </p>
<p>KLD(w) = P(w|Q) log ( P(w|Q) / P(w|C) )</p>
      <p>where P(w|C) is the probability of the username w in the collection and is
estimated as P(w|C) = tf(w, C) / Σ_{D ∈ C} |D|, while P(w|Q) is the probability of a word
in the query model Q and is estimated as P(w|Q) = tf(w, Q) / Σ_{D ∈ Q} |D|,
where D ∈ Q are the documents relevant to the query q.</p>
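<p>A toy sketch of this KLD scoring (for brevity the language models here are estimated over username tokens only, a simplification of the document-length normalization above, with a small additive smoothing constant; the tweets are illustrative):</p>

```python
import math
from collections import Counter

# Toy sketch of the KLD-based NGO discovery. For brevity the language models
# are estimated over username tokens only (a simplification of the
# document-length normalization above), with additive smoothing.
def username_tokens(docs):
    return [w.strip(".,") for d in docs for w in d.lower().split() if w.startswith("@")]

def kld_scores(relevant_docs, collection_docs, eps=1e-6):
    q_tf = Counter(username_tokens(relevant_docs))
    c_tf = Counter(username_tokens(collection_docs))
    q_len, c_len = sum(q_tf.values()), sum(c_tf.values())
    scores = {}
    for w, tf in q_tf.items():
        p_q = tf / q_len                                   # P(w|Q)
        p_c = (c_tf[w] + eps) / (c_len + eps * len(q_tf))  # smoothed P(w|C)
        scores[w] = p_q * math.log(p_q / p_c)              # per-username KLD term
    return scores

collection = ["please help @crocerossa", "@user1 posted a photo", "thanks @user1"]
relevant = ["donate to @crocerossa", "donate via @crocerossa", "@user1 can donate"]
scores = kld_scores(relevant, collection)
```

<p>Usernames whose probability under the query model exceeds their probability under the collection model get a positive score, marking them as NGO candidates.</p>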
<p>We used smoothing to address the problem of zero frequencies. We obtained
a list of candidates where the higher the KLD value, the more likely
the candidate is an NGO's name. From this list, after we normalized the KLD
values, we kept the candidates with a value over 0.1. With this approach, we
could learn some NGOs (e.g., crocerossa, globalgiving). The final query is a
boolean query of the form Ph1 AND Ph2, where Ph1 shows a rescue activity
and Ph2 can be any of the extracted NGOs.</p>
      <sec id="sec-3-1">
        <title>Results and Discussion</title>
<p>The SMERP organizers used the bpref metric as the official evaluation metric for
ranking the methodologies proposed in the text retrieval task. The bpref
measure is used when there are partial relevance judgments (i.e., only a subset of
the documents is annotated). The measure is called bpref because the preference relations
are binary: it is computed from the preference relation of whether judged
relevant documents are retrieved ahead of judged non-relevant documents, so each
relevant document is penalized for the judged non-relevant documents ranked above it. In terms
of bpref, USI 1 1 was ranked as the second best run among the semi-supervised
approaches. In general, we observe that USI 1 1 was better than USI 1 2,
showing that POS tags were not very effective. However, at the time this report is
written, we do not have access to per-topic performance and we cannot do
further analysis.</p>
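<p>For a single topic, the bpref computation can be sketched as follows (a simplified sketch of the standard TREC definition; unjudged documents are simply skipped):</p>

```python
# Simplified single-topic bpref following the standard TREC definition:
# each retrieved relevant document is penalized by the fraction of judged
# non-relevant documents ranked above it; unjudged documents are skipped.
def bpref(ranking, relevant, nonrelevant):
    R = len(relevant)
    score, nonrel_above = 0.0, 0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_above += 1
        elif doc in relevant:
            score += 1 - min(nonrel_above, R) / min(R, len(nonrelevant))
    return score / R

# d2 and d4 are judged non-relevant; interleaving them with the relevant
# documents lowers the score
score = bpref(["d1", "d2", "d3", "d4"], relevant={"d1", "d3"}, nonrelevant={"d2", "d4"})
```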
<p>Table 4 shows the performance of the submitted runs for the text retrieval
task for Level 2, ranked according to MAP. Similar to the performance results
of Level 1 (Table 3), we had the highest scores in terms of MAP, precision, and recall,
whereas in terms of bpref we obtained lower performance.</p>
        <table-wrap id="tbl3">
          <label>Table 3</label>
          <caption><p>Performance of the submitted runs for the text retrieval task (Level 1).</p></caption>
          <table>
            <thead><tr><th>Run id</th><th>Description of the run</th><th>MAP</th><th>bpref</th><th>Precision@20</th><th>Recall@1000</th></tr></thead>
            <tbody>
              <tr><td>USI 1 1</td><td>QE</td><td>0.0789 (1st)</td><td>0.1899</td><td>0.5000</td><td>0.1825</td></tr>
              <tr><td>USI 1 2</td><td>QE + POS-on-query-terms</td><td>0.0553 (2nd)</td><td>0.1063</td><td>0.6250</td><td>0.1063</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>From the results we can observe that combining query expansion with boolean
expressions achieves the best scores among our submitted runs. In other words, the classifier
and the use of POS tags did not manage to improve the performance. For the
classification we had limited training data, and we believe that this could be one
of the reasons for the poor performance. However, further analysis is required to
better understand why the classifier or the information from POS
tags did not improve on the query-expansion method.</p>
        <p>In this report we presented the participation of the Università della Svizzera
italiana (USI) at the SMERP Workshop Data Challenge Track for the task of
text retrieval at both levels (Level 1 and Level 2). Our methodology was
based on query expansion and boolean expressions. For Level 1, we submitted
two different runs based on query expansion, where queries were expanded
using terms mined from an earthquake-related collection of tweets. In addition
to the query expansion, we tried to improve the quality of retrieved results by
incorporating POS tags. We also submitted three different runs for Level 2,
likewise based on query expansion and boolean expressions. For Level
2, we used information from the partial ground truth that was provided by the
organizers in relation to our submitted runs on Level 1.</p>
        <p>The results showed that our runs had the highest performance in terms of
MAP and precision, two metrics that are usually applied to evaluate the
performance of information retrieval systems. In addition, we managed to achieve the
second best performance in terms of the bpref measure for the text retrieval task
among the submitted semi-supervised approaches of Level 1.</p>
        <p>Acknowledgments. This research was partially funded by the Swiss National
Science Foundation (SNSF) under the project OpiTrack.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alawad</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anagnostopoulos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leonardi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mele</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silvestri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
<article-title>Network-aware recommendations of novel tweets</article-title>
          .
          <source>In: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval</source>
          ,
          <string-name>
            <surname>SIGIR</surname>
          </string-name>
          <year>2016</year>
          , Pisa, Italy,
          <source>July 17-21</source>
          ,
          <year>2016</year>
          . pp.
          <volume>913</volume>
-
          <issue>916</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bandyopadhyay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Query expansion for microblog retrieval</article-title>
          .
          <source>International Journal of Web Science</source>
          <volume>1</volume>
          (
          <issue>4</issue>
          ),
          <volume>368</volume>
-
          <fpage>380</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bollen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pepe</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena</article-title>
          . vol.
          <volume>11</volume>
          , pp.
          <volume>450</volume>
-
          <issue>453</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Hernández Farías,
          <string-name>
            <given-names>D.I.H.</given-names>
            ,
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
<article-title>Irony detection in twitter: The role of affective content</article-title>
          .
          <source>ACM Trans. Internet Technol</source>
          .
          <volume>16</volume>
          (
          <issue>3</issue>
          ),
          <volume>19</volume>
:1-
          <fpage>19</fpage>
          :
          <fpage>24</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Tracking sentiment by time series analysis</article-title>
          .
          <source>In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '16</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Go</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhayani</surname>
          </string-name>
          , R.:
          <article-title>Twitter sentiment analysis</article-title>
          .
          <source>Entropy</source>
          <volume>17</volume>
          ,
          <issue>252</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kouloumpis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , Wilson,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.D.</surname>
          </string-name>
          :
          <article-title>Twitter sentiment analysis: The good the bad and the omg</article-title>
          ! vol.
          <volume>11</volume>
          , p.
          <volume>164</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tjondronegoro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Microblog retrieval using topical features and query expansion</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Liang</surname>
          </string-name>
          , S., de Rijke, M.:
          <article-title>Burst-aware data fusion for microblog search</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>51</volume>
          (
          <issue>2</issue>
          ),
          <volume>89</volume>
-
          <fpage>113</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          , Schutze, H.:
          <article-title>Foundations of Statistical Natural Language Processing</article-title>
          . MIT Press, Cambridge, MA, USA (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Massoudi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsagkias</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weerkamp</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Incorporating query expansion and quality indicators in searching microblog posts</article-title>
          .
          <source>In: Proceedings of the 33rd European Conference on Advances in Information Retrieval</source>
          . pp.
          <volume>362</volume>
-
          <fpage>367</fpage>
          . ECIR'
          <volume>11</volume>
          , Springer-Verlag, Berlin, Heidelberg (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Reyes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veale</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A multidimensional approach for detecting irony in twitter</article-title>
          .
          <source>Language Resources and Evaluation</source>
          <volume>47</volume>
          (
          <issue>1</issue>
          ),
          <volume>239</volume>
-
          <fpage>268</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lampert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cameron</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Power</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using social media to enhance emergency situation awareness</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>27</volume>
          (
          <issue>6</issue>
          ),
          <volume>52</volume>
-
          <fpage>59</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>