<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Relevancer to Detect Relevant Tweets: The Nepal Earthquake Case</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ali Hürriyetoğlu</string-name>
          <email>a.hurriyetoglu@let.ru.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antal van den Bosch</string-name>
          <email>a.vandenbosch@let.ru.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nelleke Oostdijk</string-name>
          <email>n.oostdijk@let.ru.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Language Studies, Radboud University</institution>
          ,
          <addr-line>P.O. Box 9103, NL-6500 HD, Nijmegen</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>In this working note we describe our submission to the
FIRE 2016 Microblog track Information Extraction from
Microblogs Posted during Disasters [1]. The task in this
track was to extract all relevant tweets pertaining to seven
given topics from a set of tweets. The tweet set was collected
using key terms related to the Nepal Earthquake (footnote 1).</p>
      <p>
        Our submission is based on a semi-automatic approach
in which we used Relevancer, a complete analysis pipeline
designed for analyzing a tweet collection. The main
analysis steps supported by Relevancer are (1) preprocessing the
tweets, (2) clustering them, (3) manually labeling the
coherent clusters, and (4) creating a classifier that can be used for
classifying tweets that are not placed in any coherent
cluster, and for classifying new (i.e. previously unseen) tweets
using the labels defined in step (3).
      </p>
      <p>The data and the system are described in more detail in
Sections 2 and 3, respectively.</p>
    </sec>
    <sec id="sec-2">
      <title>2. DATA</title>
      <p>At the time of download (August 3, 2016), 49,660 tweet
IDs were available out of the 50,068 tweet IDs provided for
this task. The missing tweets had been deleted by the
people who originally posted them. We used only the English
tweets, 48,679 tweets in all, based on the language tag
provided by the Twitter API. Tweets in this data set were
already deduplicated by the task organisation team as much
as possible.</p>
      <p>The final tweet collection contains tweets that were posted
between April 25, 2015 and May 10, 2015. The daily
distribution of the tweets is visualized in Figure 1.</p>
    </sec>
    <sec id="sec-3">
      <title>3. SYSTEM OVERVIEW</title>
      <p>The typical analysis steps of Relevancer were applied
to the data provided for this task. The current focus of
the Relevancer tool is the text and the date of posting of
a tweet. Relevancer aims at discovering and distinguishing
between the different topically coherent information threads
in a tweet collection [3, 2]. Tweets are clustered such that
each cluster represents an information thread, and the
clusters can be used to train a classifier.</p>
      <p>Each step of the analysis process is described in some
detail in the following subsections (footnote 2).
Footnote 1: https://en.wikipedia.org/wiki/April_2015_Nepal_earthquake
Footnote 2: See http://relevancer.science.ru.nl and https://bitbucket.org/hurrial/relevancer for further details.</p>
      <p>[Figure 1: Daily distribution of tweets (x-axis: date; y-axis: tweet count).]</p>
    </sec>
    <sec id="sec-4">
      <title>3.1 Normalisation</title>
      <p>Normalisation starts with converting user names and URLs
that occur in the tweet text to the dummy values `usrusrusr'
and `urlurlurl' respectively.</p>
      <p>After inspection of the data, we decided to normalise a
number of phenomena. First, we removed certain
automatically generated parts at the beginning and at the end of a
tweet text. We determined these manually, e.g. `live
updates:', `I posted 10 photos on Facebook in the album', and
`via usrusrusr'. After that, words that end in `...' were
removed as well. These words are mostly incomplete due
to the length restriction of a tweet text, and usually occur at
the end of tweets generated from within another application.
Also, we eliminated any consecutive duplication of a token.
Duplication of tokens mostly occurs with the dummy forms
for user names and URLs, and with event-related key words and
entities. For instance, two of the three consecutive tokens at
the beginning of the tweet `#nepal: nepal: nepal earthquake:
main language groups (10 may 2015) urlurlurl
#crisismanagement' were removed in this last step of normalisation.
This last step facilitates identifying the
actual content of the tweet text.</p>
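      <p>The normalisation steps described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions (the function name and the exact regular expressions are ours), not the Relevancer implementation:</p>

```python
import re

def normalise(text):
    # Replace user mentions and URLs with the dummy tokens from Section 3.1.
    text = re.sub(r"@\w+", "usrusrusr", text)
    text = re.sub(r"https?://\S+", "urlurlurl", text)
    # Drop words ending in an ellipsis (truncated by the tweet length limit).
    text = re.sub(r"\S+\.\.\.(\s|$)", r"\1", text)
    # Eliminate any consecutive duplication of a token.
    tokens = text.split()
    deduped = [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]
    return " ".join(deduped)
```

      <p>Removing manually determined boilerplate prefixes and suffixes (e.g. `live updates:') would be an additional pass over a hand-compiled list, omitted here for brevity.</p>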
    </sec>
    <sec id="sec-5">
      <title>3.2 Clustering and labeling</title>
      <p>The clustering step aims at finding topically coherent groups
of tweets that we call information threads. These groups are
labeled as relevant, irrelevant, or incoherent. Coherent
clusters were selected from the output of the K-Means clustering
algorithm (footnote 3), with k = 200, i.e. a preset number of 200
clusters. The coherency of a cluster is calculated based on the
distance between the tweets in a particular cluster and the
cluster center. Tweets that are in incoherent clusters (as
determined by the algorithm) were clustered again,
relaxing the coherency restrictions until the algorithm reaches
the requested number of coherent clusters. The second stopping
criterion for the algorithm is the limit of the coherency
parameter relaxation.</p>
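      <p>The iterative coherency-based clustering can be sketched as follows. This is our own simplified reading of the procedure (function name, vectorisation, and the exact coherency measure are assumptions), not Relevancer's code:</p>

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def coherent_clusters(tweets, k, threshold=0.8, relax=0.1, max_rounds=5):
    """Keep only clusters whose tweets are, on average, close to the
    centroid; re-cluster the leftovers with a relaxed threshold."""
    coherent, remaining = [], list(tweets)
    for _ in range(max_rounds):
        if len(remaining) <= k:
            break
        X = TfidfVectorizer().fit_transform(remaining)
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        dists = km.transform(X)  # distance of every tweet to every centroid
        leftovers = []
        for c in range(k):
            members = np.where(km.labels_ == c)[0]
            if len(members) and dists[members, c].mean() < threshold:
                coherent.append([remaining[i] for i in members])
            else:
                leftovers.extend(members)
        remaining = [remaining[i] for i in leftovers]
        threshold += relax  # relax the coherency restriction
    return coherent, remaining
```

      <p>The second stopping criterion from the text (the limit on relaxation) corresponds here to the bounded number of rounds.</p>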
      <p>The coherent clusters were extended with the tweets that
are not in any coherent cluster. This step was performed
by iterating over all coherent clusters in descending order of the
total length of the tweets in a cluster, and adding to a cluster
any tweet with a cosine similarity higher than 0.85 with respect
to that cluster's center. The
total number of tweets that were transferred to the clusters
this way was 847.</p>
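      <p>This extension step can be sketched as below; the helper name and the TF-IDF centroid representation are our assumptions, while the 0.85 cosine threshold and the length-based visiting order come from the text:</p>

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extend_clusters(clusters, unclustered, threshold=0.85):
    """Move leftover tweets into a coherent cluster when their cosine
    similarity to that cluster's centroid exceeds the threshold.
    Clusters are visited in descending order of total tweet length."""
    vec = TfidfVectorizer().fit([t for c in clusters for t in c] + unclustered)
    order = sorted(range(len(clusters)),
                   key=lambda i: -sum(len(t) for t in clusters[i]))
    remaining = list(unclustered)
    for i in order:
        if not remaining:
            break
        centroid = np.asarray(vec.transform(clusters[i]).mean(axis=0))
        sims = cosine_similarity(vec.transform(remaining), centroid).ravel()
        clusters[i].extend(t for t, s in zip(remaining, sims) if s > threshold)
        remaining = [t for t, s in zip(remaining, sims) if s <= threshold]
    return clusters, remaining
```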
      <p>As Relevancer takes the date of posting as relevant
information, the tool first searches for coherent clusters of tweets
within each day separately. Then, in a second step, it clusters
all tweets from all days that were previously not placed in
any coherent cluster. Applying the two steps sequentially
enables Relevancer to detect local and global information
threads, respectively, as coherent clusters.</p>
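      <p>The two-step scheme can be expressed as a thin wrapper around any clustering routine; this sketch is ours, and `find_clusters` is a hypothetical callable standing in for the coherency-based clustering step:</p>

```python
from collections import defaultdict

def two_step_clustering(tweets, find_clusters):
    """First search for coherent clusters within each day ("local" threads),
    then cluster the leftovers of all days together ("global" threads).
    find_clusters is assumed to return (coherent_clusters, leftover_tweets)."""
    by_day = defaultdict(list)
    for text, day in tweets:
        by_day[day].append(text)
    coherent, leftovers = [], []
    for day in sorted(by_day):
        day_clusters, day_rest = find_clusters(by_day[day])
        coherent.extend(day_clusters)   # local information threads
        leftovers.extend(day_rest)
    global_clusters, rest = find_clusters(leftovers)  # global threads
    return coherent + global_clusters, rest
```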
      <p>For each cluster thus identified, a list of tweets is presented
to an expert who then determines which are the relevant and
irrelevant clusters (footnote 4). Clusters that contain both relevant and
irrelevant tweets are labeled as incoherent by the expert (footnote 5).
Relevant clusters are those which an expert considers to be
relevant for the aim she wants to achieve. In the present
context more specifically, clusters that are about a topic
specified as relevant by the task organisation team should
be labeled as relevant. Any other coherent cluster should be
labeled as irrelevant.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3 Creating the classifier</title>
      <p>The classifier was trained with the tweets labeled as
relevant or irrelevant in the previous step. Tweets in the
incoherent clusters were not used. The Naive Bayes method
was used to train the classifier.</p>
      <p>We used a small set of stop words: the
key words (nouns) nepal, earthquake, quake,
kathmandu and their hashtag versions (footnote 6); the determiners the, a, an;
the conjunctions and, or; the prepositions to, of, from, with, in, on,
for, at, by, about, under, above, after, before; and the news-related
words breaking and news and their hashtag versions.
The normalised forms of the user names and URLs, usrusrusr
and urlurlurl, are included in the stop word list as well.</p>
      <p>We optimized the smoothing prior parameter to be 0.31
by cross-validation, comparing the classifier performance for
20 equally spaced values of the parameter between 0 and 2. Word
unigrams and bigrams were used as features. The performance
of the classifier on 15% held-out data is provided below in
Tables 1 and 2 (footnote 7).
Footnote 3: We used scikit-learn v0.17.1 for all machine learning tasks in this study, http://scikit-learn.org.
Footnote 4: The first author of this working note had the role of the expert for this task. A real scenario would require a domain expert.
Footnote 5: Although the algorithmic approach determines the clusters that were returned as coherent, the expert may not agree with it.
Footnote 6: This set was based on our own observations, as we did not have access to the key words that were used to collect this data set.</p>
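      <p>The classifier setup described above can be sketched as a scikit-learn pipeline. This uses the current scikit-learn API rather than the v0.17.1 used in the study, the stop word list is abbreviated (hashtag variants omitted), and the grid of 20 alpha values starts at 0.1 since MultinomialNB requires a positive smoothing prior; none of this is the authors' exact code:</p>

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Domain and function stop words as listed in Section 3.3 (abbreviated).
STOP_WORDS = ["nepal", "earthquake", "quake", "kathmandu", "the", "a", "an",
              "and", "or", "to", "of", "from", "with", "in", "on", "for",
              "at", "by", "about", "under", "above", "after", "before",
              "breaking", "news", "usrusrusr", "urlurlurl"]

# Word unigrams and bigrams as features, Naive Bayes as the classifier.
pipeline = Pipeline([
    ("vec", CountVectorizer(ngram_range=(1, 2), stop_words=STOP_WORDS)),
    ("nb", MultinomialNB()),
])

# Tune the smoothing prior over 20 equally spaced values, comparable to the
# cross-validated search that yielded alpha = 0.31.
search = GridSearchCV(pipeline, {"nb__alpha": np.linspace(0.1, 2.0, 20)}, cv=3)
```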
      <p>[Tables 1 and 2: confusion matrices over the Irrelevant and Relevant classes on the 15% held-out data; the cell values could not be recovered.]</p>
    </sec>
    <sec id="sec-7">
      <title>3.4 Clustering and labeling relevant tweets</title>
      <p>Relevant tweets, as predicted by the automatic classifier,
were clustered without filtering them based on the coherency
criteria. In contrast to the first clustering step, the output
of K-Means was used as is, again with k = 200. These
clusters were annotated using the seven topics predetermined
by the task. To the extent possible, incoherent clusters were
labeled with the closest provided topic. Otherwise, the
cluster was labeled as irrelevant.</p>
      <p>The clusters that have a topic label contain 8,654 tweets.
Since the remaining clusters, containing 2,646 tweets, were
evaluated as irrelevant, they were not included in the
submitted set.</p>
    </sec>
    <sec id="sec-8">
      <title>4. RESULTS</title>
      <p>The result of our submission was recorded under the ID
relevancer ru nl. Our results were evaluated
by the organisation committee at ranks 20 and 1,000, and over
all retrieved tweets.
As announced by the organisation committee, our results are
as follows: precision 0.3143 at rank 20; recall 0.1329 and
Mean Average Precision (MAP) 0.0319 at rank 1,000; and
MAP 0.0406 considering all tweets in our
submitted results.</p>
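      <p>For reference, the rank-based measures used in this evaluation can be computed as follows; this is an illustrative implementation of standard precision-at-k and average precision, not the organisation committee's evaluation script:</p>

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved tweets that are relevant."""
    return sum(1 for t in ranked[:k] if t in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks of the relevant tweets;
    MAP is this quantity averaged over topics."""
    hits, total = 0, 0.0
    for rank, t in enumerate(ranked, start=1):
        if t in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```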
      <p>We performed an additional evaluation of our results
based on the annotated tweets provided by the task organizers.
The overall precision and recall are 0.081 and 0.34
respectively. The performance for the topics FMT1 (available
resources), FMT2 (required resources), FMT3 (available
medical resources), FMT4 (required medical resources), FMT5
(resource availability at certain locations), FMT6 (NGO and
governmental organization activities), and FMT7
(infrastructure damage and restoration reports) is provided in
Table 3.
Footnote 7: Since we optimised the classifier for this collection, the performance of the classifier on unseen data is not relevant here.</p>
    </sec>
    <sec id="sec-9">
      <title>Per-topic results</title>
      <p>[Table 3: per-topic performance and tweet percentages. Recovered values: FMT1 0.17, FMT2 0.35, FMT3 0.19, FMT4 0.06, FMT5 0.05, FMT6 0.05, FMT7 0.25; whether these belong to the F1-score or the percentage column could not be recovered.]</p>
      <p>On the basis of these results, we conclude that the success
of our method differs drastically across topics. In Table 3,
we observe that there is a clear relation between the F1-score
and the percentage of the tweets per topic in the manually
annotated data. Consequently, we conclude that our method
performs better when the topic is well represented in the
collection.</p>
    </sec>
    <sec id="sec-10">
      <title>5. CONCLUSION</title>
      <p>In this study we applied the methodology supported by
the Relevancer system in order to identify relevant
information, enabling human input in the form of cluster labels. This
method yielded an average performance in comparison
to the other participating systems.</p>
      <p>We observed that clustering tweets for each day separately
enabled the unsupervised clustering algorithm to identify
specific coherent clusters in a shorter time than would be
spent on clustering the whole set. Moreover, this setting
provided a realistic day-by-day overview for each day
following the earthquake.</p>
      <p>Our approach is optimized to incorporate human input.
In principle, an expert should be able to refine a tweet
collection until she reaches a point where the time spent on the task
is optimal and the performance is sufficient. However, for
this particular task an annotation manual was not available,
and the expert had to stop after one iteration without being
sure to what extent certain information threads were
actually relevant to the task at hand; for example, are (clusters
of) tweets pertaining to providing or collecting funds for the
disaster victims considered to be relevant or not?</p>
      <p>It is important to note that the Relevancer system yields
the results in random order, as it has no ranking mechanism
that ranks posts for relative importance. We speculate that
rank-based performance metrics are not optimally suited for
evaluating it.</p>
      <p>In our future work we will aim to increase the precision
and diminish the performance differences across topics,
possibly by downsampling or upsampling methods to tackle
class imbalance.</p>
    </sec>
    <sec id="sec-11">
      <title>6. ACKNOWLEDGEMENTS</title>
      <p>This research was funded by the Dutch national research
programme COMMIT.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>S.</given-names> <surname>Ghosh</surname></string-name>
          and
          <string-name><given-names>K.</given-names> <surname>Ghosh</surname></string-name>.
          <article-title>Overview of the FIRE 2016 Microblog track: Information Extraction from Microblogs Posted during Disasters</article-title>.
          In <source>Working Notes of FIRE 2016 - Forum for Information Retrieval Evaluation</source>, Kolkata, India, December 7-10, <year>2016</year>, CEUR Workshop Proceedings. CEUR-WS.org, <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>A.</given-names> <surname>Hürriyetoğlu</surname></string-name>,
          C. Gudehus,
          <string-name><given-names>N.</given-names> <surname>Oostdijk</surname></string-name>,
          and A. van den Bosch.
          <article-title>Relevancer: Finding and labeling relevant information in tweet collections</article-title>.
          In E. Spiro and Y.-Y. Ahn, editors, <source>Social Informatics</source>, volume <volume>10046</volume>. Springer International Publishing, <year>November 2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>A.</given-names> <surname>Hürriyetoğlu</surname></string-name>,
          A. van den Bosch, and
          <string-name><given-names>N.</given-names> <surname>Oostdijk</surname></string-name>.
          <article-title>Analysing role of key term inflections in knowledge discovery on Twitter</article-title>.
          In <source>International Workshop on Knowledge Discovery on the Web</source>, Cagliari, Italy, <year>September 2016</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>