Information Retrieval from Microblogs during Natural
                            Disasters

                      Roshni Chakraborty                                             Maitry Bhavsar
                   Indian Institute of Technology                            Indian Institute of Technology
                            Patna, Bihar                                              Patna, Bihar
                                India                                                     India
                    roshni.pcs15@iitp.ac.in                                bhavsar.mtcs15@iitp.ac.in

ABSTRACT
In this paper, we devise an information retrieval system that filters and
ranks tweets according to their relevance to a query. We devise methods to
learn relationships among entities and action verbs from a small set of
manually annotated tweets. We then use these relationships to filter tweets
and rank them accordingly. Our results (as published by the FIRE Microblog
Track) show high precision (0.770) on the top 20 tweets.

1.   INTRODUCTION
   The FIRE 2016 Microblog track [1] provided us with about 50,000 tweets
related to the Nepal earthquake of April 2015. In this paper, we segregate
tweets into different categories, namely, availability of resources,
requirement of resources, availability of medical resources and facilities,
requirement of medical resources and facilities, and information related to
infrastructure destruction or restoration. We devised a mechanism to learn
text attributes of tweets to segregate them into specific categories.
   We manually annotate a random sample of 1000 tweets into the specified
categories of information; a tweet can also belong to multiple groups. For
example, a tweet about the destruction of a bridge might also convey
information about the requirement of basic amenities. Different categories
of tweets have different text attributes that pertain to the specific
information sought by that query. We aimed at identifying those text
attributes, i.e., combinations of different words for any particular query.
We further created networks of each query's text attribute combinations. The
edges represent the interrelationships among these text attributes, which
aid in segregating tweets according to different queries. We describe the
methodology in detail in later sections.
   Tweets are informal, so a vocabulary gap exists even among tweets of the
same category. So, we did not depend only on text analysis of named
entities, like food packets, but rather combined them with the set of
important verbs that identify the correct relationship among them. We
weighted the identified keywords of each category according to their
relevance to the query. We could thereby identify tweets by the presence of
keywords relevant to a query. The published results from FIRE suggest we
could accurately identify tweets of high relevance of different categories
with good precision and recall.
   We have divided the paper into the following sections. We discuss data
collection and pre-processing in the next section, followed by our procedure
for identifying and ranking tweets in sections 3 and 4, results in section
5, and the conclusion in section 6.

2.   DATA COLLECTION AND PRE-PROCESSING
   The FIRE Microblog Track provided a set of about 50,000 tweet-ids, which
we used to access the tweets. We retained only the relevant fields of each
tweet, such as the tweet text and tweet id. We then removed some tweets from
the whole set. For example, during disasters, a number of tweets express
grief or urge people to pray or help. These are general messages, so we
built a bag of words that express only urging, requesting, praying, etc.,
and removed the tweets that contain words from this bag.

3.   METHODOLOGY
   In this section, we discuss our procedure. We do not use any external
source of information. We use the NLTK toolkit (www.nltk.org) to perform
text-based analysis on tweets. We rely on tweet text attributes to filter
tweets of relevance. In order to understand the text attributes of tweets,
we select a random sample of 1000 tweets from the whole set of 50,000
tweets. We manually group tweets according to the different queries by FIRE;
a tweet can nevertheless belong to different categories.
   We perform a set of operations on the tweet text for every group (as
specified before). Firstly, we remove the stopwords from these tweets.
Stopwords hardly represent any special characteristic of an entity. After
removal of the stopwords, we use a POS tagger to select only the nouns and
verbs from the tweets. We then rank the entities of all tweets according to
frequency. We select a subset of these entities according to the ranks; we
also include the entities specified in the query itself by FIRE. This step
gives us a list of the important entities for a specific query.
   Often, entity-to-entity matching fails to distinguish tweets of different
kinds, e.g., a tweet containing information about medical aid can highlight
either its availability or its requirement. So, we identify the different
sets of possible actions of any entity to understand the underlying
relationships. We further rank the bigrams to identify the set of working
verbs that highlight specific actions. Thus, this set of related working
verbs and entities signifies tweets of a particular category.
   However, there remains a vocabulary gap among different tweets of even
the same category due to their informal structure.
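The entity-selection steps described in this section (stopword removal, noun and verb selection, frequency ranking, bigram ranking) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the small stopword set and the hand-written tag lookup are placeholders standing in for NLTK's stopword corpus and its `nltk.pos_tag` tagger, and the function names and sample tweets are hypothetical.

```python
import re
from collections import Counter

# Placeholder stopword list (assumption); NLTK's stopword corpus would
# normally replace this hand-written set.
STOPWORDS = {"the", "a", "an", "to", "of", "in", "is", "are", "for", "and", "by", "at"}

# Toy part-of-speech lookup (assumption, for illustration only); a real
# tagger such as nltk.pos_tag would supply these tags.
TOY_TAGS = {
    "food": "NN", "water": "NN", "tents": "NNS", "blood": "NN",
    "sent": "VBD", "need": "VBP", "donate": "VB", "urgently": "RB",
}

def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def nouns_and_verbs(tokens, tag_lookup=TOY_TAGS):
    """Drop stopwords, then keep only tokens tagged as nouns or verbs."""
    kept = [t for t in tokens if t not in STOPWORDS]
    return [t for t in kept if tag_lookup.get(t, "").startswith(("NN", "VB"))]

def rank_terms(tweets, top_k=5):
    """Rank the retained terms over all tweets of one group by frequency."""
    counts = Counter()
    for tweet in tweets:
        counts.update(nouns_and_verbs(tokenize(tweet)))
    return counts.most_common(top_k)

def rank_bigrams(tweets, top_k=5):
    """Rank adjacent pairs of retained terms (a rough stand-in for the
    bigram ranking, whose exact definition the paper does not give)."""
    counts = Counter()
    for tweet in tweets:
        terms = nouns_and_verbs(tokenize(tweet))
        counts.update(zip(terms, terms[1:]))
    return counts.most_common(top_k)

sample = [
    "Food and water sent to the affected areas",
    "We urgently need tents and water",
    "Please donate blood, water is sent by the army",
]
top_terms = rank_terms(sample)
top_bigrams = rank_bigrams(sample)
```

Frequent entity and verb pairs surfaced this way, such as ("water", "sent"), are the kind of entity and action relationship that the rest of the section builds on.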
Tweets about both the requirement and the availability of medical resources
may contain entities like blood and working verbs like donate, yet be
completely different in meaning. Hence, segregation only on the basis of
keywords fails to differentiate these relationships. We therefore analyze
the context of those keyword relationships, which reflects the actual
meaning; for example, availability-based tweets lack question tags (where,
how, what, etc.) and request tags (please, etc.). The segregation of tweets
into different categories thus requires identification of the proper
entities, actions, and context to understand their relevance.
   We have further ranked the entities and action verbs according to their
importance, as explained in section 4.1. We formulate a separate bipartite
graph for each query that represents the relationships among the entities,
actions and context. One set of nodes represents the names of entities, and
another set of nodes represents the names of verbs (i.e., actions). These
relationships were formulated from the manually annotated tweets. We give a
brief overview of the specific words and their relationships for each query
in the rest of this section.
   We group the different types of action verbs from our manually annotated
tweets into similar groups. The main action verbs represent donations,
transport, relief information, and building. We represent the relationships
between these different sets of action verbs and the different sets of
entities in graphs 1 and 2, and the set of keywords of each group in Table
1. Thus, a new tweet is selected if it contains an existing relationship, as
indicated by an arrow, i.e., it must contain at least one entity and one
verb from the nodes the arrow connects.

Figure 1: Graph Relationships of Resource Availability Information

Figure 2: Graph Relationships of Resource Requirement Information

   Node Name     Words Representing the Nodes
   Green4        off to nepal
   Green5        survivor, victim, affect
   Green6        food, water, cloth, blanket, biscuit, power, plane, bus,
                 material, beef, equipment
   Green7        volunteer, helicopter, item, tool, app
   Green8        team
   Green9        shelter, tent, house, home
   Green10       relief, rescue
   Blue1         donate
   Blue2         transfer, sell, distribut, suppl, send, sent, deliver,
                 dispatch, offer, land, deploy, transport, prepar
   Blue3         relief, rescue, working, aid, support, engage, rush
   Blue4         build
   Blue5         need, want, require

Table 1: Word Dictionary of Resource Availability and Requirement Related
Information

3.1    Requirement of Resources
   In this section, we filter all tweets that mention the requirement or
need of some resource, like human resources or infrastructure such as tents,
water filters, power supply, etc. We studied our manually annotated tweets
and found that the main action verbs that denote requirement of resources
are either need related or relief related. We highlight the different
relationships among these various entities in Figure 3 and include details
of the different terms in Table 1. Thus, we select a tweet from the total
list of tweets if it contains a relationship represented by an arrow, i.e.,
it contains at least one entity and one action verb from the lists of
keywords that the arrow connects.

3.2    Availability of Medical Resources
   In this section, we identify messages that mention the availability of
some medical resources like blood, blood bank, medicine, etc. Firstly, we
distinguish the different action verbs in the manually annotated tweets that
contain information related to this query; the verbs are, namely, donation,
transport, rescue, etc. Some action verbs are ambiguous in meaning; for
example, "need" can reflect both the need for and the availability of
resources. On further analysis of tweets that mention "need", we found that
it is used in availability tweets only in conditional statements (for
example, in a clause introduced by "if").
   We represent the actions and their corresponding entities in the next two
graphs, namely graph 4 and graph 2; the arrows represent the relationships
between the two. Table 2 gives the set of keywords for each entity or
action. Thus, we filter from the whole set of fifty thousand tweets all
those tweets that contain the relationships, i.e., at least one keyword from
both nodes of an arrow. There are also some stringent relationships that
comprise more than just an entity and an action name, as illustrated in
graph 4.

Figure 3: Graph II Relationships of Medical Resource Availability
Information

Figure 4: Graph Relationships of Medical Resource Requirement Information

3.3   Requirement of Medical Resources
   In this section, we identify messages that mention the requirement of
some medical resources like blood, blood bank, medicine, etc. We represent
the actions and their corresponding entities in graph 5; the arrows
represent the relationships between the two. Table 2 gives the set of
keywords for each entity or action. Thus, we filter from the whole set of
fifty thousand tweets all those tweets that contain the relationships, i.e.,
at least one keyword from both nodes of an arrow.

3.4   Infrastructure Damage And Report of Restoration
   In this section, we identify messages that mention the damage or
restoration of any communication or structural infrastructure. However,
general statements about a structure are not relevant. We extract the
possible set of infrastructure names from our manually annotated tweets,
along with the different sets of actions related to them. After detecting
the relationships between action verbs and entity names in the manually
annotated tweets, we select only those tweets that contain these
relationships. We visualize the different relationships among the different
sets of entities in Figure 6, and list the set of keywords in Table 3.

Figure 5: Graph Relationships for Devastation Related Information
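The selection rule used throughout this section (keep a tweet if it contains at least one keyword from both nodes that an arrow connects) might be sketched as below. The node dictionaries reuse a few entries from Table 2, but the arrow list `EDGES`, the prefix-based matching of keywords, and the function names are assumptions made for illustration; the actual arrows are those drawn in the paper's graphs.

```python
import re

# A few node dictionaries copied from Table 2 (medical resources).
NODES = {
    "Green5": ["blood", "bloodbank", "medicine", "medical", "doctor"],
    "Blue1": ["donate", "donated"],
    "Blue4": ["need", "want", "require"],
}

# Hypothetical arrows (entity node, action node) for two queries; the real
# arrows are defined by the graphs in the figures.
EDGES = {
    "medical_availability": [("Green5", "Blue1")],
    "medical_requirement": [("Green5", "Blue4")],
}

def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def node_matches(tokens, keywords):
    """True if any token starts with any keyword (assumed stem matching)."""
    return any(tok.startswith(kw) for tok in tokens for kw in keywords)

def matches_query(text, query, edges=EDGES, nodes=NODES):
    """Select a tweet if both endpoints of at least one arrow are present."""
    tokens = tokenize(text)
    return any(
        node_matches(tokens, nodes[entity]) and node_matches(tokens, nodes[action])
        for entity, action in edges[query]
    )

tweets = [
    "Volunteers donated blood at the city hospital",
    "Urgent: we need blood for injured patients",
    "Praying for everyone in Nepal",
]
available = [t for t in tweets if matches_query(t, "medical_availability")]
required = [t for t in tweets if matches_query(t, "medical_requirement")]
```

Prefix matching is only one plausible reading of truncated dictionary entries such as "distribut" and "suppl"; the paper does not state how keywords were matched against tweet text.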
   Node Name     Words Representing the Nodes
   Green5        blood, bloodbank, medicine, medical, doctor
   Green6        healthcare, hospital, patient, diabities
   Green7        provide, survivor, victim, affect
   Blue1         donate, donated
   Blue2         reach, transfer, sell, distribut, suppl, send, sent,
                 deliver, dispatch, offer, land, deploy, transport, prepar,
                 continu
   Blue3         rescue, relief, support, engag, rush
   Blue4         need, want, require
   Blue5         call, contact, helpline

Table 2: Word Dictionary of Medical Resource Availability and Requirement
Information

   Node Name     Words Representing the Nodes
   Green1        hotel, debris, building, temple, rubble, tower, road,
                 bridge, house, railway, dam, tent, heritage, monument,
                 power grid, engineer, equipment, electricity
   Blue1         reduce, flatten, destroy, devastat, avalanche, damage,
                 restore, capture, collapse, build, builds
   Blue2         devastat, terrif, heartbreak
   Blue3         footage, image, picture

Table 3: Word Dictionary for Devastation Related Information
        Node Name                                      Words Representing the Nodes                                      Score
        Action1                                                 relief, rescue, aid                                       0.10
        Action2                   build, transfer, sell, distribut, send, sent, deliver, supply, donat, need               0.4
        Action3                                     deploy, dispatch, lad, transport, fly                                  0.3
        Action4        prepare, offer, launch, allow, provide, make, support, engag, rush, help, working, in action        0.2
        Entity1                     volunteer, food, biscuit, shelter, tent, house, home, cloth, blanket                   0.7
        Entity2           power, equipment, material, item, team, helicopter, bus, plane, call, helpline,contact           0.5

                                              Table 4: KeyWord Relevance Score
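Section 4.1 describes a tweet's score as the sum of the scores of its keywords, using the groups and scores of Table 4. A minimal sketch of that ranking follows. The prefix matching, the choice to add a group's score once for every distinct keyword found, and the function names are assumptions rather than details given in the paper.

```python
import re

# Keyword groups and scores as printed in Table 4 ("lad" is reproduced as
# printed; it may be a typo for "land").
KEYWORD_SCORES = {
    "Action1": (["relief", "rescue", "aid"], 0.10),
    "Action2": (["build", "transfer", "sell", "distribut", "send", "sent",
                 "deliver", "supply", "donat", "need"], 0.4),
    "Action3": (["deploy", "dispatch", "lad", "transport", "fly"], 0.3),
    "Action4": (["prepare", "offer", "launch", "allow", "provide", "make",
                 "support", "engag", "rush", "help", "working", "in action"], 0.2),
    "Entity1": (["volunteer", "food", "biscuit", "shelter", "tent", "house",
                 "home", "cloth", "blanket"], 0.7),
    "Entity2": (["power", "equipment", "material", "item", "team", "helicopter",
                 "bus", "plane", "call", "helpline", "contact"], 0.5),
}

def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def contains(tokens, keyword):
    """Assumed matching rule: whole-phrase match for multi-word keywords,
    prefix match on single tokens otherwise."""
    if " " in keyword:
        return f" {keyword} " in f" {' '.join(tokens)} "
    return any(tok.startswith(keyword) for tok in tokens)

def tweet_score(text):
    """Sum the group score of every distinct keyword found in the tweet."""
    tokens = tokenize(text)
    return sum(
        score
        for keywords, score in KEYWORD_SCORES.values()
        for kw in keywords
        if contains(tokens, kw)
    )

def rank_tweets(tweets):
    """Order tweets from highest to lowest relevance score."""
    return sorted(tweets, key=tweet_score, reverse=True)

ranked = rank_tweets([
    "Praying for Nepal",
    "India will offer helicopter support",
    "Food sent to affected areas",
])
```

Under these assumptions, a tweet that pairs a basic-need entity with an immediate action ("Food sent to affected areas") ranks above one that only promises support.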


4.     SELECTION OF TWEETS
   The above graphs represent the different entities and their sets of
actions for a particular query. For a given query, we match the
relationships found in a new tweet against the prescribed relationships.
Thus, a tweet is selected if it contains the specified relationships of
entities for that query. We further rank those tweets according to their
relevance to the query, as described in the next section.

4.1      Score of Tweets
   In this section, we rank the selected tweets by their relevance to the
query. In order to rank the tweets, we score the different keyword
relationships of a query. The keywords are segregated into two different
sections, entities and action verbs. We give more importance to words that
signify better temporal relevance than others, i.e., there is a major
difference between tweets like "food items sent to affected areas by Indian
government", "India dispatched 500 packets of rice to Nepal" and "India will
dispatch food packets by Saturday". We give a brief description of our
scoring mechanism.

     1. Temporal Importance: An action verb is given more importance if it
        highlights immediate action rather than future action. This is
        illustrated by Action2, Action3 and Action4.

     2. Relevance: Some action verbs represent greater relevance in times
        of calamity, as expressed in Action1. Similarly, some entities (as
        in Entity1) are basic needs of human livelihood, like food and
        shelter, and are more important than information related to other
        entities (as in Entity2).

The different scores of the keywords are given in Table 4. Thus, a tweet's
score is the summation of its keywords' scores. We can thereby rank the
tweets by their relevance score.

5.     RESULTS
   In this section, we highlight our results. The FIRE Microblog Track
matched our selected tweets against a manual annotator's results. We briefly
explain the metrics below; our results are shown in Table 5. The metrics
are:

     1. Precision at rank 20, i.e., considering up to the top 20 tweets for
        each topic.

     2. Recall at rank 1000.

     3. Mean Average Precision at rank 1000.

     4. MAP overall, i.e., considering all tweets retrieved in the run.

         Metric Name                       Result
         Precision@20                      0.770
         Recall@1000                       0.4344
         MAP@1000                          0.2186
         Overall MAP                       0.2208

                        Table 5: Result

6.     CONCLUSION
   In this paper, we devise a mechanism to extract the contextual and
content relationships of entities. We are able to filter tweets of high
relevance for different queries by matching these relationships. We require
only a small number of manually annotated tweets to attain our results.

7.     REFERENCES
[1] S. Ghosh and K. Ghosh. Overview of the FIRE 2016 Microblog track:
    Information Extraction from Microblogs Posted during Disasters. In
    Working notes of FIRE 2016 - Forum for Information Retrieval
    Evaluation, Kolkata, India, December 7-10, 2016, CEUR Workshop
    Proceedings. CEUR-WS.org, December 2016.