Detecting the Need for Resources and their Availability

Nelleke Oostdijk*
Centre for Language Studies (CLS), Radboud University
Erasmusplein 1, Nijmegen 6525 HT, The Netherlands
n.oostdijk@let.ru.nl

Ali Hürriyetoǧlu
CLS, Radboud University, Nijmegen and Statistics Netherlands
CBS-Weg 11, Heerlen, The Netherlands
a.hurriyetoglu@cbs.nl

* This is the corresponding author.
ABSTRACT

In this working note we describe our submission to the FIRE 2017 IRMiDis track. We participated in both sub-tasks, the first of which was directed at identifying the need or availability of specific resources and the second at matching tweets expressing the need for a resource with tweets mentioning its availability. Our linguistically motivated approach, which uses pattern matching of word n-grams, achieved an overall average MAP score of 0.2458 for sub-task (1), outperforming our machine-learning approach (MAP 0.1736) while being surpassed by two other (automatic) systems. The linguistic approach was also used in sub-task (2), where it was the best-performing approach with an f-score of 0.3793.

CCS CONCEPTS

• Computing methodologies → Information extraction; Natural language processing; Support vector machines; • Information systems → Content ranking; Social tagging; Social networks; Content analysis and feature selection; Information extraction; Expert search; Clustering and classification; Web and social media search
1 INTRODUCTION

The FIRE 2017 IRMiDis track [1] was directed at microblogs and its aim was to identify actionable information, such as what resources are needed or available during a disaster. The track comprised two sub-tasks: (1) identifying need tweets and availability tweets, and (2) matching need tweets with availability tweets. The dataset consisted of tweets posted during the 2015 Nepal earthquake.

Our submission consisted of the results of two runs for sub-task (1), each using a different semi-automatic approach, and one run for sub-task (2) using one of these approaches.

The structure of the text below is as follows. The data are described in more detail in Section 2. In Section 3 we describe the approaches used to address sub-task (1) and the results obtained. In Section 4 the approach used for sub-task (2) is described, as well as the results obtained. Section 5 summarizes the main findings and includes suggestions for future work.

2 DATA

The data for this track were tweets posted during the 2015 Nepal earthquake and include tweets in English, tweets in local languages such as Hindi and Nepali, and code-mixed tweets. The data were provided by the organizers in the form of tweet IDs, together with a script for downloading them.

Participants were given two datasets which they could use to develop their methods. The first set (development data) included roughly 18,000 tweets without any class labels. The second set (train data) contained 929 tweets with class labels identifying them as need tweet (211) or availability tweet (718). The test data contained 46,920 tweets, of which we managed to download 44,759. The composition of the dataset we downloaded is shown in Table 1.

Table 1: Composition of the Test Dataset

    Language tag    Tweet count    Tweet %
    English              35,682      79.72
    Hindi                 3,060       6.84
    Nepali                4,180       9.34
    Other*                1,837       4.10
    All                  44,759     100.00

    * Other is used to refer to a wide range of languages/language tags, as well as the UND tag, which occurred with 976 tweets.

3 SUB-TASK (1): IDENTIFYING NEED AND AVAILABILITY

For sub-task (1) participants were expected to develop methodologies for identifying need tweets and availability tweets. The following descriptions were provided on the track website (https://sites.google.com/site/irmidisfire2017/):

    Need-tweets: Tweets which inform about the need or requirement of some specific resource such as food, water, medical aid, shelter, mobile or internet connectivity, etc.

    Availability-tweets: Tweets which inform about the availability of some specific resources. This class includes both tweets which inform about potential availability, such as resources being transported or dispatched to the disaster-struck area, as well as tweets informing about the actual availability in the disaster-struck area, such as food being distributed, etc.

3.1 The Linguistic Approach
The approach we used here was based on that applied on a previous occasion [3]. In this approach a (monolingual English) lexicon and a set of hand-crafted rules are used to tag the relevant n-grams. The tagged output is then used for assigning class labels. As approaches for the current track were expected to be able to handle non-English data as well, a pre-processing step was introduced for dealing with these data. Moreover, since ranking is not part of the approach as such, we used some heuristics to produce the results in ranked fashion. More details on each of these steps are given below.

Pre-processing. As a pre-processing step we translated part of the non-English tweets to English by means of Google Translate (https://translate.google.com); that is, only tweets that carried the (Twitter) language tag HI or NE were translated. All other non-English tweets (again, according to the language tag provided) were left untouched, as were all tweets that the language tag identified as English but that were in fact either mixed tweets (English and some other language, possibly Hindi or Nepali) or tweets in a completely different language.

An example of a tweet for which Twitter identified the language as Hindi is given below (Example 1). Its translation as produced by Google Translate is shown in Example 1a. As neither of the authors has any knowledge of Hindi or Nepali, we had no way of knowing to what extent the language tag and the translation were reliable, and we simply had to take the translation for what it was worth. However, based on what we saw with some other tweets in languages that we do know, we could tell that the language tag was sometimes completely off. Yet, we decided that we were not going to spend any time on this issue within the scope of the present task.

[Example 1]

[Example 1a]
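A small sketch of this pre-processing filter is given below, assuming each tweet is represented as a dict with 'id', 'lang', and 'text' fields; the translate_to_english function is a hypothetical placeholder for a call to the Google Translate service and is not part of the actual pipeline described here.

    # Sketch of the pre-processing step: only tweets carrying the Twitter
    # language tag HI or NE are routed to translation; all other tweets are
    # left untouched, whatever their tag says.
    TRANSLATE_TAGS = {"hi", "ne"}

    def translate_to_english(text):
        # Hypothetical placeholder: in the described setup the translation
        # was obtained from Google Translate (https://translate.google.com).
        return text

    def preprocess(tweets):
        """tweets: iterable of dicts with 'id', 'lang' and 'text' keys."""
        processed = []
        for tweet in tweets:
            if tweet.get("lang", "").lower() in TRANSLATE_TAGS:
                text = translate_to_english(tweet["text"])
            else:
                text = tweet["text"]
            processed.append({"id": tweet["id"], "text": text})
        return processed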
Tagging relevant n-grams. A lexicon was constructed containing some 1,400 items. The lexicon contains mostly unigrams and bigrams that are considered relevant for the task at hand. All lexical entries are typed; the main types are V(erb)-like and N(oun)-like. V-like items typically are verb forms (e.g. distribute) and nominalizations (e.g. distribution) which express some action. With each V-like item a tag is associated which indicates whether the item expresses need or availability (tags act-n and act-a respectively). N-like items are typically nouns (e.g. food). With each N-like item a tag is associated which indicates whether the item is a specific resource (e.g. water, tag res-a) or a more general resource (e.g. aid, tag res). In addition to the lexicon there is a small rule set (currently comprising 10 rules) which specifies how lexical items may combine to form multiword n-grams. The lexicon and the rules are used by a software module we have developed in another project. The module assigns the appropriate tags to the relevant word n-grams.
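As an illustration of this step, the sketch below tags word n-grams by greedy longest match against a toy lexicon; the entries shown are invented examples, and the sketch omits the rule set for combining lexical items into multiword n-grams.

    # Minimal sketch of lexicon-based n-gram tagging (simplified; the real
    # module additionally applies ca. 10 combination rules). The lexicon
    # entries below are illustrative examples only.
    LEXICON = {
        ("lack", "of"): "act-n",        # V-like, expresses need
        ("need",): "act-n",
        ("distributing",): "act-a",     # V-like, expresses availability
        ("food",): "res-a",             # N-like, specific resource
        ("water",): "res-a",
        ("medicine",): "res-a",
        ("electricity",): "res-a",
        ("aid",): "res",                # N-like, general resource
    }
    MAX_NGRAM = max(len(k) for k in LEXICON)

    def tag_ngrams(tokens):
        """Return (ngram, tag) pairs found by greedy longest match."""
        tagged, i = [], 0
        while i < len(tokens):
            for n in range(MAX_NGRAM, 0, -1):       # prefer longer n-grams
                ngram = tuple(t.lower() for t in tokens[i:i + n])
                if ngram in LEXICON:
                    tagged.append((" ".join(ngram), LEXICON[ngram]))
                    i += n
                    break
            else:
                i += 1                              # no lexical entry starts here
        return tagged

    # tag_ngrams("food water medicine lack of electricity".split())
    # -> [('food', 'res-a'), ('water', 'res-a'), ('medicine', 'res-a'),
    #     ('lack of', 'act-n'), ('electricity', 'res-a')]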
Example 2 shows the result of the tagging of the (translated) tweet shown in Example 1a.

[Example 2]
    food [res-a] water [res-a] medicine [res-a] lack of [act-n] electricity [res-a]

Assigning class labels. The labeling module we developed automatically assigns a class label ('Nepal-Need' or 'Nepal-Avail') to each tweet. Input for this module is the output of the tagging described above. For each tweet, the tags assigned are uniqued, that is, a single instance of each tag type is maintained. Example 3 shows the result of this step for our example tweet. The module makes use of the label pattern list, which specifies which labels are to be associated with the (combinations of) tags that can occur in a tweet (the tag patterns). For our example this means that, given the tag pattern [act-n] [res-a], the label 'Nepal-Need' is assigned (Example 3a). A typical pattern for a tweet expressing the availability of a resource is [act-a] [res-a].

[Example 3]
    [act-n] [res-a]

[Example 3a]
    [act-n] [res-a] Nepal-Need
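A minimal sketch of the label assignment is given below; the label pattern list shown is a small illustrative subset, assumed for the example, rather than the full list used in the run.

    # Sketch of label assignment from tag patterns (illustrative subset of
    # the label pattern list; the actual list covers more combinations).
    LABEL_PATTERNS = {
        frozenset(["act-n", "res-a"]): "Nepal-Need",
        frozenset(["act-a", "res-a"]): "Nepal-Avail",
    }

    def assign_label(tagged_ngrams):
        """tagged_ngrams: list of (ngram, tag) pairs from the tagging step."""
        tag_pattern = frozenset(tag for _, tag in tagged_ngrams)  # unique tags
        return LABEL_PATTERNS.get(tag_pattern)   # None if no pattern matches

    # assign_label([('food', 'res-a'), ('lack of', 'act-n')]) -> 'Nepal-Need'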
Ranking the output. As this method does not yield any confidence or likelihood scores, a ranking was obtained in the following manner. The output was first ranked based on a human-estimated confidence of specific class label + tag pattern combinations. This resulted in an initial ranking of the sets of tweets that showed a particular tag pattern. The final ranking was obtained by ordering the tweets within these ranked sets according to their tweet ID.
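The ranking heuristic could be sketched as follows; the confidence values attached to the class label + tag pattern combinations are invented for the illustration and do not reproduce the estimates actually used.

    # Sketch of the ranking heuristic: tweets are grouped by their (label,
    # tag pattern) combination, the groups are ordered by a human-estimated
    # confidence, and within each group tweets are ordered by tweet ID.
    PATTERN_CONFIDENCE = {
        ("Nepal-Need", frozenset(["act-n", "res-a"])): 0.9,   # illustrative values
        ("Nepal-Avail", frozenset(["act-a", "res-a"])): 0.8,
    }

    def rank(labelled_tweets):
        """labelled_tweets: list of (tweet_id, label, tag_pattern) triples."""
        return sorted(
            labelled_tweets,
            key=lambda t: (-PATTERN_CONFIDENCE.get((t[1], t[2]), 0.0), t[0]),
        )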
Results. The results as evaluated by the organizers are shown in Tables 2 and 3 as run ID Radboud_CLS_task1_1. The results were the best of all submissions using a semi-automatic approach on all counts (Precision@100, Recall@100, and MAP, both for the need tweets and the availability tweets). Averaged over the availability and need tweets, the MAP is 0.2458 for Radboud_CLS_task1_1 and 0.1736 for Radboud_CLS_task1_2.

Table 2: Evaluation Results Sub-Task 1 for Availability Tweets

    Run ID                 Prec.@100    Recall@100    MAP
    Radboud_CLS_task1_1        .7300         .3153    .2062
    Radboud_CLS_task1_2        .6100         .2224    .1660

Table 3: Evaluation Results Sub-Task 1 for Need Tweets

    Run ID                 Prec.@100    Recall@100    MAP
    Radboud_CLS_task1_1        .7500         .4309    .2853
    Radboud_CLS_task1_2        .4900         .3934    .1812
3.2 The Relevancer Approach

For sub-task (1) we also applied the Relevancer approach [2]. Relevancer was used to generate 194 clusters for the tweets tagged as English or Hindi. The English clusters, which make up one third of the clusters, were annotated and used as training data for a support vector machine (SVM) based classifier. The cluster annotation yielded 272 availability and 38 need tweets. The training data was extended with additional data from [3], the gold annotations released by the organization team, and the development data released in the scope of this shared task. The final classifier was used to predict the labels of the test tweets. The classifier confidence was used to rank the results.

Results. The results as evaluated by the organizers are shown in Tables 2 and 3 as run ID Radboud_CLS_task1_2.
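As an illustration of the classification and ranking step, the sketch below trains a linear SVM on bag-of-words features and orders the test tweets by the magnitude of the decision score; the feature representation and scikit-learn settings are assumptions made for the sketch and need not match the exact configuration of the submitted run.

    # Sketch of SVM classification with confidence-based ranking, assuming a
    # tf-idf bag-of-words representation and scikit-learn's LinearSVC.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    def train_and_rank(train_texts, train_labels, test_ids, test_texts):
        vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
        X_train = vectorizer.fit_transform(train_texts)
        X_test = vectorizer.transform(test_texts)

        clf = LinearSVC()                       # SVM-based classifier
        clf.fit(X_train, train_labels)          # labels: 'need' / 'availability'

        scores = clf.decision_function(X_test)  # signed distance to the margin
        preds = clf.predict(X_test)
        # Rank the predicted tweets by the magnitude of the decision score,
        # i.e. by classifier confidence.
        return sorted(zip(test_ids, preds, scores),
                      key=lambda t: abs(t[2]), reverse=True)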

4 SUB-TASK (2): MATCHING NEED AND AVAILABILITY

In sub-task (2) participants were required to develop methodologies for matching need tweets with appropriate availability tweets. For this task we used the tagging output we had obtained when processing the tweets for sub-task (1) using the linguistic approach. For every need tweet, we would take all the word n-grams that had been tagged as identifying a resource and attempt to find an exact match in the availability tweets. In both cases (need and availability tweets) the ranked list was used, and the software program would work its way down. Since the task was to find up to 5 availability tweets for each need tweet and the algorithm would always start at the top of the list of ranked availability tweets, only a small portion of the availability tweets actually appears in the matching results. Only when no exact match could be found would the software attempt to find near-matches. This was typically the case for tweets for which at best a partial match could be found (e.g. matching 2 out of 4 requested resources).
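The matching procedure can be sketched as follows, assuming that each tweet has been reduced to its tweet ID and the set of word n-grams tagged as a resource; the tie-breaking in the near-match fallback (largest overlap first, then the original availability ranking) is an assumption made for the illustration.

    # Sketch of the need-availability matching: for each need tweet we walk
    # down the ranked availability list and first collect tweets whose set of
    # resource n-grams matches exactly; only if no exact match is found do we
    # fall back to partial overlaps. At most 5 matches are kept per need tweet.
    def match(need_tweets, availability_tweets, max_matches=5):
        """Both arguments: ranked lists of (tweet_id, resource_ngram_set)."""
        matches = {}
        for need_id, need_res in need_tweets:
            exact = [av_id for av_id, av_res in availability_tweets
                     if av_res == need_res][:max_matches]
            if exact:
                matches[need_id] = exact
                continue
            # Near-matches: order by the number of shared resource n-grams,
            # keeping the original availability ranking as a tie-breaker.
            partial = [(len(need_res & av_res), pos, av_id)
                       for pos, (av_id, av_res) in enumerate(availability_tweets)
                       if need_res & av_res]
            partial.sort(key=lambda t: (-t[0], t[1]))
            matches[need_id] = [av_id for _, _, av_id in partial[:max_matches]]
        return matches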
To illustrate the approach described above, let us get back to the example we have been using throughout this paper. Our example tweet was identified as a need tweet. From the tagging we obtained (shown in Example 2) we would only keep those word n-grams and their tags that identified a resource. The result of this step is shown in Example 2b.

[Example 2b]
    food [res-a] water [res-a] medicine [res-a] electricity [res-a]

In order to find matching availability tweets we would look for tweets in which the same word n-grams could be found, regardless of the order in which they occurred. The five matching tweets found amongst the highest ranking availability tweets in the case of our example are shown in Examples 4-8. In each case the word n-grams identified and tagged as a resource are shown below the tweet text.

[Example 4]
    RT @aolnepal: Anybody interested in donating food, medicine, water and sanitation materials for earthquake victims, contact Art… http://t…

    food [res-a] medicine [res-a] water [res-a] sanitation [res-a] sanitation materials [res-a] materials [res]

[Example 5]
    Pakistan distributing Beef, Missionaries distributing Bible @RSSorg @bst_official @ArtofLiving distributing Food+Water #Earthquake

    food [res-a] water [res-a]

[Example 6] (the language tag for this tweet was 'und'; therefore no preprocessing was applied)

    food [res-a] water [res-a]

[Example 7] (the language tag here was for Hindi; Example 7a shows the English translation obtained from Google Translate)

[Example 7a]
    RT @abpnewshindi: Food, water and blankets are sent to Nepal in the aircraft. s. See Jaishankar #NepalEarthquake Live- http://t.co/MG3hLqR5bO

    food [res-a] water [res-a] blankets [res-a]

[Example 8] (the language tag here was for Hindi; Example 8a shows the English translation obtained from Google Translate)

[Example 8a]
    Under the service of religion, Christian missionaries are distributing the Bible in food, water, and clothing in Nepal. Http://t.co/4E6IHcEqM4 via @thelapine

    clothing [res-a] food [res-a] water [res-a]

Results. The results were submitted under run ID Radboud_CLS_task2_1 and evaluated by the organizers as follows: precision@5 0.3305, recall 0.4450, and f-score 0.3793. Thus this approach was found to outperform all other approaches.
5 DISCUSSION AND CONCLUSIONS

Our participation in the IRMiDis track was rather successful: we achieved a third and a sixth place in the overall (MAP) ranking for sub-task (1), and scored best of all approaches on sub-task (2). (The organizers distinguish between automatic and semi-automatic approaches without specifying how to discriminate between them; we consider our approaches semi-automatic, although in their execution they operate automatically.) Our success can in part be attributed to the experience gained through our participation in previous shared tasks.

However, there are some issues that we struggled with and where we expect we might do better on a future occasion. We found that for the current task the definitions of what constitutes a need tweet and what an availability tweet were somewhat unclear, more particularly in specifying what exactly was meant by 'a specific resource'. The examples in the task description were all clear-cut cases, including food, water, medicine, electricity, and blood donors. But what about donations or support, which we also encounter in the data?

Looking at the results obtained with our linguistic approach for sub-task (1), we note that the distribution of the need and availability tweets over the various languages is rather uneven. While on average 7.73% of the data is classified as availability tweets and 3.07% as need tweets, the ratios, especially for tweets in other languages, are much lower (Table 4). We speculate that the fact that the data comprised multiple languages has affected the recall we obtained for sub-task (1). While the use of Google Translate for tweets tagged as Hindi or Nepali was reasonably effective, we expect that better results could be achieved if we put more effort into the preprocessing of the tweets. This would involve both improving the language identification and finding a way to handle code-mixed tweets.

Table 4: Distribution across Languages

    Language    Test set      Availability           Need
    tag            count      count       %     count      %
    English       35,682      2,913    8.16     1,235   3.46
    Hindi          3,060        252    8.24        55   1.80
    Nepali         4,180        254    6.08        78   1.87
    Other          1,837         39    2.12         8   0.44
    All           44,759      3,458    7.73     1,376   3.07

The results obtained with the linguistic approach for sub-task (1) might have been better if we had allowed multiple class labels to be associated with a given tweet. However, as we expected sub-task (2) to be easier if a tweet carried only a single label, we opted for a forced choice between the two classes, and we ignored tweets where we could not decide for either class.

In matching need and availability of resources for sub-task (2) we restricted ourselves to exact (literal) matches only, which means that we fail to match instances such as the need for shelter and the availability of tents, or the need for food and the availability of packed meals. In future work we might include synonyms, hypernyms, and hyponyms.

The Relevancer approach suffered from the fact that the training set of labelled data was rather small (929 tweets) and quite unbalanced (211 need vs 718 availability tweets). However, we think that a combination of the linguistic approach and the Relevancer approach has the potential to overcome such limitations. We are currently conducting experiments in which we aim to combine the strengths of the two approaches. So stay tuned!

ACKNOWLEDGMENTS

The authors are grateful to Peter Beinema for his help with the software used in sub-task (2).

REFERENCES

[1] Moumita Basu, Saptarshi Ghosh, Kripabandhu Ghosh, and Monojit Choudhury. 2017. Overview of the FIRE 2017 track: Information Retrieval from Microblogs during Disasters (IRMiDis). In Working notes of FIRE 2017 - Forum for Information Retrieval Evaluation (CEUR Workshop Proceedings). CEUR-WS.org.
[2] Ali Hürriyetoǧlu, Nelleke Oostdijk, Mustafa Erkan Başar, and Antal van den Bosch. 2017. Supporting Experts to Handle Tweet Collections About Significant Events. Springer International Publishing, Cham, 138-141. https://doi.org/10.1007/978-3-319-59569-6_14
[3] Ali Hürriyetoğlu and Nelleke Oostdijk. 2017. Extracting Humanitarian Information from Tweets. In Proceedings of the First International Workshop on Exploitation of Social Media for Emergency Relief and Preparedness. Aberdeen, United Kingdom. http://ceur-ws.org/Vol-1832/SMERP-2017-DC-RU-Retrieval.pdf