=Paper=
{{Paper
|id=Vol-2036/T2-2
|storemode=property
|title=Detecting the Need for Resources and their Availability
|pdfUrl=https://ceur-ws.org/Vol-2036/T2-2.pdf
|volume=Vol-2036
|authors=Nelleke Oostdijk,Ali Hürriyetoğlu
|dblpUrl=https://dblp.org/rec/conf/fire/OostdijkH17
}}
==Detecting the Need for Resources and their Availability==
Nelleke Oostdijk*
Centre for Language Studies (CLS), Radboud University
Erasmusplein 1, 6525 HT Nijmegen, The Netherlands
n.oostdijk@let.ru.nl

Ali Hürriyetoğlu
CLS, Radboud University, Nijmegen and Statistics Netherlands
CBS-Weg 11, Heerlen, The Netherlands
a.hurriyetoglu@cbs.nl

* This is the corresponding author.

ABSTRACT

In this working note we describe our submission to the FIRE2017 IRMiDiS track. We participated in both sub-tasks, the first of which was directed at identifying the need or availability of specific resources and the second at matching tweets expressing the need for a resource with tweets mentioning their availability. Our linguistically motivated approach using pattern matching of word n-grams achieved an overall average MAP score of 0.2458 for sub-task (1), outperforming our machine-learning approach (MAP 0.1736) while being surpassed by two other (automatic) systems. The linguistic approach was also used in sub-task (2), where it was the best-performing approach with an f-score of 0.3793.

CCS CONCEPTS

• Computing methodologies → Information extraction; Natural language processing; Support vector machines; • Information systems → Content ranking; Social tagging; Social networks; Content analysis and feature selection; Information extraction; Expert search; Clustering and classification; Web and social media search

1 INTRODUCTION

The FIRE2017 IRMiDiS track [1] (https://sites.google.com/site/irmidisfire2017/) was directed at microblogs and the aim was to identify actionable information such as what resources are needed or available during a disaster. In this track there were two sub-tasks: (1) identifying need tweets and availability tweets, and (2) matching need tweets and availability tweets. The dataset consisted of tweets posted during the 2015 Nepal earthquake.

Our submission consisted of the results of two runs, each using a different semi-automatic approach, addressing sub-task (1), and one run with one of our approaches addressing sub-task (2).

The structure of the text below is as follows. The data are described in more detail in Section 2. In Section 3 we describe the approaches used to address sub-task (1) and the results obtained. In Section 4 the approach used for sub-task (2) is described as well as the results obtained. Section 5 summarizes the main findings and includes suggestions for future work.

2 DATA

The data for this track were tweets posted during the 2015 Nepal earthquake and include tweets in English as well as tweets in local languages such as Hindi and Nepali, and code-mixed tweets. The data were provided by the organizers in the form of tweet IDs, together with a script for downloading them.

Participants were given two datasets which they could use to develop their methods. The first set (development data) included roughly 18,000 tweets without any class labels. The second set (train data) contained 929 tweets with class labels identifying them as need tweets (211) or availability tweets (718). The test data contained 46,920 tweets, of which we managed to download 44,759. The composition of the dataset we downloaded is shown in Table 1.

Table 1: Composition of the Test Dataset

  Language tag   Tweet count   Tweet %
  English        35,682        79.72
  Hindi          3,060         6.84
  Nepali         4,180         9.34
  Other*         1,837         4.10
  All            44,759        100.00

  * Other refers to a wide range of languages/language tags as well as the UND tag, which occurred with 976 tweets.

3 SUB-TASK (1): IDENTIFYING NEED AND AVAILABILITY

For sub-task (1) participants were expected to develop methodologies for identifying need tweets and availability tweets. The following descriptions were provided:

  Need-tweets: Tweets which inform about the need or requirement of some specific resource such as food, water, medical aid, shelter, mobile or internet connectivity, etc.

  Availability-tweets: Tweets which inform about the availability of some specific resources. This class includes both tweets which inform about potential availability, such as resources being transported or dispatched to the disaster-struck area, and tweets informing about the actual availability in the disaster-struck area, such as food being distributed, etc.
3.1 The Linguistic Approach

The approach we used here was based on that applied on a previous occasion [3]. In this approach a (monolingual English) lexicon and a set of hand-crafted rules are used to tag the relevant n-grams. The tagged output is then used for assigning class labels. As approaches in the current track were expected to be able to handle non-English data as well, a pre-processing step was introduced for dealing with these data. In order to produce the results in a ranked fashion, which is not part of the approach as such, we used some heuristics. More details on each of these steps are given below.

Pre-processing. As a pre-processing step we translated part of the non-English tweets to English by means of Google Translate (https://translate.google.com); that is, only tweets that carried the (Twitter) language tag HI or NE were translated. All other non-English tweets (again, according to the language tag provided) were left untouched, as were all tweets that the language tag identified as English but which were in fact either mixed tweets (English and some other language, possibly Hindi or Nepali) or in a completely different language.

An example of a tweet where Twitter identified the language as Hindi is given below (Example 1). Its translation as produced by Google Translate is shown in Example 1a. As neither of the authors has any knowledge of Hindi or Nepali, we had no way of knowing to what extent the language tag and the translation were reliable, and we simply had to take the translation for what it was worth. However, based on what we saw with some other tweets in languages that we do know, we could tell that the language tag was sometimes completely off. Yet we decided that we were not going to spend any time on this issue within the scope of the present task.

[Example 1]

[Example 1a]

Tagging relevant n-grams. A lexicon was constructed containing some 1,400 items. The lexicon contains mostly unigrams and bigrams that are considered relevant for the task at hand. All lexical entries are typed. The main types are V(erb)-like and N(oun)-like. V-like items typically are verb forms (e.g. distribute) and nominalizations (e.g. distribution) which express some action. With each V-like item a tag is associated which indicates whether the item expresses need or availability (tags act-n and act-a respectively). N-like items are typically nouns (e.g. food). With each N-like item a tag is associated which indicates whether the item is a specific (e.g. water) or more general resource (e.g. aid). The tags here are res-a and res respectively. In addition to the lexicon there is a small rule set (currently comprising 10 rules) which specifies how lexical items may combine to form multiword n-grams. The lexicon and the rules are used by a software module we have developed in another project. The module assigns the appropriate tags to the relevant word n-grams.

Example 2 shows the result of the tagging of the (translated) tweet shown in Example 1a.

[Example 2]
  food [res-a] water [res-a] medicine [res-a] lack of [act-n] electricity [res-a]
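To make the tagging step concrete, the sketch below shows one way a typed lexicon of unigrams and bigrams can be matched against a tweet, longest match first. This is a minimal illustration in Python, not the actual module (which was developed in another project and additionally applies the rule set when forming multiword n-grams); the handful of lexicon entries is chosen purely to reproduce the tagging of Example 2.

  # Illustrative typed lexicon; the real one holds some 1,400 entries.
  LEXICON = {
      ("lack", "of"): "act-n",     # V-like, expresses need
      ("distributing",): "act-a",  # V-like, expresses availability
      ("food",): "res-a",          # N-like, specific resource
      ("water",): "res-a",
      ("medicine",): "res-a",
      ("electricity",): "res-a",
      ("aid",): "res",             # N-like, general resource
  }

  def tag_ngrams(text, lexicon=LEXICON, max_n=2):
      """Scan the tweet left to right, preferring the longest matching n-gram."""
      tokens = text.lower().split()
      tagged = []
      i = 0
      while i < len(tokens):
          for n in range(max_n, 0, -1):
              ngram = tuple(tokens[i:i + n])
              if len(ngram) == n and ngram in lexicon:
                  tagged.append((" ".join(ngram), lexicon[ngram]))
                  i += n
                  break
          else:
              i += 1
      return tagged

  print(tag_ngrams("food water medicine lack of electricity"))
  # [('food', 'res-a'), ('water', 'res-a'), ('medicine', 'res-a'),
  #  ('lack of', 'act-n'), ('electricity', 'res-a')]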
Assigning class labels. The labeling module we developed automatically assigns a class label ('Nepal-Need' or 'Nepal-Avail') to each tweet. Input for this module is the output of the tagging described above. For each tweet the tags assigned are deduplicated, that is, a single instance of each tag type is maintained. Example 3 shows the result of this step for our example tweet. The module makes use of the label pattern list, which specifies which labels are to be associated with the (combinations of) tags that can occur in a tweet (the tag patterns). For our example this means that, given the tag pattern [act-n] [res-a], the label 'Nepal-Need' is assigned (Example 3a). A typical pattern for a tweet expressing the availability of a resource is [act-a] [res-a].

[Example 3]
  [act-n] [res-a]

[Example 3a]
  [act-n] [res-a] Nepal-Need

Ranking the output. As this method does not yield any confidence or likelihood scores, a ranking was obtained in the following manner. The output was first ranked based on a human-estimated confidence of specific class label + tag pattern combinations. This resulted in an initial ranking of the sets of tweets that showed a particular tag pattern. The final ranking was obtained by ordering the tweets within these ranked sets according to their tweet ID.

Results. The results as evaluated by the organizers are shown in Tables 2 and 3 as run ID Radboud_CLS_task1_1. The results were the best of all submissions using a semi-automatic approach on all counts (Precision@100, Recall@100, and MAP, both for the need tweets and the availability tweets). The MAP averaged over the availability and need tweets is 0.2458 for Radboud_CLS_task1_1 and 0.1736 for Radboud_CLS_task1_2.

Table 2: Evaluation Results Sub-Task 1 for Availability Tweets

  Run ID                Prec.@100   Recall@100   MAP
  Radboud_CLS_task1_1   .7300       .3153        .2062
  Radboud_CLS_task1_2   .6100       .2224        .1660

Table 3: Evaluation Results Sub-Task 1 for Need Tweets

  Run ID                Prec.@100   Recall@100   MAP
  Radboud_CLS_task1_1   .7500       .4309        .2853
  Radboud_CLS_task1_2   .4900       .3934        .1812
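The sketch below illustrates the labeling and ranking logic just described: deduplicate the tags, look the resulting tag pattern up in the label pattern list, and order tweets by the confidence of the label + pattern combination and then by tweet ID. The pattern list is reduced to two patterns for the sake of the example, and the confidence values are hypothetical stand-ins for our human-estimated scores.

  # Reduced label pattern list; the actual list covers more combinations.
  LABEL_PATTERNS = {
      ("act-n", "res-a"): "Nepal-Need",
      ("act-a", "res-a"): "Nepal-Avail",
  }

  # Hypothetical human-estimated confidences per (label, pattern) pair.
  CONFIDENCE = {
      ("Nepal-Need", ("act-n", "res-a")): 0.9,
      ("Nepal-Avail", ("act-a", "res-a")): 0.8,
  }

  def label_tweet(tagged):
      """tagged: list of (ngram, tag) pairs produced by the tagging step."""
      pattern = tuple(sorted({tag for _, tag in tagged}))  # deduplicate tags
      return LABEL_PATTERNS.get(pattern), pattern

  def rank(tweets):
      """Order (tweet_id, tagged) pairs by pattern confidence, then tweet ID."""
      ranked = []
      for tweet_id, tagged in tweets:
          label, pattern = label_tweet(tagged)
          if label is not None:  # undecidable tweets are ignored
              conf = CONFIDENCE.get((label, pattern), 0.0)
              ranked.append((-conf, tweet_id, label))
      return [(tweet_id, label) for _, tweet_id, label in sorted(ranked)]

  # For the running example (cf. Examples 3 and 3a):
  print(label_tweet([("food", "res-a"), ("water", "res-a"), ("medicine", "res-a"),
                     ("lack of", "act-n"), ("electricity", "res-a")]))
  # ('Nepal-Need', ('act-n', 'res-a'))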
3.2 The Relevancer Approach

For sub-task (1) we also applied the Relevancer approach [2]. Relevancer was used to generate 194 clusters for the tweets tagged as English or Hindi. The English clusters, which are one third of the clusters, were annotated and used as training data for the support vector machine (SVM) based classifier. The cluster annotation yielded 272 availability and 38 need tweets. The training data were extended with additional data from [3], the gold annotations released by the organization team, and the development data released in the scope of this shared task. The final classifier was used to predict the label of the test tweets. The classifier confidence was used to rank the results.

Results. The results as evaluated by the organizers are shown in Tables 2 and 3 as run ID Radboud_CLS_task1_2.

4 SUB-TASK (2): MATCHING NEED AND AVAILABILITY

In sub-task (2) participants were required to develop methodologies for matching need tweets with appropriate availability tweets. For this task we used the tagging output we had obtained in processing the tweets for sub-task (1) using the linguistic approach. For every need tweet we would attempt to find, for all the word n-grams that had been tagged as identifying a resource, an exact match among the availability tweets. In both cases (need and availability tweets) the ranked list was used and the software program would work its way down the list. Since the task was to find up to 5 availability tweets for each need tweet and the algorithm would always start at the top of the list of ranked availability tweets, only a small portion of the availability tweets actually appears in the matching results. Only when no exact match could be found would the software attempt to find near-matches. This was typically the case for tweets where at best tweets could be found that yielded a partial match (e.g. matching 2 out of 4 requested resources).

To illustrate the approach described above, let us return to the example we have been using throughout this paper. Our example tweet was identified as a need tweet. From the tagging we obtained (shown in Example 2) we would only keep those word n-grams and their tags that identified a resource. The result of this is shown in Example 2b.

[Example 2b]
  food [res-a] water [res-a] medicine [res-a] electricity [res-a]

In order to find matching availability tweets we would look for tweets where the same word n-grams could be found, regardless of the order in which they occurred. The five matching tweets found amongst the highest-ranking availability tweets in the case of our example are shown in Examples 4-8. In each case the word n-grams identified and tagged as resource are shown below the tweet text.

[Example 4]
  RT @aolnepal: Anybody interested in donating food, medicine, water and sanitation materials for earthquake victims, contact Art… http://t…

  food [res-a] medicine [res-a] water [res-a] sanitation [res-a] sanitation materials [res-a] materials [res]

[Example 5]
  Pakistan distributing Beef, Missionaries distributing Bible @RSSorg @bst_official @ArtofLiving distributing Food+Water #Earthquake

  food [res-a] water [res-a]

[Example 6]
  (The language tag for this tweet was UND, therefore no preprocessing was applied.)

  food [res-a] water [res-a]

[Example 7]
  (The language tag here was for Hindi. Example 7a shows the English translation obtained from Google Translate.)

[Example 7a]
  RT @abpnewshindi: Food, water and blankets are sent to Nepal in the aircraft. See Jaishankar #NepalEarthquake Live- http://t.co/MG3hLqR5bO

  food [res-a] water [res-a] blankets [res-a]

[Example 8]
  (The language tag here was for Hindi. Example 8a shows the English translation obtained from Google Translate.)

[Example 8a]
  Under the service of religion, Christian missionaries are distributing the Bible in food, water, and clothing in Nepal. Http://t.co/4E6IHcEqM4 via @thelapine

  clothing [res-a] food [res-a] water [res-a]

Results. The results were submitted under run ID Radboud_CLS_task2_1 and evaluated by the organizers as follows: precision@5 0.3305, recall 0.4450, and f-score 0.3793. Thus this approach was found to outperform all other approaches.
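As an illustration of the matching procedure described in this section, the sketch below walks down a ranked availability list, collecting up to five tweets whose tagged resources cover all the resources requested in a need tweet, and falling back to partial (near) matches when no full cover is found. The function names are ours, and the handling of near-matches is simplified relative to the actual software.

  def resources(tagged):
      """Keep only the word n-grams tagged as a (specific) resource."""
      return {ngram for ngram, tag in tagged if tag in ("res-a", "res")}

  def match_need(need_tagged, ranked_avail, k=5):
      """ranked_avail: list of (tweet_id, tagged) pairs in ranked order."""
      needed = resources(need_tagged)
      exact, partial = [], []
      for tweet_id, avail_tagged in ranked_avail:
          offered = resources(avail_tagged)
          if needed <= offered:     # every requested resource is offered
              exact.append(tweet_id)
          elif needed & offered:    # e.g. 2 out of 4 requested resources
              partial.append(tweet_id)
          if len(exact) == k:
              break
      return exact if exact else partial[:k]

  # The need of Example 2b against the availability tags of Example 5:
  need = [("food", "res-a"), ("water", "res-a"),
          ("medicine", "res-a"), ("electricity", "res-a")]
  avail = [("ex5", [("food", "res-a"), ("water", "res-a")])]
  print(match_need(need, avail))  # no full cover, so the partial match: ['ex5']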
5 DISCUSSION AND CONCLUSIONS

Our participation in the IRMiDiS track was rather successful: we achieved a third and sixth place in the overall (MAP) ranking for sub-task (1), and scored best of all approaches on sub-task (2). (The organizers distinguish between automatic and semi-automatic approaches without specifying how to discriminate between them. We consider our approaches semi-automatic, although in their actual execution they operate automatically.) Our success can in part be attributed to the experience gained through our participation in previous shared tasks.

However, there are some issues that we struggled with and where we expect we might do better on a future occasion. We found that for the current task the definitions of what constitutes a need tweet and what an availability tweet were somewhat unclear, more particularly in specifying what exactly was meant by 'a specific resource'. The examples in the task description were all clear-cut cases, including food, water, medicine, electricity, and blood donors. But what about donations or support, which we also encounter in the data?

Looking at the results obtained through our linguistic approach for sub-task (1), we note that the distribution of the need and availability tweets over the various languages is rather uneven. While on average 7.73% of the data is classified as availability tweets and 3.07% as need tweets, the ratios especially for other-language tweets are much lower (Table 4). We speculate that the fact that the data comprised multiple languages has affected the recall we obtained for sub-task (1). While the use of Google Translate for tweets tagged as Hindi or Nepali was reasonably effective, we expect that better results could be achieved if we put more effort into the preprocessing of the tweets. This would involve both improving the language identification and finding a way to handle code-mixed tweets.

Table 4: Distribution across Languages

  Language tag   Test set count   Availability count   Availability %   Need count   Need %
  English        35,682           2,913                8.16             1,235        3.46
  Hindi          3,060            252                  8.24             55           1.80
  Nepali         4,180            254                  6.08             78           1.87
  Other          1,837            39                   2.12             8            0.44
  All            44,759           3,458                7.73             1,376        3.07

The results obtained with the linguistic approach for sub-task (1) might have been better if we had allowed multiple class labels to be associated with a given tweet. However, as we expected sub-task (2) to be easier if a tweet carried only a single label, we opted for a forced choice for one of the two classes and ignored tweets where we could not decide for either class.

In matching the need and availability of resources for sub-task (2) we restricted ourselves to exact (literal) matches only, which means that we fail to match instances such as the need for shelter and the availability of tents, or the need for food and the availability of packed meals. In future work we might include synonyms, hypernyms, and hyponyms, as illustrated by the sketch below.
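As a pointer to what such an extension might look like, here is a speculative sketch that expands a resource term with its WordNet synonyms and hyponyms before matching. It is not part of our submitted system, and it assumes NLTK with the WordNet data installed.

  from nltk.corpus import wordnet as wn  # assumes nltk.download("wordnet")

  def expansions(term):
      """The term itself plus WordNet synonyms and hyponyms (nouns only)."""
      related = {term}
      for synset in wn.synsets(term, pos=wn.NOUN):
          related.update(n.replace("_", " ") for n in synset.lemma_names())
          for hyponym in synset.hyponyms():
              related.update(n.replace("_", " ") for n in hyponym.lemma_names())
      return related

  # 'tent' should then be reachable from 'shelter', so that a need for
  # shelter can be matched with the availability of tents.
  print("tent" in expansions("shelter"))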
The Relevancer approach suffered from the fact that the training set of labelled data was rather small (929 tweets) and quite unbalanced (211 need vs 718 availability tweets). However, we think that a combination of the linguistic approach and the Relevancer approach has the potential to overcome such limitations. We are currently conducting experiments in which we aim to combine the strengths of the two approaches. So stay tuned!

ACKNOWLEDGMENTS

The authors are grateful to Peter Beinema for his help with the software used in sub-task (2).

REFERENCES

[1] Moumita Basu, Saptarshi Ghosh, Kripabandhu Ghosh, and Monojit Choudhury. 2017. Overview of the FIRE 2017 track: Information Retrieval from Microblogs during Disasters (IRMiDis). In Working Notes of FIRE 2017 - Forum for Information Retrieval Evaluation (CEUR Workshop Proceedings). CEUR-WS.org.
[2] Ali Hürriyetoğlu, Nelleke Oostdijk, Mustafa Erkan Başar, and Antal van den Bosch. 2017. Supporting Experts to Handle Tweet Collections About Significant Events. Springer International Publishing, Cham, 138-141. https://doi.org/10.1007/978-3-319-59569-6_14
[3] Ali Hürriyetoğlu and Nelleke Oostdijk. 2017. Extracting Humanitarian Information from Tweets. In Proceedings of the First International Workshop on Exploitation of Social Media for Emergency Relief and Preparedness. Aberdeen, United Kingdom. http://ceur-ws.org/Vol-1832/SMERP-2017-DC-RU-Retrieval.pdf