Statistical Semantic Classification of Crisis Information

Prashant Khare, Miriam Fernandez, and Harith Alani
Knowledge Media Institute, Open University, UK
{prashant.khare,miriam.fernandez,h.alani}@open.ac.uk

Abstract. The rise of social media as an information channel during crises has become key to community response. However, existing crisis awareness applications often struggle to identify relevant information among the high volume of data generated over social platforms. A wide range of statistical features and machine learning methods have been researched in recent years to automatically classify this information. In this paper we aim to complement previous studies by exploring the use of semantics as additional features to identify relevant crisis information. Our assumption is that entities and concepts tend to have a more consistent correlation with relevant and irrelevant information, and can therefore enhance the discrimination power of classifiers. Our results so far show that some classification improvements can be obtained when using semantic features, reaching +2.51% when the classifier is applied to a new crisis event (i.e., one not in the training set).

Keywords: semantics, crisis informatics, tweet classification

1 Introduction

As per the 2016 World Humanitarian Data and Trends report by UNOCHA,1 around 102 million people from 114 countries were affected by natural disasters in 2015 alone, causing an estimated damage of $90 billion. During such disasters there is normally a surge of real-time content across multiple social media platforms. For example, during the 2011 Japan earthquake, there was a 500% increase in the number of tweets.2 All these messages constitute a critical source of information for relief and search teams, communities, and individuals.

However, it is almost impossible to manually absorb and process the sheer volume of social media reports generated during a crisis, and to efficiently filter out relevant and actionable information [5]. Tools to automatically identify relevant information are largely unavailable, and the characteristics of social media messages (short length, use of colloquialisms, ill-formed words and syntactic structures) increase the challenges of automatically processing and understanding such messages.

1 https://data.humdata.org/dataset/world-humanitarian-data-and-trends
2 https://blog.twitter.com/official/en_us/a/2011/global-pulse.html

Much research has explored methods for classifying social media data as crisis-related or unrelated, based on supervised [10,8,16,20] and unsupervised [14] machine learning (ML) methods. These methods tend to identify relevant data based on n-grams and statistical features (message length, URLs, hashtags, etc.). This paper aims to complement previous works by investigating the impact of semantic features on identifying relevant information from Twitter data during crisis situations. The semantic features explored in our work include entities (e.g., "London", "Colorado", "Fire") extracted from tweets, as well as their hypernyms from BabelNet, an external knowledge base [11]. Our hypothesis is that entities and concepts may have a more consistent correlation with relevant and irrelevant crisis information, and can therefore be used to better interpret the content of tweets and to enhance the discrimination power of classifiers.
We explore the effectiveness of semantic features by creating and testing classifiers to identify relevant crisis information, as well as by testing these classifiers on previously unseen information from different crisis events. The dataset used in our research is a small subset of CrisisLexT26,3 a library of 205K annotated tweets posted during 26 real crisis events in 2012 and 2013. Our subset consists of a balanced related-unrelated set of 3.2K tweets on 9 crisis events (detailed in Section 3.1). Our results show that using semantic information can indeed help to enhance classification results, but only by a small margin. When the classifier is applied to a new crisis event, results show that the use of semantic annotations of concepts and entities is in itself effective, and that the use of semantically expanded concepts (i.e., entities and their hypernyms) further improves over it slightly. However, the use of hypernyms also sometimes introduces generic concepts, such as "person", that appear in both crisis-related and non-crisis-related posts, and thus affects the discrimination power of semantic features.

The contributions of this work can be summarised as follows:
– Demonstrating the impact of using a variety of semantic features for identifying crisis-related information from social media posts.
– Showing that adding semantic features is especially useful when classifying new crisis events that were not seen during the model training phase.
– Testing using annotated data from CrisisLexT26 covering 9 real crisis events.
– Discussing and reflecting on the potential use of semantics to identify crisis-relevant information.

The rest of the paper is structured as follows. Section 2 summarises the related work on processing social media data for identifying crisis-related content. Section 3 describes our approach, including the selected semantic features and how they are used to create various types of classifiers. The experiments and results are reported in Section 4. Section 5 discusses the lessons learned from this work, as well as its limitations and future lines of work. Conclusions are reported in Section 6.

3 crisislex.org

2 Related Work

During a crisis, a very large number of messages are often posted on various social media platforms. Processing all these messages requires substantial time and effort to ensure that crisis-related messages are efficiently spotted and handled, since a good percentage of messages posted about a crisis tend to be irrelevant and unrelated. Olteanu and colleagues observed that crisis reports can be classified into three main categories: related and informative, related but not informative, and not related [12]. In this work, we focus primarily on the automatic identification of crisis-related information. The identification of informativeness in crisis scenarios is a complex task that requires a deeper reflection on, and investigation of, the meaning of informativeness and its dimensions (freshness, novelty, location, scope). It is therefore an important part of our future work.

To identify crisis-related messages from social media data, several works have proposed the use of supervised [10,8,16,20] and unsupervised [14] ML classification methods. Supervised methods tend to make use of n-grams as well as linguistic and statistical features such as part of speech (POS), number of hashtags, mentions, or message length.
They also highlight the use of location as an important indicator, since people tend to create and retweet messages with locally actionable information [9]. These works make use of various supervised classification methods, from traditional classification algorithms such as Naive Bayes, Support Vector Machines, or Conditional Random Fields [13,16,6] to more novel techniques such as deep learning [3]. Unsupervised methods, on the other hand, are mainly based on keyword processing and clustering [14]. Our work aims to complement these studies by investigating the use of semantics, and particularly the use of entities extracted from tweets, and their hypernyms, as additional features to boost classification. As previously done by [8], we not only aim to generate classifiers able to identify crisis-related information, but also to test the generated classifiers on crisis events that the classifiers have not previously seen.

While semantic models have been developed and used to represent and capture the information that emerges from crisis events (e.g., MOAC - Management of a Crisis,4 or HXL - Humanitarian eXchange Language5), few works in the literature have explored the use of semantics to identify and filter crisis-related information. In [2], Abel and colleagues presented Twitcident, a system that uses semantic information to facilitate filtering and search of crisis-related information. The system extracts semantic information from social media data in the form of entities, using Named Entity Recognisers (NER) and external knowledge bases. However, as opposed to our work, they do not explore the use of entities as features for classification. Instead, they develop similarity models in which the crisis event and the posts are profiled based on this semantic enrichment, and the Jaccard similarity coefficient6 is used to compute whether the content of the posts is similar to the event or not.

4 http://www.observedchange.com/moac/ns
5 http://hxlstandard.org/
6 https://en.wikipedia.org/wiki/Jaccard_index

3 Classification: Identifying Crisis Related Information

Our approach for identifying crisis-related information among tweets works by generating binary classifiers to differentiate crisis-related from non-related posts. In this section, we explain (i) the dataset used in our experiments, (ii) the two sets of features (statistical and semantic) that we use to build the classifiers, and (iii) our classifier selection process.

3.1 Data Selection

To conduct our study we selected the CrisisLexT26 dataset7 [12], an annotated dataset of 205K tweets posted during 26 crisis events occurring between 2012 and 2013. The search keywords used to construct CrisisLexT26 were selected following the standard practice of hashtags and/or terms often paired with canonical forms of a disaster name and impacted location (e.g., Queensland floods) or a meteorological term (e.g., Hurricane Sandy). For each of the 26 crisis events, around 1,000 tweets are annotated (Related and Informative, Related but not Informative, Not Related, or Not Applicable). Given our focus on English tweets, we selected 9 events for which the content was predominantly in English: West Texas Explosion (WTE), Colorado Wildfire (CWF), Colorado Flood (CFL), Australia Bushfire (ABF), Boston Bombing (BB), LA Shooting (LAS), Queensland Flood (QFL), Savar Building Collapse (SBC), and Singapore Haze (SGH).
We merged those tweets labelled as Not Related and Not Applicable under the class Not Related, obtaining a total of 1539 non-crisis-related tweets. We also merged those tweets labelled as Related and Informative and Related but not Informative under the class Related, obtaining a total of 7461 crisis-related tweets. In line with common practice, we balanced the dataset to remove classification bias towards the bigger class, Related, by randomly selecting 1667 crisis-related tweets. This gives us a balanced, annotated dataset of 3206 Related and Not Related tweets.

3.2 Feature Engineering

To generate classifiers able to identify crisis-related posts, we explore two distinct feature sets: statistical and semantic features. Statistical features have been widely studied in the literature [10,8,16,20] and are used as the baseline for our experiments. They capture the linguistic and quantifiable attributes of posts. Semantic features, on the other hand, capture the different named entities that emerge from tweets, as well as their hierarchical information, which we extract from an external knowledge source.

3.2.1 Statistical Features (SF)

For each social media post, we extract the following statistical features:
– Number of nouns: nouns generally refer to locations, resources, or actors involved in the crisis event.
– Number of verbs: verbs are an indication of the different actions that occur during the crisis event.
– Number of pronouns: as with nouns, pronouns may be an indication of the actors, locations, or resources that are named during the crisis event.
– Tweet length: the number of characters contained in the post. The longer the post, the more information it may contain.
– Number of words: the number of words may be another indication of the amount of information the post carries.
– Number of hashtags: hashtags indicate the themes of the post and are manually generated by the posts' authors.
– Readability: the Gunning fog index, computed from the average sentence length (ASL) and the percentage of complex words (PCW) as 0.4 * (ASL + PCW). This feature gauges how hard the post is for humans to parse.8
– Unigrams: unigrams provide a keyword-based representation of the content of the posts.

To extract the unigrams from social media posts we use the Weka data mining software,9 and specifically its StringToWordVector filter, including lower-case conversion of all tokens, stemming (using Lovins' algorithm),10 stopword removal, and tf*idf transformation. The total number of unigrams, or vocabulary size, for the complete dataset is 10,655. To extract the Part of Speech (POS) tags and the first five statistical features listed above, we use the Stanford CoreNLP software.11 Hashtags are identified by the # character, and readability is computed using the Gunning fog index.

7 http://crisislex.org/data-collections.html#CrisisLexT26
8 https://en.wikipedia.org/wiki/Gunning_fog_index
9 http://www.cs.waikato.ac.nz/ml/weka/
10 http://www.mt-archive.info/MT-1968-Lovins.pdf
11 https://stanfordnlp.github.io/CoreNLP/
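The sketch below illustrates one way the per-post statistical features could be computed. It uses NLTK as an illustrative stand-in for the Stanford CoreNLP tagger used in our pipeline, and a crude vowel-group heuristic to approximate the complex-word count of the Gunning fog index; both are simplifying assumptions rather than the exact implementation.

import re
import nltk  # one-time setup: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

def is_complex(word):
    """Crude >= 3 syllable test via vowel groups (approximation of 'complex word')."""
    return len(re.findall(r'[aeiouy]+', word.lower())) >= 3

def statistical_features(tweet):
    sentences = nltk.sent_tokenize(tweet)
    tokens = nltk.word_tokenize(tweet)
    tags = [tag for _, tag in nltk.pos_tag(tokens)]          # Penn Treebank POS tags
    words = [t for t in tokens if t.isalpha()]
    asl = len(words) / max(1, len(sentences))                 # average sentence length
    pcw = 100.0 * sum(is_complex(w) for w in words) / max(1, len(words))  # % complex words
    return {
        'n_nouns':      sum(t.startswith('NN') for t in tags),
        'n_verbs':      sum(t.startswith('VB') for t in tags),
        'n_pronouns':   sum(t.startswith('PRP') for t in tags),   # PRP and PRP$
        'tweet_length': len(tweet),                               # number of characters
        'n_words':      len(words),
        'n_hashtags':   tweet.count('#'),
        'readability':  0.4 * (asl + pcw),                        # Gunning fog index
    }

print(statistical_features("A 15-year-old High River boy is missing due to the flood. #abflood"))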
3.2.2 Semantic Features (SemF)

The semantic feature extraction process is summarised in Figure 2 and consists of three main steps: (i) semantic annotation, (ii) semantic expansion, and (iii) semantic filtering. Each of these steps generates a different set of semantic features that we explore, individually and in combination, when generating binary classifiers to distinguish crisis-related posts from unrelated ones.

Semantic Annotation Features (SemAF): In the initial step (semantic annotation), semantic entities are extracted from the posts using Babelfy.12 This Named Entity Recogniser (NER) identifies the entities that appear in the text, disambiguates them, and links them to the BabelNet [11] knowledge base, providing a unique identifier (SynsetID) for each of the identified entities. For example (Figure 1), for the post "A 15-year-old High River boy is missing due to the flood. Call police if you see Eric St. Denis #abflood", Babelfy identifies entities such as High River, Boy, Flood, etc. The annotation of the entire dataset (see Section 3.1) resulted in 12,006 unique concepts.

Fig. 1. Example of a semantically annotated post, generated with Babelfy.

Semantic Expansion Features (SemEF): In the second step (semantic expansion), the BabelNet knowledge base is used to extract every direct hypernym (distance-1) of these entities. Our hypothesis for considering hypernyms is that, by introducing upper-level concepts, we might be able to better encapsulate the semantics of crisis-related tweets. For example, the entities fireman and policeman often appear in crisis-related posts and share a common hypernym, defender. As a result, a post containing the entity MP (Military Police) is more likely to also be crisis-related, since this entity also has the hypernym defender. The semantic expansion process resulted in an additional 7032 unique concepts.

Semantic Filtering Features (SemFF): When semantically expanding the initially extracted entities, we can sometimes introduce very generic concepts with low discrimination power. For example, the hypernym Person appears in both crisis and non-crisis related posts, and thus does not help the classifiers to identify crisis-related information. Our filtering process aims to discard semantic annotations that are too generic and hence likely to reduce the discrimination power of semantics. The proposed filtering is based on the depth of a concept in the BabelNet hierarchy: abstract concepts, i.e., concepts with a lower depth, are removed. To determine the depth of each concept, we iteratively queried BabelNet for hypernyms, collecting nearly 4 million relations, and built a directed graph from them using the NetworkX13 graph library for Python. The node with the highest betweenness centrality (SynsetID 'bn:00031027n', which relates to the main sense 'Entity') was determined to be the most abstract concept. We then computed the shortest path between the node 'Entity' and each of the extracted hypernyms. The maximum depth found was 21, where level 0 is assigned to the concept 'Entity'. By performing an empirical analysis of the concepts using Information Gain, we observed that the most informative concepts are those whose depth is between 3 and 7; those are therefore the ones selected as features for classification. This filtering process removed 574 concepts from the semantic features across the 9 events.

12 http://babelfy.org
13 https://networkx.github.io/

Fig. 2. Semantic Features: Annotation, Expansion, & Filtering
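The depth computation and filtering step can be sketched as follows. The snippet assumes the hypernym relations extracted from BabelNet are available as (hypernym, hyponym) SynsetID pairs; the root synset and the depth window (3 to 7) follow the description above, while the handling of concepts not reachable from the root is an illustrative choice, not necessarily the one used in our implementation.

import networkx as nx

# BabelNet synset for 'Entity'; identified above as the node with the
# highest betweenness centrality in the hypernym graph.
ROOT = 'bn:00031027n'

def filter_by_depth(relations, concepts, min_depth=3, max_depth=7):
    """Keep only concepts whose depth in the BabelNet hierarchy lies in [min_depth, max_depth].

    relations: iterable of (hypernym_synset_id, hyponym_synset_id) pairs
    concepts:  iterable of candidate SynsetIDs used as semantic features
    """
    graph = nx.DiGraph()
    graph.add_edges_from(relations)                        # edges point from hypernym down to hyponym
    depth = nx.shortest_path_length(graph, source=ROOT)    # depth of every node reachable from 'Entity'
    # Overly abstract (shallow) and overly deep concepts are discarded; concepts not
    # reachable from the root are kept here (an illustrative choice).
    return {c for c in concepts if min_depth <= depth.get(c, min_depth) <= max_depth}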
3.3 Classifier Selection

When selecting a classification method for the problem at hand, we considered the high dimensionality of the features (particularly the high number of unigrams and semantic features), the limited amount of labelled data (3,206 posts), and the importance of avoiding over-fitting. Given the large set of features in comparison with the number of training examples, we opted for a Support Vector Machine (SVM) classification model [4] with a linear kernel. SVM has proven effective for problems with these characteristics.14

14 http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf. A Radial Basis Function (RBF) or polynomial kernel may cause over-fitting, hence we opted for a linearly separable hyperplane.

4 Experiments

In this section we describe our experimental setup, and particularly the design of our model selection and testing experiments. We report the obtained results and later discuss how semantic features can enhance the performance of classifiers based on statistical features, especially when the classifier is applied in cross-crisis scenarios.

4.1 Experimental Setup

We designed two main experiments, in which we train and test our classification models (i) on all 9 crisis events, and (ii) on 8 events, then retest on the 9th event, i.e., cross-crisis testing.

– Crisis Classification Model: In our first experiment we compare the performance of classifiers generated with statistical features against classifiers enhanced with semantic features, and analyse whether the use of semantics does indeed help boost the performance of binary classifiers when identifying crisis-related posts. We compare the performance of four different classifiers generated using the complete dataset (see Section 3.1), tested using 10-fold cross validation. We use the WEKA software (v.3.8)15 to generate the classifiers.
  • SF: A classifier generated with statistical features; our baseline.
  • SF+SemAF: A classifier generated with statistical features and semantic annotation features.
  • SF+SemAF+SemEF: A classifier generated with statistical features, semantic annotations, and their hypernyms, i.e., the semantic expansion features.
  • SF+SemFF: A classifier generated with statistical features and filtered semantic annotations, along with their hypernyms, i.e., the semantic filtering features.
– Cross-crisis Classification: In our second experiment we retest the classifiers above by applying them to data from a new crisis event that was not part of the training set. For this experiment, we generate the same four classifiers described in the previous task. However, rather than using the complete dataset to generate the model, we use 8 out of the 9 crisis events to generate the model, and then apply the models to the remaining event for validation. We therefore generate 36 different classification models for this experiment.

15 http://www.cs.waikato.ac.nz/ml/weka/

4.2 Results: Crisis Classification

The results of our first experiment (each model evaluated with 10 iterations of 10-fold cross validation) are presented in Table 1. The table reports the F-measure (F) value from 10-fold cross validation, the mean F-measure (Fmean) over the 100 results from the 10 iterations, the standard deviation of the F-measure (σ), and the increase of Fmean over the baseline, ∆F/F. Precision and Recall values were equal to F in this experiment and were hence omitted from the table.
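The evaluation protocol can be summarised by the following sketch, which reproduces the 10 iterations of 10-fold cross validation with a linear-kernel SVM using scikit-learn as an illustrative stand-in for the Weka (v.3.8) setup described above; X denotes the feature matrix of one of the four configurations and y the Related/Not Related labels (both are assumed inputs here).

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

def evaluate(X, y, iterations=10, folds=10):
    """Return F_mean and sigma over iterations x folds F-measure scores."""
    scores = []
    for seed in range(iterations):
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
        scores.extend(cross_val_score(LinearSVC(), X, y, cv=cv, scoring='f1'))
    return np.mean(scores), np.std(scores)

# e.g., f_mean, sigma = evaluate(X_sf_semaf, y)   # one row of Table 1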
As we can see in this table, the use of semantic features helps to enhance classification results in all cases, but almost negligibly (by less than 0.6%). However, the use of annotations alone (SF+SemAF) produces slightly better results than the use of annotations and hypernyms (SF+SemAF+SemEF). To better understand the impact of semantics in this context, we manually analysed some of the tweets that were misclassified by the statistical baseline model but were correctly classified when using semantics (see Table 2). In addition, we performed feature selection using Information Gain (IG) over the generated classifiers to determine which are the most discriminative statistical and semantic features when identifying crisis-related posts.

Table 1. 10 iterations of 10-fold cross validation, showing the performance of our statistical-semantic classifiers against the statistical baseline classifier.

Features          F      Fmean   Std. Dev. σ   ∆F/F
SF (Baseline)     0.865  0.872   0.017         -
SF+SemAF          0.870  0.877   0.017         0.0057
SF+SemAF+SemEF    0.868  0.873   0.017         0.0011
SF+SemFF          0.864  0.873   0.018         0.0011

Table 2. Examples of posts that were misclassified by the statistical classifier, but classified correctly by the semantic classifiers.

PostID  Label        Text
Post1   Not Related  I GET 5078 REALL FOLLOWERS! http://t.co/qrF5dpD3 #BestRap,#boulderflood,#PutinsFlik,#Rem #in
Post2   Not Related  @Stana Katic Can we get some loveballs in Colorado? We need it after all the flooding! Love you! Xo
Post3   Related      RT @LarimerCounty: #HighParkFire burn area map as of Monday night 10 p.m. http://t.co/1guBTcXX
Post4   Related      Colorado wildfires their worst in a decade http://t.co/RtfLmfds
Post5   Related      RT @RedCross: Thanks to generosity of volunteer blood donors there is currently enough blood on the shelves to meet demand. #BostonMarathon

When applying IG over the attributes of the baseline classifier, the number of hashtags was the most relevant feature. By manually checking some of the tweets, we observe that Not Related posts tend to either have no hashtags (see Post2) or contain many hashtags (see Post1). The numbers of nouns and pronouns are also highly discriminative features. As we hypothesised, crisis-related posts generally contain more nouns and pronouns mentioning persons, resources, or locations relevant to the crisis event. When including semantics, we observe that hypernyms and annotations are among the highly ranked features based on IG. Apart from the highly ranked statistical features, hypernyms such as 'Happening' and 'Event' (which, in BabelNet, are hypernyms of concepts such as 'Incident', 'Fire', 'Crisis', 'Disaster', and 'Death') were among the top 10 attributes (out of almost 800 attributes with positive IG).

Post3 was misclassified when using only statistical features. Although it contains the relevant term burn, this term barely appears in the training data. However, the post is correctly classified by SF+SemAF, because the term burn returns the concept Fire as part of its semantic annotation. Post4 was misclassified by SF+SemAF, but correctly classified when adding semantic expansion (SF+SemAF+SemEF): the tweet was annotated with the concept Wildfire, which has the hypernym Fire, a feature with high IG and strongly associated with the crisis-related class. In this case, therefore, the use of hypernyms provided the additional information needed to correctly categorise the post. Post5 was misclassified by SF+SemAF+SemEF but correctly classified by SF+SemFF.
Annotations such as Thanks and Meet were semantically expanded to hypernyms such as Virtue and Desire, which have very low discrimination power and hence weaken the classifier. We observe that removing such less informative, abstract concepts increases the discriminative power of the remaining, more informative concepts, such as 'Volunteer' and 'Benefactor' (a hypernym of 'donor').

4.3 Results: Cross-crisis Classification

The results of this experiment are reported in Table 3. In this experiment, we compile 9 different datasets, where in each dataset 1 of the 9 crisis events is entirely left out of the sample used to train and test the classification model.16 Each row is named after the crisis event that was left out of the dataset during its creation (see Section 3.1). The data split for each dataset (train on 8 events/test on the 9th event) is presented in the Train and Test columns of the table. For each of these 9 datasets we created the four different classifiers described in Section 4.2. The results of each of these models on the 9 different datasets are reported in the table along with their Precision (P), Recall (R), F1-measure (F), and the increase in F-measure over the baseline, ∆F/F.

16 Each model was tested on the 8-event dataset it was trained on, using 10-fold cross validation, to ensure its accuracy before applying it to the 9th event's data. Their accuracy drops by around 17% on average when applied to new events.

Table 3. Cross-crisis evaluation of the SF, SF+SemAF, SF+SemAF+SemEF, and SF+SemFF feature sets (best set of features per event highlighted in bold in the original). The Train and Test columns give the number of class-1 (Related) / class-0 (Not Related) tweets in each split.

Event  Train (1/0)  Test (1/0)  SF (P/R/F)         SF+SemAF (P/R/F; ∆F/F)     SF+SemAF+SemEF (P/R/F; ∆F/F)  SF+SemFF (F; ∆F/F)
WTE    1556/1450    111/89      0.806/0.805/0.804  0.813/0.810/0.808; 0.005   0.819/0.815/0.812; 0.010      0.823; 0.024
CWF    1420/1292    247/247     0.643/0.640/0.638  0.633/0.623/0.617; -0.033  0.716/0.715/0.714; 0.119      0.710; 0.113
CFL    1578/1464    89/75       0.784/0.774/0.774  0.796/0.793/0.793; 0.025   0.790/0.787/0.787; 0.017      0.793; 0.025
ABF    1417/1289    250/250     0.776/0.774/0.774  0.782/0.778/0.777; 0.004   0.811/0.800/0.798; 0.031      0.788; 0.018
BB     1588/1468    79/71       0.713/0.707/0.702  0.693/0.693/0.693; -0.013  0.734/0.733/0.732; 0.043      0.759; 0.081
LAS    1537/1419    130/120     0.811/0.808/0.808  0.777/0.776/0.776; -0.040  0.777/0.776/0.775; -0.041     0.787; -0.026
QFL    1347/1258    320/281     0.699/0.694/0.694  0.702/0.696/0.695; 0.001   0.702/0.691/0.690; -0.006     0.691; -0.004
SBC    1306/1200    261/239     0.618/0.594/0.580  0.651/0.640/0.636; 0.097   0.619/0.584/0.561; -0.033     0.565; -0.026
SGH    1587/1472    80/67       0.716/0.660/0.648  0.744/0.680/0.669; 0.032   0.737/0.680/0.670; 0.034      0.662; 0.022
Avg. F                          0.714              0.718; 0.009 (0.9%)        0.727; 0.0194 (1.94%)         0.731; 0.0251 (2.51%)

As we can see, the use of semantics enhances the classification results in all cases. We observe that SF+SemAF improves the classification over the baseline SF in 6 out of 9 cases, with an average increase of 0.9% in F1-measure. In contrast to our previous experiment (10-fold cross validation), the use of hypernyms makes the model more adaptable to unknown data, improving over the baseline (SF) in 6 out of 9 cases with an average improvement of 1.94%. Semantic expansion (SemEF) improves over the annotation model (SemAF) in 5 out of 9 cases. It is also worth noting that filtering out the abstract concepts results in SF+SemFF improving over the SF+SemAF+SemEF model (by 0.6% on average) in 7 out of 9 cases. This validates the argument (Section 3.2.2) that certain concepts tend to appear in both crisis-related and non-related tweets, and
therefore introduce noise rather than helping with the classification. Filtering out such concepts enhances the classification: the SF+SemFF model improves over the baseline by an average of 2.51%.

5 Discussion and Future Work

Our findings show the potential of mixing statistical and semantic features for classifying crisis-related and unrelated tweets. The highest, and most valuable, improvement is achieved when using this hybrid model to classify data of a new crisis event that the model was not trained on. This is due to the use of semantic knowledge graphs to expand the vocabulary into semantic concepts and hypernyms, thus capturing the essence of the tweets and their terms. However, we showed that such semantic expansion can introduce noise in the form of abstract concepts, which requires filtering to maximise the benefit.

An issue we encountered was the asymmetric mapping of hypernym-hyponym relationships in BabelNet, which affected the hierarchical expansion of semantics and the generation of the hierarchy. As future work, we plan to use more symmetrically mapped resources, such as WordNet,17 and to extend the semantics to types and categories drawn from external knowledge bases such as DBpedia.18

One limitation of this study is the small size of the dataset (3206 annotations) and the number of crisis types covered (5 different types), which we plan to expand in future work. We also need to investigate whether the discriminative features differ across the various types of crisis, and across languages. Additionally, we will investigate whether adding semantic features incorrectly classifies some tweets that are correctly classified by the statistical approach.

6 Conclusion

This work presents an approach that leverages semantic enrichment for classifying unseen crisis Twitter data. The two forms of semantic enrichment, annotation and semantic expansion, improve classification performance over the statistical features by 0.9%-2.51%. We have also demonstrated empirically that more abstract concepts are less discriminative, and proposed a method that filters out the concepts which are less likely to be discriminative.

17 https://wordnet.princeton.edu/
18 http://wiki.dbpedia.org/

References

1. Abel, F., Celik, I., Houben, G.J. and Siehndel, P. Leveraging the semantics of tweets for adaptive faceted search on Twitter. Int. Semantic Web Conf. (ISWC), Bonn, Germany, 2011.
2. Abel, F., Hauff, C., Houben, G.J., Stronkman, R. and Tao, K. Semantics + filtering + search = Twitcident: exploring information in social web streams. Conf. Hypertext and Social Media (Hypertext), WI, USA, 2012.
3. Burel, G., Saif, H., Fernandez, M. and Alani, H. On semantics and deep learning for event detection in crisis situations. Workshop on Semantic Deep Learning (SemDeep), at ESWC, Portoroz, Slovenia, 2017.
4. Cristianini, N. and Shawe-Taylor, J. An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, 2000.
5. Gao, H., Barbier, G. and Goolsby, R. Harnessing the crowdsourcing power of social media for disaster relief. IEEE Intelligent Systems, 26(3), pp.10-14, 2011.
6. Imran, M., Elbassuoni, S., Castillo, C., Diaz, F. and Meier, P. Practical extraction of disaster-relevant information from social media. Int. World Wide Web Conf. (WWW), Rio de Janeiro, Brazil, 2013.
7. Jadhav, A.S., Purohit, H., Kapanipathi, P., Anantharam, P., Ranabahu, A.H., Nguyen, V., Mendes, P.N., Smith, A.G., Cooney, M. and Sheth, A.P.
Twitris 2.0: Semantically empowered system for understanding perceptions from social data. http://knoesis.wright.edu/library/download/Twitris ISWC 2010.pdf, 2010.
8. Karimi, S., Yin, J. and Paris, C. Classifying microblogs for disasters. Australasian Document Computing Symposium, Brisbane, QLD, Australia, 2013.
9. Kogan, M., Palen, L. and Anderson, K.M. Think local, retweet global: Retweeting by the geographically-vulnerable during Hurricane Sandy. Conf. on Computer Supported Cooperative Work & Social Computing (CSCW '15), Vancouver, Canada, 2015.
10. Li, R., Lei, K.H., Khadiwala, R. and Chang, K.C.C. TEDAS: A Twitter-based event detection and analysis system. IEEE 28th Int. Conf. on Data Engineering (ICDE), Washington, DC, USA, 2012.
11. Navigli, R. and Ponzetto, S.P. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, pp.217-250, 2012.
12. Olteanu, A., Vieweg, S. and Castillo, C. What to expect when the unexpected happens: Social media communications across crises. Conf. on Computer Supported Cooperative Work & Social Computing (CSCW '15), Vancouver, Canada, 2015.
13. Power, R., Robinson, B., Colton, J. and Cameron, M. Emergency situation awareness: Twitter case studies. Int. Conf. on Info. Systems for Crisis Response and Management in Mediterranean Countries (ISCRAM), Toulouse, France, 2014.
14. Rogstadius, J., Vukovic, M., Teixeira, C.A., Kostakos, V., Karapanos, E. and Laredo, J.A. CrisisTracker: Crowdsourced social media curation for disaster awareness. IBM Journal of Research and Development, 57(5), pp.4-1, 2013.
15. Sakaki, T., Okazaki, M. and Matsuo, Y. Earthquake shakes Twitter users: real-time event detection by social sensors. Int. Conf. World Wide Web (WWW), Raleigh, North Carolina, USA, 2010.
16. Stowe, K., Paul, M., Palmer, M., Palen, L. and Anderson, K. Identifying and categorizing disaster-related tweets. Workshop on Natural Language Processing for Social Media, at EMNLP, Austin, Texas, USA, 2016.
17. Vieweg, S., Hughes, A.L., Starbird, K. and Palen, L. Microblogging during two natural hazards events: what Twitter may contribute to situational awareness. Proc. SIGCHI Conf. on Human Factors in Computing Systems (CHI), Atlanta, GA, USA, 2010.
18. Vieweg, S.E. Situational awareness in mass emergency: A behavioral and linguistic analysis of microblogged communications. Doctoral dissertation, University of Colorado at Boulder, 2012. https://works.bepress.com/vieweg/15/
19. Yin, J., Lampert, A., Cameron, M., Robinson, B. and Power, R. Using social media to enhance emergency situation awareness. IEEE Intelligent Systems, 27(6), pp.52-59, 2012.
20. Zhang, S. and Vucetic, S. Semi-supervised discovery of informative tweets during the emerging disasters. arXiv preprint arXiv:1610.03750, 2016.