Towards Text Processing System for Emergency Event Detection in the Arctic Zone

© Dmitriy Deviatkin © Artem Shelmanov
Federal Research Center "Computer Science and Control" of Russian Academy of Sciences, Moscow, Russia
devyatkin@isa.ru shelmanov@isa.ru

Abstract

We present the ongoing work on a text processing system for detection and analysis of events related to emergencies in the Arctic zone. The peculiarity of the task consists in data sparseness and the scarcity of tools and language resources for processing such specific texts. The system performs focused crawling of documents related to emergencies in the Arctic region, text parsing including named entity recognition and geotagging, and indexing of texts with their metadata for faceted search. The system aims at processing both English and Russian text messages and documents. We report the preliminary results of an experimental evaluation of the system components on Twitter data.

Keywords: focused crawling, event detection, monitoring, named entity recognition, text processing, information search

1 Introduction

Due to the ever-growing amount of data available on the web, monitoring and searching in textual streams remains one of the most urgent problems today; it has inspired researchers to develop many general-purpose information-retrieval methods and systems. However, the development of applications for specific domains often reveals a lack of suitable techniques that could address the challenging tasks arising in these domains, which require significant research.

This paper describes the ongoing development of a search and monitoring system for a specific domain and task: detection and analysis of emergency events in the Arctic zone. Since a lot of textual information is generated during emergencies and crises, as during other major events, it is crucial to have automated tools for filtering and processing of unstructured textual data to support search and rescue operations, as well as to help people in affected areas. The Arctic zone is a harsh but important and promising region with a lot of potential for development. The remarkable peculiarity of the chosen domain is data sparseness and the scarcity of tools and language resources for processing such specific data, which poses a difficult problem.

The most significant features of the system are focused crawling and faceted search. Since it is impossible to store all data available on the web, the developed system is designed to accumulate only data related to emergencies in the Arctic zone from multiple textual streams. The sources of such information include, but are not limited to, mass media, social networks, and reports (e.g., official sources like national transportation safety boards¹,²).

The focused crawler is intended to narrow down the amount of indexed text and extract basic metadata of downloaded documents. At first sight, the problem of crawling messages about emergency events is very similar to topic crawling. The key difference lies in the fact that emergency-related messages can be devoted to multiple topics, and the composition of these topics can change over time. This means that ordinary topical approaches lead to inadequate accuracy and a laborious crawling process. To mitigate this problem, we have implemented the following ideas in the proposed framework:
- Multiple topic crawlers with narrow focuses outperform a single data collecting process in terms of recall.
- Extracting geographical coordinates and considering them for further filtering improves the accuracy of the crawling process. One could get topically irrelevant but important messages from an emergency zone.
- Topic models for crawled texts can be periodically built and verified for better tracking of topic shifts in text streams.
- Reposts and fuzzy duplicates can be effectively detected via inverted full-text indices [28].
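Detecting reposts and fuzzy duplicates through an inverted full-text index can be illustrated with a minimal sketch. This is an illustration only, not the system's actual implementation; the word-shingle size and the Jaccard similarity threshold are assumptions:

```python
from collections import defaultdict

def shingles(text, k=3):
    """Lowercased word k-shingles of a message."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

class DuplicateDetector:
    """Toy fuzzy-duplicate filter: an inverted index from shingles to message ids."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.index = defaultdict(set)   # shingle -> ids of indexed messages
        self.docs = {}                  # message id -> its shingle set

    def is_duplicate(self, doc_id, text):
        sh = shingles(text)
        # Shared shingles nominate candidate messages from the index.
        candidates = set().union(*(self.index[s] for s in sh)) if sh else set()
        for cand in candidates:
            other = self.docs[cand]
            jaccard = len(sh & other) / len(sh | other)
            if jaccard >= self.threshold:
                return True
        # Not a duplicate: add the message to the index.
        self.docs[doc_id] = sh
        for s in sh:
            self.index[s].add(doc_id)
        return False
```

A production index would store shingle hashes rather than tuples and bound the candidate sets, but the scheme is the same: shared shingles nominate candidates, and an overlap measure confirms duplicates.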
Proceedings of the XVIII International Conference «Data Analytics and Management in Data Intensive Domains» (DAMDID/RCDL'2016), Ershovo, Russia, October 11-14, 2016

¹ http://www.tsb.gc.ca/eng/rapports-reports/marine/index.asp
² http://www.ntsb.gov/investigations/AccidentReports/Pages/marine.aspx

The faceted search provides the ability to retrieve and analyze texts from different perspectives: topic, time, location, relations with a given object, etc. The developed system performs deep natural language processing of texts (including syntax parsing and semantic role labeling), named entity recognition, as well as geotagging. The extracted metadata is indexed for faceted search.

We evaluated the developed subsystems for geotagging, crawling, and faceted search on data acquired from Twitter. Although this social network accumulates only short messages and is not designed to provide data for the considered tasks, many researchers, as shown in Section 2, have demonstrated that tweets can be a useful source of information about emergencies. When common communication services are down, Twitter provides a channel that is used by affected people and emergency response teams [22]. Therefore, we used messages crawled from Twitter for preliminary experiments, testing of our approaches, and evaluation of the system components. However, we note that the developed system is designed to handle all sorts of textual information, not just short messages.

The rest of the paper is organized as follows. Section 2 reviews the related work on monitoring emergency events with the help of social networks and on focused crawling. Section 3 describes the details of the system in development; it presents the natural language processing pipeline, the method for focused crawling, and the faceted search techniques. In Section 4, the results of the preliminary experiments are presented and discussed. Section 5 concludes and outlines the future work.

2 Related work

The problem of event detection in text streams has received a lot of attention from the research community. Methods developed to address this problem have been applied to many domains. One of them is monitoring emergencies. It was noticed that mass emergencies initiate an intensive exchange of information in social networks. This immense text stream contains cues about the situation in an affected area, infrastructure damage, human casualties, and requests and proposals for help. This is crucial information that can enhance the situation awareness [18] of both affected people and participants of rescue operations. However, it is mixed up with heavy noise: irrelevant or useless messages. Therefore, to put it to good use, new methods and technologies are required. The need for such technologies became apparent, which facilitated the development of many diverse systems for mining emergency-related information in social networks. We review the most significant recent work on such systems.

Papers [20] and [17] present an information flow monitoring system Twitris designed for processing short messages from mass and social media, as well as SMS messages. Researchers tested the system on Twitter data. The system crawls messages from Twitter using a set of keywords, which is expanded over time with the most significant n-grams extracted from acquired messages. The system extracts spatial and temporal information, as well as topics, which are used for message clustering. The clusters are considered as events found in an information stream. Researchers tested the system on data acquired during hurricane Sandy. They showed that the system could be used for finding messages from affected people considering their location.

Another monitoring system, SensePlace2, described in [12], specializes in the analysis of geographical data extracted from tweets. The system aims at improving situational awareness during search and rescue operations. Its main goal is text stream filtering and searching for messages related to a given topic, place, and time. The system utilizes geographical tags, as well as information extracted from message texts. Besides text, SensePlace2 also indexes geographical and temporal information of messages. This enables the system to filter a message stream by place and time and to build analytical reports for topic-time-location data. SensePlace2 can visualize results in different ways: as a common search result list, on a time scale as a histogram, or on a heat map, which displays the intensity of messages about a particular topic near a given location. Researchers tested the system using data related to the Haiti earthquake. They showed that SensePlace2 could be useful for finding refugee streams that are not represented in official sources.

In [24], researchers present a method for classification of messages acquired from a message stream. They demonstrate its capability of finding useful emergency-related messages on Twitter data. The method classifies messages as useful and non-useful via standard supervised machine learning methods (Naïve Bayes and maximum entropy). The most remarkable aspect is the feature set used for training. Besides low-level features, the authors also conducted experiments with high-level features such as message objectivity, whether it is personal or impersonal, and whether it is formal or informal. They show that high-level features substantially improve the quality of classification. The out-of-domain evaluation showed accuracy from 30 to 80%. The experiments were conducted on data acquired during the Haiti earthquake, USA wildfires, and floods.

The system EMERSE (Enhanced Messaging for the Emergency Response Sector) [4] collects messages from different sources, translates them, and classifies them into topics for better search and filtering. EMERSE consists of a smartphone application, a Twitter crawler, a translation subsystem, and a subsystem for classification. The smartphone application is intended to simplify the process of collecting messages and their metadata such as location, time, and associated media files (photo, video). Besides, the system crawls Twitter considering timestamps and eliminating duplicates (reposts). EMERSE classifies messages into multiple classes using a support vector machine. In [4], the authors experimented with different features and feature selection methods: bag of words, feature abstraction methods [21], Latent Dirichlet Allocation (LDA), and others. The system was tested on a collection of messages submitted to the Ushahidi³ web service during the Haiti earthquake. With this example, the authors demonstrate that EMERSE can improve the coordination of people during emergencies.

In [25, 26], the system ESA (Emergency Situation Awareness) is presented. It can monitor social networks and blogs in real time and visualize information about different emergencies. The main task of the system is to enhance the situational awareness of people in an affected area. The system is oriented on the New Zealand and Australia regions. ESA gathers tweets and detects topical bursts in information streams. The retrospective data is used for building a language model, which is then applied for burst detection. The algorithm searches for lexis whose distribution strongly diverges from the language model. For convenient representation of bursts to end users, ESA performs thematic clustering of messages. The system also selects informative messages that signal emergencies, destruction, and requests for help. ESA has a component that extracts relevant spatial data using explicit geotags of messages (GPS coordinates received from a smartphone) and implicit information found in user profiles. The conversion from geographical names to coordinates is performed by the Yahoo geo-service⁴ (retired today). ESA also performs named entity recognition: it extracts names of organizations, names of people, geographical entities, dates, and timestamps. All these data can be visualized on a map, which could be useful for providing a better representation of found events to end users. Visualization of data in ESA is also enhanced with media files (images, videos) extracted from messages. The authors tested ESA in an Australian crisis center, which is responsible for monitoring natural disasters and other national security threats.

AIDR⁵ (Artificial Intelligence for Disaster Response) is an open-source platform for classification of messages related to emergencies [9]. The system detects messages about different topics: infrastructure damage, casualties, required or available donations. The authors point out that classifiers trained on data collected during one disaster perform badly on data acquired from new disasters. They address this problem by introducing human annotation into the process of adapting the system to new tasks. When a new emergency happens, the system should be retrained. The training dataset for supervised machine learning is composed of the old labeled data and data urgently annotated via crowdsourcing services. The system has elements of active learning; it chooses for human annotation the most informative samples that can significantly improve classification performance. The authors tested the system on a collection of messages related to the Pakistan earthquake in 2013.

TEDAS [10] is a system for emergency detection via focused crawling of Twitter messages. TEDAS collects topic-relevant messages using the Twitter search API. The system uses an original crawling strategy that consists in dynamic shifting of the crawler focus.

Another system for vertical search of information about emergencies is described in [27]. The system includes a focused ontology-based crawler, for which an extensive ontology describing various emergencies was designed.

It is also worth mentioning Tweedr [2], an open-source system that can find informative messages on Twitter for information support of people involved in rescue operations. It can distinguish general messages from the ones that contain particular information about infrastructure damage and human casualties. Another recent effort in constructing a tweet classification system is described in [5]. The authors use deep natural language processing techniques and a rich set of features to determine whether a message contains information about damage dealt during natural disasters. In [14], an approach for construction of a crisis-related lexicon is proposed. The authors used pseudo-relevance feedback mechanisms to expand a set of seed terms during crawling, which results in recall improvement for retrieving messages related to mass emergencies. Another lexicon, called EMTerms, is described in [23]. The authors claim that it is the biggest crisis-related lexicon for Twitter analysis so far.

Solutions for monitoring events in text streams heavily depend on focused crawling techniques. We review some of the state-of-the-art approaches below. The iCrawl system [7] is a framework for focused crawling of social networks. It adopts an ontology-based crawling strategy. The novel feature of this system is the usage of Internet search engines for generation of bootstrap crawling points. In [3], researchers propose a distributed crawler for continuous message gathering from particular user communities, which can circumvent the limits of the Twitter API. In [11], an automatic Topic-focused Monitor is presented. It samples tweets from the message stream and selects keywords to track target topics based on the samples.

The review shows that there are plenty of systems for monitoring emergency-related events in textual streams intended to improve the situational awareness of affected people and rescue teams. In our work, we consider a particular geographical region, the Arctic zone, which complicates focused crawling and filtering of data. Many aforementioned systems specialize in narrow problems like message classification, whereas our research is oriented on the development of a full-stack system that solves many tasks: from focused crawling and information extraction to faceted search leveraged with spatial and temporal metadata. Unlike the aforementioned systems, the framework proposed in this paper is oriented on processing messages in both English and Russian. This is significant because of the large area of the Arctic territories of Russia. We note that many systems use Twitter data for evaluation, and we also use this approach in our work.

³ https://www.ushahidi.com/
⁴ https://developer.yahoo.com/boss/geo/
⁵ http://aidr.qcri.org/

Figure 1 Framework for crawling of emergency messages

3 System components

3.1 Natural language processing pipeline

The system performs deep natural language processing of Russian and English texts. Besides basic processing tools, the pipeline also includes syntax parsing, semantic role labelling, and named entity recognition.

The basic analysis for Russian texts is performed by AOT.ru⁶.
This framework is used for tokenization, sentence boundary detection, POS tagging, and lemmatization, including morphological disambiguation. We use MaltParser⁷ trained on SynTagRus [13] for dependency parsing of Russian texts and our semantic parser for semantic role labelling [19]. The same types of linguistic analysis for English texts are performed via FreeLing [16]. Note that the syntax and semantic annotations are used for information search (see Section 3.3).

For basic named entity recognition, we used the Polyglot NER framework [1]. It implements a language-agnostic approach and due to this provides named entity recognition for many languages, including English and Russian. It produces annotations for locations, organizations, and person names. However, we found that the basic NER processor is not suitable for extracting toponyms related to a particular region (e.g., the Arctic zone); it yields low recall in this task. Therefore, we complemented Polyglot with a gazetteer.

The gazetteer was created on the basis of the GeoNames⁸ database. It contains more than 11 million geographical locations of different types around the world with their names (in many languages, including Russian and English), geographical coordinates, and other metadata. From GeoNames, we extracted location names situated to the north of the 60th parallel. The gazetteer uses these data to mark spatial information in texts. It also implements rather simple rules, based on parts of speech and capitalization of words, to filter out common false positives.

We also tag crisis-related lexis in texts; it enhances and simplifies filtering and search. The data for this purpose is taken from the CrisisLex lexicon proposed in [14].

⁶ http://aot.ru/
⁷ http://maltparser.org/
⁸ http://www.geonames.org/

3.2 Focused crawling framework

We deal with several social networks, such as Twitter, Facebook, and VKontakte, and with some news feeds (ArcticInfo, BarentsObserver, BBC, etc.). These sources provide different kinds of content. Twitter provides an API for crawling recent messages by keywords. However, the limitations of the API make the topical crawling process challenging. Since results commonly contain much irrelevant noise, additional filtering is necessary. We access Facebook and VKontakte primarily via links in Twitter messages that are considered topically relevant. The news feeds have a static structure; therefore, they can be processed by a common crawler with a preliminarily created static task. The data acquired from news feeds do not need topical filtering, because the crawling task can be restricted to process only the relevant sections. Since we deal with a number of heterogeneous sources, we use several kinds of crawlers (see Fig. 1).

The first type is a GeoTag crawler. It is used for collecting messages from Twitter with specified coordinates. Tweets may include geographical coordinates or geotags, which can be used for localization of their authors. We filter out all messages whose geotag latitude is less than 60 degrees.

The second type is a Topic crawler. These crawlers download topically relevant messages from Twitter with unspecified coordinates. Each topic crawler has lists of "permissive" and "restrictive" terms that are fed to the Twitter search API. In the initial steps, several bootstrap terms are used for defining a target topic. The challenge lies in the limitations of the topic search API provided by Twitter. It restricts the size of a query and of a response, which leads to insufficient recall of the crawling process. The simplicity of the query language causes low precision and recall of the collected data. We use multiple topic crawlers with different keyword subsets to solve the insufficient recall problem. NER and filtering are used to improve the precision.

Table 1 Examples of topics for crawled data
No  Keywords                                                                        Relevant
1   Bay, charity, Amazon, Antarctica, cdnpoli                                       False
2   Starling, Tuktoyaktuk, community, visit, bird, southern, blackbird              True
3   Ice, national, ship, circle, arctics, photography, day, june, pewenvironment    True
4   Rescue, buntings, air, guardsmen, squadron, cranes, divers, spot                True
5   Haha, dart, Trump, meepismurder, white, sales, gauges, street                   False
6   Icebreaker, Nunavut, hardy, apithanny, piece, fascinating, blue, warming, bear  True
7   Home, conservation, thebigbidtheory, may, island, science, hydrazine            False
8   Spring, noaa, climatechange, water, super, sail, challenge, Mediterranean       False
9   Arctic, Alaska, skuas, Greenland, road, amb, melt, Anchorage, Bering            True
10  Life, natgeomag, trip, journey, remote, team, chukchi, collaborating            True
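The interaction of the "permissive" and "restrictive" term lists can be viewed as a two-stage screen: permissive terms are packed into small queries that respect the API's query-size limit, and restrictive terms discard likely-irrelevant results. The sketch below is a hypothetical illustration of this idea; the term lists, the chunk size, and the OR-query syntax are assumptions, not the system's actual code:

```python
def build_queries(permissive, max_terms_per_query=4):
    """Split the permissive term list into small OR-queries,
    respecting the search API's limit on query size."""
    chunks = [permissive[i:i + max_terms_per_query]
              for i in range(0, len(permissive), max_terms_per_query)]
    return [" OR ".join(chunk) for chunk in chunks]

def screen(messages, restrictive):
    """Drop messages that contain any restrictive term (case-insensitive)."""
    restrictive = [t.lower() for t in restrictive]
    return [m for m in messages
            if not any(t in m.lower() for t in restrictive)]
```

Running several such query sets in parallel, each with its own permissive subset, is what lets multiple narrow crawlers jointly achieve better recall than a single crawler limited to one query.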
The last type of crawler is a common crawler. It collects data from topically related sections of news feeds. The crawlers of this type can also download pages from VKontakte and Facebook referenced by relevant Twitter posts.

The whole schema of data processing in our framework is the following. In the first step, messages are collected by the GeoTag and Topic crawlers. In the second step, we apply the linguistic analyser, NER, and gazetteer to the collected texts. Then, we filter out all messages that do not contain any crisis lexis, toponyms, or geotags. URLs from the remaining messages are fed to the common crawler, which also processes topically related news feeds. The selected useful messages and documents are indexed by the Exactus search engine [15].

For the Topic crawler, we build a topic model [8] of the crawled messages every several days. It helps to track topic shifts in the message stream. We summarize topic content with a keyword cloud and a set of the most significant messages from the cluster. Then each topic is marked as relevant or irrelevant by several assessors (see Table 1). We define the following types of posts as relevant:
1. Posts about arbitrary events (past, current, and planned) and locations in the Arctic.
2. Arbitrary posts from users who are currently in the Arctic zone.
The most significant terms from the relevant topics are sent to the "permissive" keyword collections of the topic crawlers, and terms from the irrelevant topics are sent to the "restrictive" ones. Thus, the crawling process becomes responsive to trend shifts.

3.3 Faceted search

The faceted search became a backbone of professional search applications [6]. In this type of search, users can iteratively specify queries using the metadata and keywords extracted from the search results of previous iterations. Additionally, search results can be filtered using different sets of meta fields that can be static or dynamic.

In the developed system, the faceted search is powered by the Exactus technology [15]. Its main advantage lies in the ability to efficiently index rich linguistic information, including syntax relations, semantic roles, and other types of semantic annotations extracted from natural language text (e.g., named entities). This enables phrase search (results have to contain the given syntactically connected phrases) and semantic search (results are ranked taking into account the semantic similarity of the query and the indexed documents). We take advantage of this technology by introducing indexing by geographical tags, timestamps, and emergency-related tags. This provides the ability to filter results efficiently by semantic information such as location, time, organizations, persons, and topics. It also provides the ability to retrieve information with certain tags filtered by other metadata, producing results that can be sifted with subsequent queries.

4 Evaluation of system components

We have conducted a series of experiments to assess the quality of the created components for focused crawling, named entity recognition, and faceted search. The source of the data for evaluation is the Twitter social network. The experimental dataset contains approximately 100 thousand messages in English and Russian.

In the first experiment, we assessed the accuracy of the proposed focused crawling framework. More specifically, we evaluated the quality of filtering. We labelled several subsets of posts devoted to accidents in Alaska and the Bering Sea. Each post from the subsets was labelled by three assessors to reach sufficient coherence of the test data. We have not applied a cross-validation approach here because the labelling was not used for crawler training, just for testing. The standard measures for supervised learning (precision, recall, and F1-score) were computed for each subset, and macro-averaging was used to aggregate the per-subset results.
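The per-subset scores and their macro-average can be computed as in the following sketch (illustrative code with made-up binary relevance labels, not the actual evaluation scripts):

```python
def prf1(gold, predicted):
    """Precision, recall, and F1 for one subset of binary relevance labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_average(subsets):
    """Unweighted mean of P, R, and F1 over per-subset scores."""
    scores = [prf1(gold, pred) for gold, pred in subsets]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

Macro-averaging gives each labelled subset equal weight regardless of its size, so a small subset of posts about a rare accident counts as much as a large one.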
Table 2 presents the results of crawling without and with filtering, referred to as "Impure data" and "Filtered data", respectively. Applying the proposed filtering technique results in a substantial growth of precision without a significant decrease of recall. This means that during the crawling process we do not lose much topically relevant data but substantially decrease the stored noise. We decided to choose a fairly soft filtering because, although a stricter procedure would improve the precision, it would also imply a more significant recall drop, which contradicts the purpose of the monitoring system.

Table 2 Focused crawling evaluation
               P     R     F1
Impure data    0.26  1.00  0.41
Filtered data  0.57  0.94  0.70

In the second experiment, we estimated the performance of named entity recognition performed by Polyglot and the gazetteer. We labelled all location mentions in 300 tweets that were downloaded by the Topic crawler and measured precision, recall, and F1-score for extraction of spatial entities (Table 3).

Table 3 NER evaluation (on locations)
                      P     R     F1
Polyglot              0.78  0.57  0.66
Gazetteer             0.78  0.74  0.76
Polyglot + gazetteer  0.76  0.82  0.79

The results show that the proposed gazetteer significantly outperforms Polyglot on location extraction in terms of recall. The knowledge source of Polyglot is Wikipedia, which does not have full coverage of locations. We conclude that it is reasonable to use the gazetteer and Polyglot together for maximum performance.

In the last experiment, we assessed the performance gain of information search achieved by using the proposed emergency faceted search method in comparison to a baseline algorithm. We deployed the Exactus full-text search algorithm without filtering by location tags as the baseline. For the evaluation, we applied the NDCG score and a peer reviewing approach. The results are presented in Table 4.

Table 4 Faceted search evaluation
          3-DCG  5-DCG  10-DCG
Faceted   0.76   0.76   0.70
Baseline  0.61   0.55   0.53

It was revealed that the use of location and crisis tags for faceted search significantly improves the quality of ranking when searching for posts about emergencies.

5 Conclusion

We presented an automated framework for crawling and processing textual documents about emergency events in the Arctic zone. The main functions of the proposed framework are focused crawling and faceted search that take into account information about geographical locations and timestamps of messages. With the data crawled from Twitter, we experimentally demonstrated that the framework provides the basic abilities for analysis of message streams about emergencies in the restricted area.

In future work, we are going to incorporate into the natural language processing pipeline components that extract information about ships and planes in the Arctic zone. Bulk information is openly available on the web (e.g., the MarineTraffic service⁹). Tagging ship names and their coordinates in document and message streams can potentially improve the quality of emergency event detection and enhance situation awareness. We are going to accumulate more retrospective data from social networks and other sources to increase the recall of the crawling process. Among many other types of information sources, collections of reports from rescue services are the most promising supplement for the crawling. Another way to improve topic crawling is detection of users and groups in social networks that constantly post topically relevant messages. This could be done semi-automatically by building topic models of users and groups. We are also going to create visualization tools for geotagged messages that can present events on the map.

⁹ http://www.marinetraffic.com/

Acknowledgments

The project is supported by the Russian Foundation for Basic Research, project number 15-29-06045 "ofi_m".

References

[1] Rami Al-Rfou, Vivek Kulkarni, Bryan Perozzi, and Steven Skiena. Polyglot-NER: Massive multilingual named entity recognition. In Proceedings of the 2015 SIAM International Conference on Data Mining. SIAM, 2015.
[2] Zahra Ashktorab, Christopher Brown, Manojit Nandi, and Aron Culotta. Tweedr: Mining Twitter to inform disaster response. In Proceedings of ISCRAM, pages 354–358, 2014.
[3] Matko Bošnjak, Eduardo Oliveira, José Martins, Eduarda Mendes Rodrigues, and Luís Sarmento. TwitterEcho: A distributed focused crawler to support open research with Twitter data. In Proceedings of the 21st International Conference Companion on World Wide Web, pages 1233–1240. ACM, 2012.
[4] Cornelia Caragea, Nathan McNeese, Anuj Jaiswal, Greg Traylor, Hyun-Woo Kim, Prasenjit Mitra, Dinghao Wu, Andrea H. Tapia, Lee Giles, Bernard J. Jansen, et al. Classifying text messages for the Haiti earthquake. In Proceedings of ISCRAM, 2011.
[5] Stefano Cresci, Maurizio Tesconi, Andrea Cimino, and Felice Dell'Orletta. A linguistically-driven approach to cross-event damage assessment of natural disasters from social media messages. In Proceedings of the 24th International Conference on World Wide Web Companion, pages 1195–1200. International World Wide Web Conferences Steering Committee, 2015.
[6] Pavlos Fafalios and Yannis Tzitzikas. Exploratory professional search through semantic post-analysis of search results. In Professional Search in the Modern World, pages 166–192. Springer, 2014.
[7] Gerhard Gossen, Elena Demidova, and Thomas Risse. The iCrawl Wizard – supporting interactive focused crawl specification. In Advances in Information Retrieval, pages 797–800. Springer, 2015.
[8] Thomas Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 50–57. ACM, 1999.
[9] Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, and Sarah Vieweg. AIDR: Artificial intelligence for disaster response. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web, pages 159–162, 2014.
[10] Rui Li, Kin Hou Lei, Ravi Khadiwala, and Kevin Chen-Chuan Chang. TEDAS: A Twitter-based event detection and analysis system. In Data Engineering (ICDE), 2012 IEEE 28th International Conference, pages 1273–1276. IEEE, 2012.
[11] Rui Li, Shengjie Wang, and Kevin Chen-Chuan Chang. Towards social data platform: Automatic topic-focused monitor for Twitter stream. Proceedings of the VLDB Endowment, 6(14):1966–1977, 2013.
[12] Alan M. MacEachren, Anuj Jaiswal, Anthony C. Robinson, Scott Pezanowski, Alexander Savelyev, Prasenjit Mitra, Xiao Zhang, and Justine Blanford. SensePlace2: GeoTwitter analytics support for situational awareness. In Proceedings of Visual Analytics Science and Technology (VAST), IEEE Conference, pages 181–190, 2011.
[13] Joakim Nivre, Igor M. Boguslavsky, and Leonid L. Iomdin. Parsing the SynTagRus treebank of Russian. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 641–648, 2008.
[14] Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Sarah Vieweg. CrisisLex: A lexicon for collecting and filtering microblogged communications in crises. In Proceedings of ICWSM, 2014.
[15] Gennady Osipov, Ivan Smirnov, Ilya Tikhomirov, Ilya Sochenkov, and Artem Shelmanov. Exactus Expert – search and analytical engine for research and development support. In Novel Applications of Intelligent Systems, pages 269–285. Springer, 2016.
[16] Lluís Padró and Evgeny Stanilovsky. FreeLing 3.0: Towards wider multilinguality. In Proceedings of the Language Resources and Evaluation Conference (LREC 2012). ELRA, 2012.
[17] Hemant Purohit and Amit P. Sheth. Twitris v3: From citizen sensing to analysis, coordination and action. In Proceedings of ICWSM, pages 746–747, 2013.
[18] Nadine B. Sarter and David D. Woods. Situation awareness: A critical but ill-defined phenomenon. The International Journal of Aviation Psychology, 1(1):45–57, 1991.
[19] A. O. Shelmanov and I. V. Smirnov. Methods for semantic role labeling of Russian texts. In Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference "Dialogue" (2014), number 13, pages 607–620, 2014.
[20] Amit P. Sheth, Hemant Purohit, Ashutosh Sopan Jadhav, Pavan Kapanipathi, and Lu Chen. Understanding events through analysis of social media. Kno.e.sis Center, Wright State University, Tech. Rep., 2010.
[21] Adrian Silvescu, Cornelia Caragea, and Vasant Honavar. Combining super-structuring and abstraction on sequence classification. In Proceedings of ICDM, pages 986–991. IEEE, 2009.
[22] Juan Sixto, Oscar Pena, Bernhard Klein, and Diego López-de-Ipiña. Enable tweet-geolocation and don't drive ERTs crazy! Improving situational awareness using Twitter. In Proceedings of SMERST, pages 27–31, 2013.
[23] Irina Temnikova, Carlos Castillo, and Sarah Vieweg. EMTerms 1.0: A terminological resource for crisis tweets. In ISCRAM 2015 Proceedings of the 12th International Conference on Information Systems for Crisis Response and Management, 2015.
[24] Sudha Verma, Sarah Vieweg, William J. Corvey, Leysia Palen, James H. Martin, Martha Palmer, Aaron Schram, and Kenneth Mark Anderson. Natural language processing to the rescue? Extracting "situational awareness" tweets during mass emergency. In Proceedings of ICWSM, pages 385–392, 2011.