=Paper=
{{Paper
|id=Vol-2088/paper9
|storemode=property
|title=Natural Disaster Database Design and Development for Himalaya Using Social Media
|pdfUrl=https://ceur-ws.org/Vol-2088/paper9.pdf
|volume=Vol-2088
|authors=Kiran Zahra
|dblpUrl=https://dblp.org/rec/conf/agile/Zahra17
}}
==Natural Disaster Database Design and Development for Himalaya Using Social Media==
Kiran Zahra, Department of Geography, UZH, Winterthurerstr. 190, 8057 Zurich, Switzerland (kiran.zahra@geo.uzh.ch)

AGILE 2016 – Helsinki, June 14-17, 2016

===Abstract===
The ever-increasing population of social media users and the growing trend of sharing information on many different topics create great potential for using this information for purposes such as disaster management. Using Volunteered Geographic Information (VGI) from social media has its own challenges because of the unstructured nature of social media, such as extracting information from noise and dealing with credibility-related issues. This project focuses on the core issues of using user-generated content for disaster management in developing countries.

Keywords: VGI, Twitter, Credibility, toponym granularity.

===1 Motivation and Project Objectives===
The frequency and impact of natural disasters are on the rise worldwide (Kellenberg & Mobarak, 2008; Whybark, 2007). For more than fifty years, researchers have focused on a series of vital questions, such as human occupancy of hazard zones, the response of people and societies, and how to lessen the effects of environmental hazards (Cutter, 2012). A well-structured database about natural disasters, such as EM-DAT<sup>1</sup>, can better answer these questions and can be an efficient source for analysing causes, effects, capacity and vulnerability, and for further disaster management planning.

The Himalayas are the youngest mountain range in the world. This mountain range is still geologically active (Lavé & Avouac, 2000), which causes a high occurrence of natural disasters in the region. Documenting natural disasters is important in order to understand current issues and to make future predictions. Although this kind of information is available, it is technically incomplete and difficult to maintain.

People share a large amount of information on social media sites such as Twitter during mass emergencies (Hossmann et al., 2011; Terpstra et al., 2012). Information shared on social media is not only usable but also very valuable in terms of its content (Kavanaugh & Yang, 2011). When a disaster strikes a region, the most important information for disaster response organizations is its location, timing and impact (Jongman et al., 2015), and social media has the potential to serve in all these aspects. The increasing use of smartphones and access to fast internet connections are the main causes of the rapid growth of the social media population (Kwak et al., 2010). Social media users, such as the Twitter community, tend to share large numbers of tweets publicly, and these tweets have great potential for extracting meaningful information that can be used during emergency events. However, it is often very difficult to filter only “useful information” from the Twitter stream with keyword-based queries, and it is therefore recommended to use machine learning algorithms to filter information (Sakaki, Okazaki, & Matsuo, 2013).

The credibility of information shared on social media is another challenge to using this type of data during disasters. Mendoza, Poblete, & Castillo (2010) state that rumours tend to be questioned more than authentic news, and they propose a framework to deal with credibility issues.

Gelernter & Balaji (2013) discuss the increased use of geographical content in tweets during disasters. This geographic location information can be used to manage rescue operations during and after disasters and also to generate maps. Although disaster relief is a very important issue, no research has been done on utilizing VGI to design and create a natural disaster database.

The overall aim of this study is to analyse different aspects of disaster-related information shared on Twitter about any disaster happening in the Himalayan region and its surrounding countries. This social media data has the potential to contain helpful geographic information and situational updates during mass emergency events, which disaster response agencies can use for efficient management. This research will help to investigate the issues brought about by natural hazard phenomena.

The objectives of this research are to:
* Apply machine learning algorithms to resolve keyword ambiguities and optimize tweet classification
* Assess the credibility of tweets shared during disasters based on data quality standards
* Analyse the granularity of geographic features reported in tweets during disasters, and geoparse and geocode those features to display them on a map as visual representations
* Conceptualize and implement a VGI-based natural disaster database

<sup>1</sup> D. Guha-Sapir, R. Below, Ph. Hoyois - EM-DAT: International Disaster Database – www.emdat.be – Université Catholique de Louvain – Brussels – Belgium

===2 State of the art===
Sakaki, Okazaki, & Matsuo (2013) discussed the ‘semantic analysis of tweets’ in their research, which they performed to detect target events from tweets. They also noted that identifying keywords alone in tweets about disasters is not enough to gather valuable information about the event. Verma et al. (2011) studied tweets ‘contributing to situational awareness’ and tweets ‘not contributing to situational awareness’. They gave examples of tweets that discuss the disaster but do not contribute to situational awareness because they do not give any actionable information. Acar and Muraki (2011) argue that it is often very difficult to filter correct information out of the Twitter data stream. Uncontrollable informal retweets call into question the certainty and reliability of Twitter in disaster situations, where the credibility of information is very important. There is very little or no research on the semantic analysis and keyword ambiguities of tweets originating from the Himalayan region and its surrounding countries; moreover, the role of geographic features in the efficiency of Naïve Bayes classification has never been studied.

The Twitter streaming Application Program Interface (API) delivers, along with the tweet text field, many additional fields known as tweet metadata. This metadata can be used to effectively assess tweet credibility (Canini, Suh, & Pirolli, 2011). Credibility assessment issues are expected to be more complex due to the cultural and language diversity in the Himalayan region.

When there is a disaster, precise location identification is a very important element in many situations: disaster and emergency response organizations can only respond if they know precise locations. People tend to add location information when they share something about disasters on social media. Lingad, Karimi, & Yin (2013) discuss “Named Entity Recognizers” as tools for extracting such locations. They state that location extraction from microblogs is becoming more accurate and that such tools have great potential to identify locational information more correctly.

Social media’s application to disaster management is becoming more and more important because of its real-time nature, as shown by tools such as TweetTracker (Kumar et al., 2011), Artificial Intelligence for Disaster Response (AIDR) (Imran et al., 2014), Twitcident (Abel et al., 2012) and ScatterBlogs for situational awareness (Thom et al., 2015).

Since forecasting disasters is difficult, efficient disaster management is very important to improve disaster response. Many national and international organizations maintain records of natural disasters, for example EM-DAT, a well-known global database of natural and technological disasters. Such databases are created and maintained with data collected from different primary and secondary sources. Witham (2005) discussed various issues concerning the completeness and accuracy of disaster databases. He also presented a new database of volcanic disasters and analysed this information for risk management.

Since the potential of social media as a source of data during disasters has already been discussed, this research focuses on using information shared on Twitter to design a VGI-based database. Designing and developing VGI-based databases is a novel concept for the Himalayan region. This research gap leads to the following research questions:
* RQ 1: Which data mining techniques are successful in identifying relevant tweets originating from developing countries?
* RQ 2: Does information shared on social media meet the quality indicators for user-generated content (UGC) to be used in case of disasters in the Himalaya region?
* RQ 3: What is the granularity of geographic features in UGC during disasters?
* RQ 4: What are the potential contributions of social media data to designing and developing a VGI-based disaster database for the Himalaya region?

===3 Methods===
Twitter offers free, real-time downloads of data (tweets) through its streaming API. This API requires certain parameters to capture tweets, such as particular keywords, tweets sent from particular users, or tweets originating from a particular region. For this research, an application has been designed in R to capture real-time tweets continuously, based on disaster-related keywords such as flood, earthquake, and landslide. Since 25 April 2015, more than 25 million tweets matching disaster-related keywords have been downloaded so far. A supervised Naïve Bayes classifier is used to classify tweets into two classes: “information” and “not information”.

The Twitter streaming API downloads tweets with their metadata: in addition to the tweet text field, forty-one associated fields are also downloaded. Many researchers have identified different sets of features to assess the credibility of tweets. Features such as the “number of followers, retweet count, number of URL’s, verified, time, user description, status count, [and] length of a tweet” (Canini, Suh, & Pirolli, 2011; Morris et al., 2012) will be used to assess the credibility of tweets in the Himalayan region.

A natural disaster event will be selected as a case study to analyse the granularity of the geographic features reported during the event. These geographic features will then be geoparsed and geocoded with available online platforms such as address-parser and gisgraphy. Geoparsing is the process of automatically identifying geographical information (names of places) in text (Gelernter & Balaji, 2013). Geocoding is the process of converting text into geographical coordinates (Ratcliffe, 2004). The spatial information identified from the tweets will be displayed on a map, which is expected to reveal complex patterns in how social media users report locations during disasters.

The tweet text and its metadata will be analysed to identify potential attributes and entities for designing an Entity Relationship Diagram (ERD). This analysis will be performed on an adequate subset of the acquired data. The resulting ERD will be used to develop the database.

===4 Conclusion===
This research project has great potential to be implemented by disaster response agencies within the Himalayan region, and also by international organizations that adapt it as a framework for use during and after disasters. The Himalayan region is important in determining the world climate, is prone to frequent hazards, and is generally less accessible due to a lack of facilities and resources. This research can be a good platform for minimizing the impact of disasters in this region. It also enriches Geographic Information Science methods and concepts by focusing on the unique spatio-temporal phenomena of natural disasters.

===5 References===
* Abel, F., Hauff, C., Houben, G.-J., Stronkman, R., et al. (2012). Twitcident: Fighting fire with information from social web streams. Proceedings of the 21st international conference companion on World Wide Web - WWW ’12 Companion, 305. http://doi.org/10.1145/2187980.2188035
* Acar, A., & Muraki, Y. (2011). Twitter for crisis communication: lessons learned from Japan’s tsunami disaster. 7(3), 392–402.
* Adam, N. R., Shafiq, B., & Staffin, R. (2012). Spatial computing and social media in the context of disaster management. IEEE Intelligent Systems, 27(6), 90–96.
* Bilham, R., Gaur, V., & Molnar, P. (2001). Himalayan seismic hazard. Science, 293, 1442–1444.
* Canini, K. R., Suh, B., & Pirolli, P. L. (2011). Finding Credible Information Sources in Social Networks Based on Content and Social Structure. 2011 IEEE Third Int’l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int’l Conference on Social Computing, 1–8. http://doi.org/10.1109/PASSAT/SocialCom.2011.91
* Cutter, S. L. (2012). Hazards, vulnerability and environmental justice. New York, NY: Taylor and Francis.
* Earle, P. S., Bowden, D. C., & Guy, M. (2011). Twitter earthquake detection: Earthquake monitoring in a social world. Annals of Geophysics, 54(6), 708–715. http://doi.org/10.4401/ag-5364
* Gelernter, J., & Balaji, S. (2013). An algorithm for local geoparsing of microtext. GeoInformatica, 17(4), 635–667. http://doi.org/10.1007/s10707-012-0173-8
* Hossmann, T., Carta, P., Schatzmann, D., Legendre, F., Gunningberg, P., & Rohner, C. (2011). Twitter in Disaster Mode: http://doi.org/10.1145/2079360.2079367
* Imran, M., Castillo, C., Lucas, J., Meier, P., et al. (2014). AIDR: Artificial intelligence for disaster response. Proceedings of the companion publication of the 23rd international conference on World Wide Web companion, 159–162. http://doi.org/10.1145/2567948.2577034
* Jongman, B., Wagemaker, J., Romero, B., & de Perez, E. (2015). Early Flood Detection for Rapid Humanitarian Response: Harnessing Near Real-Time Satellite and Twitter Signals. ISPRS International Journal of Geo-Information, 4(4), 2246–2266. http://doi.org/10.3390/ijgi4042246
* Kattelmann, R. (2003). Glacial Lake Outburst Floods in the Nepal Himalaya: A Manageable Hazard? Retrieved December 21, 2015, from http://link.springer.com/article/10.1023/A:1021130101283
* Kavanaugh, A., & Yang, S. (2011). Microblogging in crisis situations:, 1–6.
* Kellenberg, D. K., & Mobarak, A. M. (2008). Does rising income increase or decrease damage risk from natural disasters? Journal of Urban Economics, 63(3), 788–802. http://doi.org/10.1016/j.jue.2007.05.003
* Kumar, S., Barbier, G., Ali Abbasi, M. A., & Liu, H. (2011). TweetTracker: An Analysis Tool for Humanitarian and Disaster Relief. Fifth International AAAI Conference on Weblogs and Social Media, 661–662.
* Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? The International World Wide Web Conference Committee (IW3C2), 1–10. http://doi.org/10.1145/1772690.1772751
* Lavé, J., & Avouac, J. P. (2000). Active folding of fluvial terraces across the Siwaliks Hills, Himalayas of central Nepal. Journal of Geophysical Research, 105(B3), 5735. http://doi.org/10.1029/1999JB900292
* Lingad, J., Karimi, S., & Yin, J. (2013). Location extraction from disaster-related microblogs. WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web, 1017–1020. Retrieved from http://www.scopus.com/inward/record.url?eid=2-s2.0-84892685783&partnerID=tZOtx3y1
* Mendoza, M., Poblete, B., & Castillo, C. (2010). Twitter Under Crisis: Can we trust what we RT?, 71–79.
* Middleton, S. E., Middleton, L., & Modafferi, S. (2014). Real-time crisis mapping of natural disasters using social media. IEEE Intelligent Systems, 29(2), 9–17. http://doi.org/10.1109/MIS.2013.126
* Morris, M. R., Counts, S., Roseway, A., Hoff, A., & Schwarz, J. (2012). Tweeting is believing? Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work - CSCW ’12, 441–450. http://doi.org/10.1145/2145204.2145274
* Murthy, D. (2013). New media and natural disasters: Blogs and the 2004 Indian Ocean tsunami. Information, Communication & Society, 16(7), 1176–1192. http://doi.org/10.1080/1369118X.2011.611815
* Murthy, D., & Longwell, S. A. (2013). Twitter and Disasters. Information, Communication & Society, 16(6), 837–855. http://doi.org/10.1080/1369118X.2012.696123
* Nasikhin, & Adriani, M. (2007). Location Identification for the Geographic information Retrieval.
* Petrović, S., Osborne, M., & Lavrenko, V. (2010). Streaming first story detection with application to twitter. NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, (June), 181–189. Retrieved from http://www.scopus.com/inward/record.url?eid=2-s2.0-80053272732&partnerID=tZOtx3y1
* Poblete, B., Castillo, C., & Mendoza, M. (2011). Information Credibility on Twitter, 45–59.
* Ratcliffe, J. H. (2004). Geocoding crime and a first estimate of a minimum acceptable hit rate. International Journal of Geographical Information Science, 18(January), 61–72. http://doi.org/10.1080/13658810310001596076
* Richardson, S. D., & Reynolds, J. M. (2000). An overview of glacial hazards in the Himalayas. Quaternary International, 65–66, 31–47. http://doi.org/10.1016/S1040-6182(99)00035-X
* Sadler, J. C., & Ramage, C. S. (1976). The Role of Mountains in the South Asian Monsoon Circulation. Journal of the Atmospheric Sciences. http://doi.org/10.1175/1520-0469(1976)033<2255:COROMI>2.0.CO;2
* Sakaki, T., Okazaki, M., & Matsuo, Y. (2013). Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Transactions on Knowledge and Data Engineering, 25(4), 919–931. http://doi.org/10.1109/TKDE.2012.29
* Singh, A. (2010). Climate Change and Disasters in the Hindu Kush Himalayan Region. Article, New Delhi, India: Solution Exchange. Retrieved from http://amritapeoplesvoice.blogspot.com
* Terpstra, T., Stronkman, R., de Vries, A., & Paradies, G. L. (2012). Towards a realtime Twitter analysis during crises for operational crisis management. Proceedings of ISCRAM 2012, (April), 1–9.
* Thayyen, R. J., & Gergan, J. T. (2010). Role of glaciers in watershed hydrology: a preliminary study of a “Himalayan catchment.” The Cryosphere, 4(1), 115–128. http://doi.org/10.5194/tc-4-115-2010
* Thom, D., Kruger, R., Ertl, T., Bechstedt, U., et al. (2015). Can twitter really save your life? A case study of visual social media analytics for situation awareness. 2015 IEEE Pacific Visualization Symposium (PacificVis), 183–190. http://doi.org/10.1109/PACIFICVIS.2015.7156376
* Verma, S., Vieweg, S., Corvey, W. J., Palen, L., Martin, J. H., Palmer, M., … Anderson, K. M. (2011). Natural Language Processing to the Rescue?: Extracting “Situational Awareness” Tweets During Mass Emergency.
* Vieweg, S., Hughes, A. L., Starbird, K., & Palen, L. (2010). Microblogging during two natural hazards events. Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI ’10, 1079. http://doi.org/10.1145/1753326.1753486
* Viviroli, D., & Weingartner, R. (2004). The hydrological significance of mountains: from regional to global scale. Hydrology and Earth System Sciences, 8(6), 1017–1030. http://doi.org/10.5194/hess-8-1017-2004
* Whybark, D. C. (2007). Issues in managing disaster relief inventories. International Journal of Production Economics, 108(1–2), 228–235. http://doi.org/10.1016/j.ijpe.2006.12.012
* Witham, C. S. (2005). Volcanic disasters and incidents: A new database. Journal of Volcanology and Geothermal Research, 148(3–4), 191–233. http://doi.org/10.1016/j.jvolgeores.2005.04.017
* Yamada, T., & Sharma, C. (1993). Glacier lakes and outburst floods in the Nepal Himalaya. IAHS Publications - Publications of …, (218), 319–330. Retrieved from http://ks360352.kimsufi.com/redbooks/a218/iahs_218_0319.pdf
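Section 3 (Methods) states that a supervised Naïve Bayes model sorts tweets into “information” and “not information”. The paper does not describe the features or the R implementation it uses. The following is therefore only a minimal Python sketch of a multinomial Naïve Bayes classifier with add-one smoothing, and it assumes that lower-cased word tokens serve as features. The class `NaiveBayes`, the helper `tokenize` and the six training tweets are invented for illustration and do not come from the study's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a tweet and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class, used for the prior
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def log_score(self, text, label):
        """Unnormalised log posterior: log P(class) + sum of log P(word | class)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denominator = self.total_words[label] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:                   # words unseen in training are skipped
                score += math.log((self.word_counts[label][word] + 1) / denominator)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Invented example tweets; "flood" and "earthquake" occur in both classes.
TRAIN = [
    ("Bridge collapsed near Kathmandu after earthquake, roads blocked", "information"),
    ("Landslide blocks highway, rescue teams needed in Sindhupalchok", "information"),
    ("Flood water rising in the village, families stranded on roofs", "information"),
    ("My inbox is a flood of emails today lol", "not information"),
    ("This new album is an earthquake of emotions", "not information"),
    ("Praying for everyone, so sad", "not information"),
]

model = NaiveBayes().fit([text for text, _ in TRAIN], [label for _, label in TRAIN])
print(model.predict("Flood water rising, families stranded"))  # information
print(model.predict("my inbox is a flood"))                    # not information
```

The toy data put “flood” and “earthquake” in both classes, so a keyword match alone does not fix the label. The other words in the tweet decide it, which illustrates the keyword ambiguity named in the first objective. A real classifier would need far more labelled tweets and a held-out evaluation.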
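The credibility features quoted in Section 3 (Methods) can be read directly from the metadata that the streaming API returns with each tweet. This sketch assumes the field names of Twitter's v1.1 tweet JSON: `user.followers_count`, `retweet_count`, `entities.urls`, `user.verified`, `created_at`, `user.description`, `user.statuses_count` and `text`. The function name `credibility_features` and the sample tweet are hypothetical. The sketch only extracts the features, because the paper does not say how they should be weighted or combined into a credibility judgement.

```python
import json
from datetime import datetime


def credibility_features(raw_json):
    """Extract the metadata features listed in Section 3 from one tweet.

    Assumes Twitter v1.1 tweet JSON; missing optional fields fall back to neutral defaults.
    """
    tweet = json.loads(raw_json)
    user = tweet.get("user", {})
    text = tweet.get("text", "")
    return {
        "followers_count": user.get("followers_count", 0),
        "retweet_count": tweet.get("retweet_count", 0),
        "url_count": len(tweet.get("entities", {}).get("urls", [])),
        "verified": bool(user.get("verified", False)),
        # Twitter timestamps look like "Sat Apr 25 06:45:00 +0000 2015"
        "created_at": datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"),
        "has_description": bool(user.get("description")),
        "statuses_count": user.get("statuses_count", 0),
        "length": len(text),
    }


# Invented example tweet containing only the fields used above.
SAMPLE_TWEET = json.dumps({
    "created_at": "Sat Apr 25 06:45:00 +0000 2015",
    "text": "Strong earthquake felt in Kathmandu, details: http://example.org/report",
    "retweet_count": 12,
    "entities": {"urls": [{"url": "http://example.org/report"}]},
    "user": {
        "followers_count": 340,
        "verified": False,
        "description": "Local journalist",
        "statuses_count": 5120,
    },
})

print(credibility_features(SAMPLE_TWEET))
```

Returning plain named features keeps the extraction separate from any later scoring model, so different feature sets from the literature can be compared on the same data.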
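Section 3 (Methods) treats geoparsing (finding place names in text) and geocoding (turning those names into coordinates) as separate steps, and the study plans to use online services such as address-parser and gisgraphy for them. The sketch below swaps those services for a small hypothetical in-memory gazetteer. The gazetteer also records a granularity level for each place, which is the property RQ 3 asks about. All gazetteer entries and coordinates are rough illustrative values rather than data from the study, and the matching is deliberately naive.

```python
import re

# Hypothetical in-memory gazetteer standing in for an online geocoding service.
# Each entry: lower-case toponym -> (granularity level, latitude, longitude).
# Coordinates are rough illustrative values only.
GAZETTEER = {
    "nepal": ("country", 28.4, 84.1),
    "sindhupalchok": ("district", 27.9, 85.7),
    "kathmandu": ("city", 27.7, 85.3),
    "pokhara": ("city", 28.2, 84.0),
}


def geoparse(text):
    """Geoparsing: return the toponyms found in the text, in order of appearance.

    Only single-word names are matched; a real geoparser also needs multi-word
    names, spelling variants and disambiguation.
    """
    return [word for word in re.findall(r"[a-z]+", text.lower()) if word in GAZETTEER]


def geocode(toponyms):
    """Geocoding: attach granularity and coordinates to each toponym."""
    results = []
    for name in toponyms:
        level, lat, lon = GAZETTEER[name]
        results.append({"name": name, "granularity": level, "lat": lat, "lon": lon})
    return results


places = geocode(geoparse("Houses collapsed in Sindhupalchok and Kathmandu, Nepal"))
print([p["granularity"] for p in places])  # ['district', 'city', 'country']
```

Because every match carries a granularity label, counting the labels over many tweets gives a first, crude picture of how coarsely or finely people report locations during an event.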
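Section 3 (Methods) derives an Entity Relationship Diagram (ERD) from tweet text and metadata and then builds the database from it. The paper does not include the ERD, so every table and column below is an assumption. The sketch is a minimal SQLite schema, created through Python's standard `sqlite3` module, with disaster events, users, tweets, places and a many-to-many link between tweets and the places they mention.

```python
import sqlite3

# Hypothetical schema sketch: the paper's ERD is not published, so all tables
# and columns here are assumptions chosen to mirror the Methods section.
SCHEMA = """
CREATE TABLE disaster_event (
    event_id   INTEGER PRIMARY KEY,
    hazard     TEXT NOT NULL,              -- e.g. flood, earthquake, landslide
    start_date TEXT NOT NULL               -- ISO 8601 date
);
CREATE TABLE twitter_user (
    user_id         INTEGER PRIMARY KEY,
    followers_count INTEGER,
    verified        INTEGER                -- 0 or 1
);
CREATE TABLE tweet (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES twitter_user(user_id),
    event_id   INTEGER REFERENCES disaster_event(event_id),
    created_at TEXT NOT NULL,
    tweet_text TEXT NOT NULL,
    label      TEXT CHECK (label IN ('information', 'not information'))
);
CREATE TABLE place (
    place_id    INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    granularity TEXT,                      -- e.g. country, district, city
    lat         REAL,
    lon         REAL
);
-- A tweet can mention several places and a place can appear in many tweets.
CREATE TABLE tweet_place (
    tweet_id INTEGER NOT NULL REFERENCES tweet(tweet_id),
    place_id INTEGER NOT NULL REFERENCES place(place_id),
    PRIMARY KEY (tweet_id, place_id)
);
"""


def build_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
conn.execute("INSERT INTO disaster_event VALUES (1, 'earthquake', '2015-04-25')")
conn.execute("INSERT INTO twitter_user VALUES (10, 340, 0)")
conn.execute(
    "INSERT INTO tweet VALUES (100, 10, 1, '2015-04-25T06:45:00Z', "
    "'Houses collapsed in Sindhupalchok', 'information')"
)
conn.execute("INSERT INTO place VALUES (1, 'Sindhupalchok', 'district', 27.9, 85.7)")
conn.execute("INSERT INTO tweet_place VALUES (100, 1)")

# Count informative tweets per level of place granularity.
rows = conn.execute(
    """SELECT p.granularity, COUNT(*)
       FROM tweet AS t
       JOIN tweet_place AS tp ON tp.tweet_id = t.tweet_id
       JOIN place AS p ON p.place_id = tp.place_id
       WHERE t.label = 'information'
       GROUP BY p.granularity"""
).fetchall()
print(rows)  # [('district', 1)]
```

Storing places in their own table with a granularity column lets many tweets share one location. It also keeps queries such as "informative tweets per granularity level" to a simple join.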