=Paper= {{Paper |id=Vol-2088/paper9 |storemode=property |title=Natural Disaster Database Design and Development for Himalaya Using Social Media |pdfUrl=https://ceur-ws.org/Vol-2088/paper9.pdf |volume=Vol-2088 |authors=Kiran Zahra |dblpUrl=https://dblp.org/rec/conf/agile/Zahra17 }} ==Natural Disaster Database Design and Development for Himalaya Using Social Media== https://ceur-ws.org/Vol-2088/paper9.pdf
        Natural Disaster Database Design and Development for Himalaya Using
                                    Social Media
                                                                         Kiran Zahra
                                                              Department of Geography, UZH
                                                       Winterthurerstr. 190, 8057 Zurich, Switzerland
                                                                 kiran.zahra@geo.uzh.ch


                                                                              Abstract

           The ever-increasing population of social media users and the growing trend of sharing information on many different topics create great
        potential for using this information for purposes such as disaster management. The use of Volunteered Geographic Information (VGI) has its own
        challenges because of the unstructured nature of social media, such as extracting information from noise and credibility-related issues. This
        project focusses on the core issues of using user-generated content for disaster management in developing countries.
        Keywords: VGI, Twitter, Credibility, toponym granularity.

1        Motivation and Project Objectives

  The frequency and impact of natural disasters are on the rise worldwide (Kellenberg & Mobarak, 2008; Whybark, 2007). For more than fifty years, researchers have focused on a series of vital questions such as human occupancy of hazard zones, the response of people and societies, and how to lessen the effects of environmental hazards (Cutter, 2012). A well-structured database about natural disasters, such as EM-DAT¹, can better answer these questions and can be an efficient source for analysing causes, effects, capacity and vulnerability, and for further planning of disaster management.
  The Himalayas are the youngest mountain range in the world. This mountain range is still geologically active (Lavé & Avouac, 2000), which causes a high occurrence of natural disasters in this region. Documenting natural disasters is important in order to understand current issues and to make future predictions. Although this kind of information is available, it is technically incomplete and difficult to maintain.
  People share a large volume of information on social media sites such as Twitter during mass emergencies (Hossmann et al., 2011; Terpstra et al., 2012). Information shared on social media is not only usable, but also very valuable in terms of its contents (Kavanaugh & Yang, 2011). In case of a disaster in a region, the most important information for disaster response organizations is location, timing and impact (Jongman et al., 2015). Social media has the potential to serve in all these aspects.
  The increasing use of smartphones and access to fast internet connections are the main causes of the rapid growth of the social media population (Kwak et al., 2010). Social media users such as the Twitter community tend to share a large number of publicly available tweets, and these tweets have great potential for extracting meaningful information and using it during emergency events. However, it is often very difficult to filter only “useful information” from the Twitter stream with keyword-based queries, and thus it is recommended to use machine learning algorithms to filter information (Sakaki, Okazaki, & Matsuo, 2013).
  The credibility of information shared on social media is another challenge to using this type of data during disasters. Mendoza, Poblete, & Castillo (2010) state in their research that rumours tend to be questioned more than authentic news and propose a framework to deal with credibility issues.

    ¹ D. Guha-Sapir, R. Below, Ph. Hoyois - EM-DAT: International Disaster Database – www.emdat.be – Université Catholique de Louvain – Brussels – Belgium
    AGILE 2016 – Helsinki, June 14-17, 2016




  Gelernter & Balaji (2013) discuss the increased use of geographical content in tweets during disasters. This geographic location information can be used to manage rescue operations during and after disasters and also for generating maps. Although disaster relief is a very important issue, no research has been done on utilizing VGI to design and create a natural disaster database.
  The overall aim of this study is to analyse different aspects of disaster-related information shared on Twitter about any disaster happening in the Himalayan region and its surrounding countries. This social media data has the potential to contain helpful geographic information and situational updates during mass emergency events. These can be used by disaster response agencies for efficient management. This research will help to investigate the issues brought about by natural hazard phenomena.
  The objectives of this research are to:
  •      Apply machine learning algorithms to resolve keyword ambiguities and optimize tweet classification
  •      Assess the credibility of tweets based on data quality standards of tweets shared during disasters
  •      Analyse the granularity of geographic features reported in tweets during disasters, geoparsing and geocoding those geographic features to display them on a map for visual representation
  •      Conceptualize and implement a VGI-based natural disaster database

2     State of the art

  Sakaki, Okazaki, & Matsuo (2013) discussed ‘semantic analysis of tweets’ in their research. They performed this analysis to detect target events from tweets. They also mentioned that identifying keywords alone from tweets about disasters is not enough to gather valuable information about the event. Verma et al. (2011) studied tweets ‘contributing to situational awareness’ and tweets ‘not contributing to situational awareness’. They used examples of tweets that discuss the disaster but do not contribute to situational awareness, as they do not give any actionable information.
  Acar and Muraki (2011) argue that it is often very difficult to filter correct information out of the Twitter data stream. Uncontrollable informal retweets call into question the certainty and reliability of Twitter in disaster situations, because during a disaster the credibility of information is very important. There has been very little or no research on semantic analysis and keyword ambiguities of tweets originating from the Himalayan region and its surrounding countries; moreover, the role of geographic features in Naïve Bayes classification efficiency has never been studied.
  The Twitter streaming Application Program Interface (API) downloads many fields, called metadata, along with the tweet text field. This metadata can be used to effectively assess tweet credibility (Canini, Suh, & Pirolli, 2011). Credibility assessment issues are expected to be more complex due to cultural and language diversity in the Himalayan region.
  When there is a disaster, precise location identification is a very important element in many situations. Disaster/emergency response organizations can only respond if they know precise locations. People tend to add location information when they share something about disasters on social media. Lingad, Karimi, & Yin (2013) discuss tools known as “Named Entity Recognizers”. They state that location extraction from microblogs is becoming more accurate and that such tools have great potential to identify locational information more correctly.
  Social media’s application to disaster management is becoming more and more important because of its real-time nature, as shown by tools such as TweetTracker (Kumar et al., 2011), Artificial Intelligence for Disaster Response (AIDR) (Imran et al., 2014), Twitcident (Abel et al., 2012), and ScatterBlogs for situational awareness (Thom et al., 2015).
  Since forecasting disasters is difficult, efficient disaster management is very important to improve disaster response. Many national and international organizations maintain records of natural disasters, such as EM-DAT, a well-known global natural and technological disaster database. Such databases are created and maintained from data collected from different primary and secondary data sources. Witham (2005) discussed various issues about the completeness and accuracy of maintaining databases about disasters. The same study also presented a new database about volcanic disasters and analysed this information for risk management.
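
  The completeness concern raised for disaster databases can be made concrete with a minimal Python sketch that scores how many fields of a disaster record are filled in. The `DisasterRecord` layout, its field names and the `completeness` helper are assumptions introduced purely for illustration; they are not the EM-DAT schema, nor the schema of the database proposed in this project.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DisasterRecord:
    """Hypothetical record layout; all field names are illustrative assumptions."""
    event_type: Optional[str] = None   # e.g. "flood", "earthquake", "landslide"
    country: Optional[str] = None
    location: Optional[str] = None     # place name as reported
    start_date: Optional[str] = None   # ISO 8601 date string
    impact: Optional[str] = None       # free-text impact description
    source: Optional[str] = None       # primary or secondary data source


def completeness(record: DisasterRecord) -> float:
    """Return the fraction of fields that are filled in (not None)."""
    all_fields = fields(record)
    filled = sum(1 for f in all_fields if getattr(record, f.name) is not None)
    return filled / len(all_fields)


# Placeholder values only; this is not real disaster data.
partial = DisasterRecord(event_type="flood", country="Exampleland", start_date="2015-01-01")
print(f"completeness: {completeness(partial):.2f}")
```

  A score of this kind only counts missing values; it says nothing about accuracy, which would require comparison against an independent source.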
  The role of social media as a potentially valuable source of data during disasters has already been discussed; this research focusses on using information shared on Twitter to design a VGI-based database. Designing and developing VGI-based databases is a novel concept for the Himalayan region. This research gap led to the following research questions:
  RQ 1: Which data mining techniques are successful in identifying relevant tweets originating from developing countries?
  RQ 2: Does information shared on social media meet quality indicators for user generated content (UGC) to use in case of disasters in the Himalaya region?
  RQ 3: What is the granularity of geographic features in UGC during disasters?
  RQ 4: What are the potential contributions of social media data in designing and developing a VGI-based disaster database for the Himalaya region?

3    Methods

  Twitter offers free, real-time data (tweets) downloads through its streaming API. This API requires certain parameters to capture tweets, such as particular keywords, tweets sent from particular users, or tweets originating from a particular region. For this research, an application has been designed in R to capture real-time tweets without interruption, based on disaster-related keywords such as flood, earthquake, and landslide. Starting from 25.4.2015, more than 25 million tweets based on disaster-related keywords have been downloaded so far. The supervised Naïve Bayes topic model is used to classify tweets into two classes: “information” and “not information”.
  The Twitter streaming API downloads tweets with their metadata. In addition to the tweet text field, forty-one associated fields are also downloaded. Many researchers have identified different sets of features to assess the credibility of tweets. Features such as the “number of followers, retweet count, number of URL’s, verified, time, user description, status count, [and] length of a tweet” (Canini, Suh, & Pirolli, 2011; Morris et al., 2012) will be used to assess the credibility of tweets in the Himalayan region.
  A natural disaster event will be selected as a case study to analyse the granularity of geographic features reported during the event. These geographic features will then be geoparsed and geocoded using available online platforms such as address-parser and gisgraphy. Geoparsing is the process of automatically identifying geographical information (names of places) from text (Gelernter & Balaji, 2013). Geocoding is the process of converting text into geographical coordinates (Ratcliffe, 2004). The spatial information identified from the tweets will be displayed on a map. This map is expected to reveal complex location patterns in the way social media users report locations during disasters.
  Tweet text with its metadata will be analysed to identify potential attributes and entities to design an Entity Relationship Diagram (ERD). This analysis will be performed on an adequate subset of the acquired data. The designed ERD will be used for development of the database.

4    Conclusion

  This research project has great potential to be implemented by disaster response agencies within the Himalayan region and also by international organizations by adapting during- and post-disaster frameworks. The Himalayan region is important in determining world climate, is prone to frequent hazards, and is generally less accessible due to a lack of facilities and resources. This research can be a good platform to minimize the impact of disasters in this region. This research project enriches Geographic Information Science methods and concepts by focusing on the unique spatio-temporal phenomena of natural disasters.

5   References

    •   Acar, A., & Muraki, Y. (2011). Twitter for crisis communication: lessons learned from Japan’s tsunami disaster. 7(3), 392–402.
    •   Adam, N. R., Shafiq, B., & Staffin, R. (2012). Spatial computing and social media in the context of disaster management. IEEE Intelligent Systems, 27(6), 90–96.
    •   Bilham, R., Gaur, V., & Molnar, P. (2001). Pollen Tubes Labeled With Green Fluorescent Protein Show That the Pollen
        Tube Also Pen- Etrates an, (August), 1442–1444.
    •   Canini, K. R., Suh, B., & Pirolli, P. L. (2011). Finding Credible Information Sources in Social Networks Based on Content and Social Structure. 2011 IEEE Third Int’l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int’l Conference on Social Computing, 1–8. http://doi.org/10.1109/PASSAT/SocialCom.2011.91
    •   Whybark, D. C. (2007). Issues in managing disaster relief inventories. International Journal of Production Economics, 108(1–2), 228–235. http://doi.org/10.1016/j.ijpe.2006.12.012
    •   Cutter, S. L. (2012). Hazards, vulnerability and environmental justice. New York, NY: Taylor and Francis.
    •   Earle, P. S., Bowden, D. C., & Guy, M. (2011). Twitter earthquake detection: Earthquake monitoring in a social world. Annals of Geophysics, 54(6), 708–715. http://doi.org/10.4401/ag-5364
    •   Gelernter, J., & Balaji, S. (2013). An algorithm for local geoparsing of microtext. GeoInformatica, 17(4), 635–667. http://doi.org/10.1007/s10707-012-0173-8
    •   Hossmann, T., Carta, P., Schatzmann, D., Legendre, F., Gunningberg, P., & Rohner, C. (2011). Twitter in Disaster Mode: http://doi.org/10.1145/2079360.2079367
    •   Jongman, B., Wagemaker, J., Romero, B., & de Perez, E. (2015). Early Flood Detection for Rapid Humanitarian Response: Harnessing Near Real-Time Satellite and Twitter Signals. ISPRS International Journal of Geo-Information, 4(4), 2246–2266. http://doi.org/10.3390/ijgi4042246
    •   Kattelmann, R. (2003). Glacial Lake Outburst Floods in the Nepal Himalaya: A Manageable Hazard? Retrieved December 21, 2015, from http://download.springer.com/static/pdf/938/art%3A10.1023%2FA%3A1021130101283.pdf?originUrl=http://link.springer.com/article/10.1023/A:1021130101283&token2=exp=1450710999~acl=/static/pdf/938/art%253A10.1023%252FA%253A1021
    •   Kavanaugh, A., & Yang, S. (2011). Microblogging in crisis situations:, 1–6.
    •   Kellenberg, D. K., & Mobarak, A. M. (2008). Does rising income increase or decrease damage risk from natural disasters? Journal of Urban Economics, 63(3), 788–802. http://doi.org/10.1016/j.jue.2007.05.003
    •   Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? The International World Wide Web Conference Committee (IW3C2), 1–10. http://doi.org/10.1145/1772690.1772751
    •   Lavé, J., & Avouac, J. P. (2000). Active folding of fluvial terraces across the Siwaliks Hills, Himalayas of central Nepal. Journal of Geophysical Research, 105(B3), 5735. http://doi.org/10.1029/1999JB900292
    •   Lingad, J., Karimi, S., & Yin, J. (2013). Location extraction from disaster-related microblogs. WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web, 1017–1020. Retrieved from http://www.scopus.com/inward/record.url?eid=2-s2.0-84892685783&partnerID=tZOtx3y1
    •   Mendoza, M., Poblete, B., & Castillo, C. (2010). Twitter Under Crisis: Can we trust what we RT?, 71–79.
    •   Middleton, S. E., Middleton, L., & Modafferi, S. (2014). Real-time crisis mapping of natural disasters using social media. IEEE Intelligent Systems, 29(2), 9–17. http://doi.org/10.1109/MIS.2013.126
    •   Morris, M. R., Counts, S., Roseway, A., Hoff, A., & Schwarz, J. (2012). Tweeting is believing? Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work - CSCW ’12, 441–450. http://doi.org/10.1145/2145204.2145274
    •   Murthy, D. (2013). NEW MEDIA AND NATURAL DISASTERS: Blogs and the 2004 Indian Ocean tsunami. Information Communication and Society, 16(7), 1176–1192. http://doi.org/10.1080/1369118X.2011.611815
    •   Murthy, D., & Longwell, S. A. (2013). Twitter and Disasters. Information, Communication & Society, 16(6), 837–855. http://doi.org/10.1080/1369118X.2012.696123
    •   Nasikhin, & Adriani, M. (2007). Location Identification for the Geographic information Retrieval.
    •   Petrović, S., Osborne, M., & Lavrenko, V. (2010). Streaming first story detection with application to twitter. NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, (June), 181–189. Retrieved from http://www.scopus.com/inward/record.url?eid=2-s2.0-80053272732&partnerID=tZOtx3y1
    •   Poblete, B., Castillo, C., & Mendoza, M. (2011). Information Credibility on Twitter, 45–59.
    •   Ratcliffe, J. H. (2004). Geocoding crime and a first estimate of a minimum acceptable hit rate. International Journal of Geographical Information Science, 18(January), 61–72. http://doi.org/10.1080/13658810310001596076
    •   Richardson, S. D., & Reynolds, J. M. (2000). An overview of glacial hazards in the Himalayas. Quaternary International, 65–66, 31–47. http://doi.org/10.1016/S1040-6182(99)00035-X
    •   Sadler, J. C., & Ramage, C. S. (1976). The Role of Mountains in the South Asian Monsoon Circulation. Journal of the Atmospheric Sciences. http://doi.org/10.1175/1520-0469(1976)033<2255:COROMI>2.0.CO;2
    •   Sakaki, T., Okazaki, M., & Matsuo, Y. (2013). Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Transactions on Knowledge and Data Engineering, 25(4), 919–931. http://doi.org/10.1109/TKDE.2012.29
    •   Singh, A. (2010). Climate Change and Disasters in the Hindu Kush Himalayan Region. Article, New Delhi, India: Solution Exchange. Retrieved from http://amritapeoplesvoice.blogspot.com
    •   Terpstra, T., Stronkman, R., de Vries, A., & Paradies, G. L. (2012). Towards a realtime Twitter analysis during crises for operational crisis management. Proceedings of ISCRAM 2012, (April), 1–9.
    •   Thayyen, R. J., & Gergan, J. T. (2010). Role of glaciers in watershed hydrology: a preliminary study of a “Himalayan catchment.” The Cryosphere, 4(1), 115–128. http://doi.org/10.5194/tc-4-115-2010
    •   Verma, S., Vieweg, S., Corvey, W. J., Palen, L., Martin, J. H., Palmer, M., … Anderson, K. M. (2011). Natural Language Processing to the Rescue?: Extracting “Situational Awareness” Tweets During Mass Emergency.
    •   Vieweg, S., Hughes, A. L., Starbird, K., & Palen, L. (2010). Microblogging during two natural hazards events. Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI ’10, 1079. http://doi.org/10.1145/1753326.1753486
    •   Viviroli, D., & Weingartner, R. (2004). The hydrological significance of mountains: from regional to global scale. Hydrology and Earth System Sciences, 8(6), 1017–1030. http://doi.org/10.5194/hess-8-1017-2004
    •   Witham, C. S. (2005). Volcanic disasters and incidents: A new database. Journal of Volcanology and Geothermal Research, 148(3–4), 191–233. http://doi.org/10.1016/j.jvolgeores.2005.04.017
    •   Yamada, T., & Sharma, C. (1993). Glacier lakes and outburst floods in the Nepal Himalaya. IAHS Publications-Publications of …, (218), 319–330. Retrieved from http://ks360352.kimsufi.com/redbooks/a218/iahs_218_0319.pdf
    •   Imran, M., Castillo, C., Lucas, J., Meier, P., et al. (2014). AIDR: Artificial intelligence for disaster response. Proceedings of the companion publication of the 23rd international conference on World wide web companion, 159–162. http://doi.org/10.1145/2567948.2577034
    •   Kumar, S., Barbier, G., Ali Abbasi, M. A., & Liu, H. (2011). TweetTracker: An Analysis Tool for Humanitarian and Disaster Relief. Fifth International AAAI Conference on Weblogs and Social Media, 661–662.
    •   Thom, D., Kruger, R., Ertl, T., Bechstedt, U., et al. (2015). Can twitter really save your life? A case study of visual social media analytics for situation awareness. 2015 IEEE Pacific Visualization Symposium (PacificVis), 183–190. http://doi.org/10.1109/PACIFICVIS.2015.7156376
    •   Abel, F., Hauff, C., Houben, G.-J., Stronkman, R., et al. (2012). Twitcident: Fighting fire with information from social web streams. Proceedings of the 21st international conference companion on World Wide Web - WWW ’12 Companion, 305. http://doi.org/10.1145/2187980.2188035