<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Retrieve Volunteered Geographic Information for Forest Fire</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Laura Spinsanti</string-name>
          <email>laura.spinsanti@Springer.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Ostermann</string-name>
          <email>frank.ostermann@Springer.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Spatial Data Infrastructures Unit European Commission - Joint Research Center Institute for Environment and Sustainability Via E. Fermi</institution>
          ,
          <addr-line>T.P. 262, I-21027 Ispra (VA)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The proposed contribution describes a methodology we are testing in the framework of an exploratory research project on the contribution of Volunteered Geographic Information (VGI) in the case-study of forest fires. The purpose of the research is to assess the value of VGI at different stages of a fire event. The first step of the methodology focuses on the retrieval of geographical information related to forest fire from social networks. VGI is intrinsically heterogeneous as it is provided by different people, using different media such as photographs, text, and video. Moreover, geographical information can be explicit and expressed as coordinates or implicit as expressed in the text as location names. In this stage of the project we are extracting text data from Twitter and photographs from Flickr.</p>
      </abstract>
      <kwd-group>
        <kwd>VGI</kwd>
        <kwd>Forest Fire</kwd>
        <kwd>natural hazards</kwd>
        <kwd>geographic information retrieval</kwd>
        <kwd>social network</kwd>
        <kwd>Twitter</kwd>
        <kwd>Flickr</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In the framework of an exploratory research project in partnership with Google and
the Land Management and Natural Hazards Unit (JRC-EU), we are developing and
testing a methodology to semi-automatically harvest and analyze Volunteered
Geographic Information (VGI) in a case study on forest fires. The purpose of the
research is to assess the value of VGI at different stages of a fire event and to integrate it
with the flow of information from “official” sources as used by the European Forest
Fire Information System (effis.jrc.ec.europa.eu).</p>
      <p>The background of the research is the concept of citizens as sensors for a next
generation digital earth [1, 2, 3]. The amount of volunteered geographic information
is ever increasing, with many social media platforms adding the ability to
geographically reference any volunteered information (Twitter GeoTagging,
Facebook Places). The widespread use of GPS sensors in portable devices such as digital
cameras and smartphones facilitates this development.</p>
      <p>The expected amount of volunteered data makes it necessary to automate the
tasks of retrieval and filtering, using context-aware methods that help in the
assessment of the relevance of the information.</p>
      <p>How to integrate this data with existing spatial data infrastructures will be a major
future challenge.</p>
      <p>In the next sections, we give some background information on VGI, extend to the
forest fire context a work flow originally proposed in [2], and
report our preliminary findings on data retrieved for the 2010 summer fire season.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Volunteered geographic information and forest fire management</title>
      <p>The public has always been strongly involved in disasters [5], and today it already
provides a substantial amount of information on them. This VGI is intrinsically
heterogeneous, as it is provided by different people using different media, and authors
often overcome device and software limitations in imaginative and unpredictable
ways. Moreover, the amount of near-real-time, geographically
referenced volunteered information is expected to increase manifold during the coming years
because of the next generation of smartphones and devices. Experiences such as the use of the
Ushahidi platform [6] in the management of the 2010 Haiti earthquake disaster cannot be
replicated unless the analysis of the data flow becomes (semi-)automatic.</p>
      <p>This development is going to change the way information is collected, distributed
and used. In the past, there was only a uni-directional vertical flow of information
from officials to the public, with horizontal peer-to-peer communication limited to a
small reach. With Web 2.0, the amount of bi-directional horizontal peer-to-peer
communication among the public has been increasing. Another boost has come from the
recent possibility of reaching an even larger number of people in real time through
social media platforms like Twitter and Facebook. The
traditional uni-directional vertical flow of information is also expected to be affected,
possibly becoming bi-directional soon. The lines between public and official already
blur when official administrative agencies, such as those of British Columbia, regularly use
Facebook or Twitter for communication services.</p>
      <p>Despite opposition and the difficulty of integrating such activity with established
emergency response protocols, people in crisis contexts have tried to obtain more
information and to help by organizing it and making portals available to
others. This activity was motivated by the observation that traditional media outlets
were often too inaccurate, too global, biased towards large urban agglomerations,
and too late.</p>
    </sec>
    <sec id="sec-3">
      <title>3 VGI work flow: from harvesting to integration with official data</title>
      <p>The entire work flow starts with the retrieval of data from various social media
sources on the Web. The data are then processed, integrated, analyzed and evaluated.
At the end, clustered and quality-assessed information is integrated with the official
data to produce an enriched representation of the event. The core part of the work flow
is a complex knowledge discovery process that involves several steps and cycles,
especially for data sources that are continuously updated during the event.</p>
      <p>The data can be analyzed and assessed as single pieces of information, but also as
grouped information. The dimensions of analysis are numerous and can be complex,
as is the case for the space and time dimensions. For these reasons, analysis and discovery
methods are both domain driven, including natural language processing (NLP) and ontologies, and
data driven, including spatio-temporal data mining. Moreover, the grouped
information can lead to new knowledge that feeds back into the assessment of the
single pieces of information.</p>
      <p>The work flow is completed with the integration of the quality-controlled,
event-related VGI into the official spatial data infrastructures used during the crisis.</p>
    </sec>
    <sec id="sec-3-1">
      <title>3.1 Geographic information retrieval: summer 2010</title>
      <p>We harvested Twitter using the provided streaming API and 12 wildfire-related
keywords (e.g. fire, helicopter, evacuation) in 8 different languages. We collected
24.5 GB of data, comprising more than 8 million messages and 700 thousand images. Around
1% of the data is geocoded with coordinates (even less in Europe). Many messages also
contain geographic information in their text, such as: “Two very Large forest fire in the mountains
behind Funchal clouds of smoke covered the sun turn sunlight deep yellow ash
coming down”. This kind of information must be analyzed carefully.
At a first level, the message refers to the city of Funchal. To resolve such place names we used
Yahoo Placemaker<sup>1</sup>, a freely available geoparsing Web service. More precisely, however,
the text refers to the mountains behind the city. This finer level of natural language analysis
remains to be performed at a later stage of the project.</p>
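      <p>The distinction between explicitly and implicitly located messages can be illustrated with a minimal sketch. It is not the harvesting code used in the project: the keyword list is an illustrative subset, and the record layout (a dict with a "text" field and an optional "coordinates" field) and the function name split_by_geocoding are assumptions made for this example.</p>
      <preformat><![CDATA[
```python
import re

# Illustrative subset only: the paper names "fire", "helicopter" and
# "evacuation" as examples of its 12 keywords and uses "incendie" for France.
KEYWORDS = ["fire", "helicopter", "evacuation", "incendie"]

# Case-insensitive, whole-word match on any keyword. Plural forms such as
# "fires" would need their own entries.
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def split_by_geocoding(messages):
    """Keep keyword matches and split them by presence of coordinates.

    Each message is a dict with a "text" string and an optional
    "coordinates" pair (hypothetical record layout, not the Twitter format).
    Returns (explicit, implicit).
    """
    explicit, implicit = [], []
    for message in messages:
        if KEYWORD_RE.search(message["text"]) is None:
            continue  # not related to wildfires according to the keyword list
        if message.get("coordinates") is not None:
            explicit.append(message)  # location given as coordinates
        else:
            implicit.append(message)  # location, if any, is only in the text
    return explicit, implicit


# Made-up sample; only the first text is quoted from the paper.
sample = [
    {"text": "Two very Large forest fire in the mountains behind Funchal"},
    {"text": "Incendie sur les collines", "coordinates": (43.3, 5.4)},
    {"text": "Lovely sunset over the harbour"},
]
explicit, implicit = split_by_geocoding(sample)
print(len(explicit), len(implicit))
```
]]></preformat>
      <p>Messages in the implicit group are the ones that would be passed on to a geoparsing service, since any location they carry is expressed only as a place name in the text.</p>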
      <p>A first, broader analysis was performed by country. As an example, we show the
2010 fires in the south of France near the city of Marseille. We
selected the Tweets and the photographs posted on Flickr during the season that contain the
word “incendie”. Only a few of them had coordinates. For the others, we used the
Yahoo Placemaker service and kept those located in France. We then clustered the items
by day and location (requiring at least 2 related items per cluster), resulting in spatio-temporal
points that we call “events”. As Fig. 2 shows, there is a correspondence
between the officially registered fires and the social network activity. The map shows
only the spatial dimension of the events, but temporal clustering
also confirms the temporal correspondence. The next steps of the project consist of
applying knowledge discovery methods to refine the retrieval and the analysis of the
data.<fn id="fn1"><label>1</label><p>http://developer.yahoo.com/geo/placemaker/</p></fn></p>
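      <p>The clustering step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the project: items are assumed to be already geoparsed into a place name and a country code, "location" is reduced to an exact match on the place name, and the record layout, the function name cluster_events and the sample dates are all hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict
from datetime import date

MIN_ITEMS = 2  # the paper requires at least 2 related items per cluster


def cluster_events(items, country="FR", min_items=MIN_ITEMS):
    """Group items by (day, place) and keep groups with enough members.

    Each item is a dict with "day" (datetime.date), "place" (a place name
    as a geoparser might return it) and "country" (ISO code). This record
    layout is hypothetical.
    """
    groups = defaultdict(list)
    for item in items:
        if item["country"] != country:
            continue  # keep only items located in the country of interest
        groups[(item["day"], item["place"])].append(item)
    # A group becomes an "event" only if it has at least min_items members.
    return {
        key: members
        for key, members in groups.items()
        if len(members) >= min_items
    }


# Made-up sample items from two sources.
items = [
    {"source": "twitter", "day": date(2010, 7, 22), "place": "Marseille", "country": "FR"},
    {"source": "flickr", "day": date(2010, 7, 22), "place": "Marseille", "country": "FR"},
    {"source": "twitter", "day": date(2010, 7, 23), "place": "Marseille", "country": "FR"},
    {"source": "twitter", "day": date(2010, 7, 22), "place": "Funchal", "country": "PT"},
]
events = cluster_events(items)
for (day, place), members in events.items():
    print(day.isoformat(), place, len(members))
```
]]></preformat>
      <p>Grouping on exact place names keeps the sketch short. A distance-based grouping on coordinates would be closer to a true spatial clustering, at the cost of choosing a distance threshold.</p>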
    </sec>
    <sec id="sec-4">
      <title>4 Conclusion</title>
      <p>In this paper, we have argued that the emerging use of social media by members of the
public will become an important pathway for communication during a crisis event.
Almost all volunteered information has a geographic component. VGI has the
potential to dramatically change the traditional top-down, uni-directional
communication pathway from officials to the public via broadcast media. Recent
examples have shown that both the public and official institutions embrace the new
platforms for communicating volunteered information. However, the increasing usage
will also amplify two main challenges: the volume of information will require some form
of filtering, and, to be useful for official emergency response, its quality,
relevance and credibility need to be assessed. We propose a work flow for the
retrieval, formatting, filtering, and assessment of volunteered geographic information
from social media. The integration of VGI with official spatial data infrastructures,
and its use by the public and decision-makers, are the final steps in the proposed work
flow.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Craglia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.,
          <year>2008</year>
          .
          <article-title>Next-Generation Digital Earth: A position paper from the Vespucci Initiative for the Advancement of Geographic Information Science</article-title>
          .
          <source>International Journal of Spatial Data Infrastructures Research</source>
          ,
          <volume>3</volume>
          ,
          <fpage>146</fpage>
          -
          <lpage>167</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>De Longueville</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Annoni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          et al.,
          <year>2010</year>
          .
          <article-title>Digital Earth's Nervous System for crisis events: real-time Sensor Web Enablement of Volunteered Geographic Information</article-title>
          .
          <source>International Journal of Digital Earth</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ),
          <fpage>242</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Goodchild</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <year>2007</year>
          .
          <article-title>Citizens as Voluntary Sensors: Spatial Data Infrastructure in the World of Web 2.0</article-title>
          .
          <source>International Journal of Spatial Data Infrastructures Research</source>
          ,
          <volume>2</volume>
          ,
          <fpage>24</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Goodchild</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <year>2010</year>
          .
          <article-title>Twenty years of progress: GIScience in 2010</article-title>
          .
          <source>Journal of Spatial Information Science</source>
          ,
          <issue>1</issue>
          ,
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Palen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          ,
          <year>2007</year>
          .
          <article-title>Citizen Communications in Crisis: Anticipating a Future of ICT-Supported Public Participation</article-title>
          .
          <source>In CHI 2007 Proceedings. Computer Human Interaction</source>
          <year>2007</year>
          . San Jose, USA, pp.
          <fpage>727</fpage>
          -
          <lpage>736</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>