<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Volunteered Geographic Information and Data Quality - The Case of Social Reporting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olga Yanenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bamberg, Chair of Computing in the Cultural Sciences</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Copyright (c) by the paper's authors. Copying permitted for private and academic purposes. In: A. Comber, B. Bucher, S. Ivanovic (eds.): Proceedings of the 3rd AGILE PhD School, Champs sur Marne, France, 15-17 September 2015, published at http://ceur-ws.org</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Content created by internet users and enriched with geographical footprints is
usually referred to as Volunteered Geographical Information
        <xref ref-type="bibr" rid="ref7">(VGI; Goodchild
2007)</xref>
        . This new source of information enables the collection of rich data sets that
often outperform the data collected by private companies and government
agencies in terms of the number of data producers and their personal motivation to
contribute
        <xref ref-type="bibr" rid="ref8 ref9">(Goodchild 2008, Goodchild and Glennon 2010)</xref>
        .
      </p>
      <p>
        In spite of the potential of VGI, user-generated data production is often
accompanied by problems that have to be overcome to ensure usable and valuable
data sets. Four main challenges have been identified in the literature, namely the
motivation of users to contribute, the quality of the resulting data, and the spatial
and temporal coverage of that data
        <xref ref-type="bibr" rid="ref3 ref5 ref6">(Coleman et al. 2009, Feick &amp; Roche 2013,
Flanagin &amp; Metzger 2008)</xref>
        .
      </p>
      <p>This work studies the issue of data quality in the different stages of VGI data
creation, collection and evaluation. It concentrates on social reporting scenarios, in
which citizens submit geotagged reports about observations of real-world events.
The following basic research questions are addressed within this work:</p>
      <list list-type="order">
        <list-item><p>How can users be motivated to contribute correct and truthful data?</p></list-item>
        <list-item><p>How does gamification affect data quality?</p></list-item>
        <list-item><p>What (semi-)automated approaches can be used to improve and evaluate data quality?</p></list-item>
        <list-item><p>How can agreement and disagreement between different data producers be modeled, evaluated and interpreted?</p></list-item>
        <list-item><p>What is the difference between objective and subjective data, and what principles can be used to validate both?</p></list-item>
      </list>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The data quality of VGI highly depends on the motivation of people to collect data
and produce reports
        <xref ref-type="bibr" rid="ref3 ref5 ref6">(Coleman et al. 2009; Feick &amp; Roche 2013; Flanagin and
Metzger 2008)</xref>
        . Gamification has proven to be an effective method for motivating
people to collect data about geographical places
        <xref ref-type="bibr" rid="ref10">(Matyas et al. 2008)</xref>
        . There is
considerable research describing how gamification can be used in different
contexts to increase the amount of data produced. However, little work has
studied how gamification affects the quality of the resulting data sets. Since
gaming is a competitive situation, the data collected in location-based games is
often biased, incomplete or useless, and ways are needed to ensure
data quality without degrading the game experience
        <xref ref-type="bibr" rid="ref2 ref4">(Cramer et al. 2011,
Cechanowicz et al. 2013)</xref>
        .
      </p>
      <p>The most reliable method for validating data is on-site verification, as known
from classical journalism. In most VGI cases this method is not suitable,
since many spatio-temporal events exist only for a short period of time and it is
simply not possible to verify all of the reports on-site and in real time.</p>
      <p>
        A prominent validation approach is letting volunteers review and rate reports of
other volunteers. These ratings are used to compute authority measures for
individual reporters
        <xref ref-type="bibr" rid="ref1">(Bishr and Mantelas 2008)</xref>
        . Authority methods depend heavily
on the prior reputation of a reporter and fail for first-time and infrequent reporters.
However, the idea of
        <xref ref-type="bibr" rid="ref1">Bishr and Mantelas (2008)</xref>
        to use a spatial quality criterion
for the computation of reputation values, taking into account the spatial distance between the
reporter and the observed object, forms the basis for the more
general spatio-temporal proximity principle that is part of this work.
      </p>
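      <p>The cold-start problem of authority measures can be made concrete with a minimal sketch. The function below is a hypothetical illustration and not the model of Bishr and Mantelas (2008): the name reputation, the inverse-distance weighting and the distance_scale_km parameter are assumptions chosen for brevity. It averages peer ratings, weights each rating by how close the reporter was to the observed object, and returns no value for a reporter without rated reports.</p>
      <preformat><![CDATA[
```python
def reputation(ratings, distance_scale_km=1.0):
    """Distance-weighted mean of peer ratings, or None without any rating.

    ratings: iterable of (rating, distance_km) pairs, where rating lies in
    [0, 1] and distance_km is the reporter-to-object distance of the rated
    report. Both the pair layout and the weighting are assumptions.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rating, distance_km in ratings:
        # Reports made close to the observed object count more.
        weight = 1.0 / (1.0 + distance_km / distance_scale_km)
        weighted_sum += weight * rating
        weight_total += weight
    if weight_total == 0.0:
        # First-time reporter: there is no history to build a reputation on.
        return None
    return weighted_sum / weight_total
```
]]></preformat>
      <p>Under these assumptions, a reporter with a single fully confirmed on-site report receives a reputation of 1.0, whereas a first-time reporter receives None. The latter case is the one that other validation principles have to cover.</p>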
    </sec>
    <sec id="sec-3">
      <title>Methods and Results</title>
      <p>
        The first part of the project consists of a systematic review of the different motivations
of reporters in the data creation process and their effect on data quality. The focus
is on gamification as a motivational strategy and on the development of game
principles that not only attract people to participate but also motivate them
to contribute honestly. Examples of such an approach are the retesting
and confirmation mechanisms that are included in a mobile VGI game as part of the
gameplay without being recognized as internal control features by the players
        <xref ref-type="bibr" rid="ref15">(Yanenko
&amp; Schlieder 2014)</xref>
        . These game principles also address another exciting
aspect of VGI, the differentiation between objective and subjective data. While
objective or factual data can be confirmed by different individuals, the validation
of subjective data is more complicated because perceptions differ.
      </p>
      <p>
        The second part of the project deals with finding methods and principles for
data validation of already existing data sets. Spatio-temporal proximity and social
distance were identified as core principles for the integration of different data
sources
        <xref ref-type="bibr" rid="ref13">(Schlieder &amp; Yanenko 2010)</xref>
        . The main idea is that the mutual
confirmation of two reports with similar content is higher if they were recorded
spatially and temporally close to each other and thus presumably concern the same
(spatial) event. Social reporting scenarios also have to address the issue of social
bias: the events reported often depend on the social group to which the contributor
belongs. In such cases, social distance proves to be a useful validation criterion.
Reports by contributors from different stakeholder groups usually provide higher
confirmation than reports from the same group. Several functions and approaches
were tested for computing the confirmation values between different reports.
Two experiments were performed with phenological data from the BudBurst<sup>1</sup>
project: a simple model of the temporal relationship between the different
phenophases for identifying incorrect data entries, and a constraint-satisfaction
approach for restricting the range of possible values for incomplete entries
        <xref ref-type="bibr" rid="ref14">(Yanenko &amp; Schlieder 2012)</xref>
        .
      </p>
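      <p>One possible form of such a confirmation function is sketched below, assuming exponential decay with spatial and temporal distance and a fixed discount for reports from the same stakeholder group. The Report fields, the scale parameters and the same_group_weight value are illustrative assumptions; they are not taken from the functions tested in the project.</p>
      <preformat><![CDATA[
```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    x_km: float     # planar easting in kilometres (projected coordinates assumed)
    y_km: float     # planar northing in kilometres
    t_hours: float  # observation time in hours since an arbitrary epoch
    group: str      # stakeholder group of the contributor
    content: str    # reported content, compared here by simple equality


def proximity(a, b, space_scale_km=1.0, time_scale_h=6.0):
    """Spatio-temporal proximity in (0, 1]; 1.0 for identical place and time."""
    d_space = math.hypot(a.x_km - b.x_km, a.y_km - b.y_km)
    d_time = abs(a.t_hours - b.t_hours)
    return math.exp(-d_space / space_scale_km) * math.exp(-d_time / time_scale_h)


def confirmation(a, b, same_group_weight=0.5):
    """Mutual confirmation of two reports; 0.0 if their content differs."""
    if a.content != b.content:
        return 0.0
    # Social distance: agreement across groups counts more than within a group.
    social = same_group_weight if a.group == b.group else 1.0
    return proximity(a, b) * social
```
]]></preformat>
      <p>Under these assumptions, two reports with the same content from different groups, recorded at the same place and time, confirm each other with a value of 1.0. The value drops to 0.5 if both reporters belong to the same group and decreases further as the reports move apart in space or time.</p>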
      <p>
        The software part of the project consists of a generic report integration tool. Its
functionality includes the construction of a confirmation graph between reports
based on the principles described above, with the possibility of individual
adjustments. The flexible architecture allows use-case-based selection of the
confirmation functions and methods that are applied to construct
the edges of the graph and to compute the agreement values between two pieces of
information (vertices of the graph). Finally, the value for each individual
report is computed by an aggregation mechanism that combines all edges
connected to that report into one veracity value. The tool will implement some
widely used algorithms such as PageRank
        <xref ref-type="bibr" rid="ref12">(Page et al. 1999)</xref>
        but can also be customized
for specific interests.
      </p>
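      <p>The aggregation step can be sketched as follows, assuming that pairwise agreement values are already available. The helper names build_graph and veracity_pagerank, the damping factor and the iteration count are assumptions made for illustration. The block shows a plain weighted PageRank by power iteration on an undirected confirmation graph; it is not the implementation of the tool.</p>
      <preformat><![CDATA[
```python
def build_graph(n_reports, edge_weight):
    """Symmetric adjacency maps: graph[i][j] is the agreement of reports i and j."""
    graph = {i: {} for i in range(n_reports)}
    for i in range(n_reports):
        for j in range(i + 1, n_reports):
            w = edge_weight(i, j)
            if w > 0.0:
                graph[i][j] = w
                graph[j][i] = w
    return graph


def veracity_pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; the scores sum to 1."""
    n = len(graph)
    scores = {v: 1.0 / n for v in graph}
    out_weight = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    for _ in range(iterations):
        # Score held by reports without any edge is spread evenly over all reports.
        dangling = sum(scores[v] for v in graph if out_weight[v] == 0.0)
        new_scores = {}
        for v in graph:
            # Each neighbour u passes on its score in proportion to the edge weight.
            incoming = sum(scores[u] * w / out_weight[u] for u, w in graph[v].items())
            new_scores[v] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        scores = new_scores
    return scores
```
]]></preformat>
      <p>In this sketch, a report that is confirmed by no other report receives only the evenly distributed baseline share, so its veracity value stays below that of mutually confirming reports. Other aggregation mechanisms could be substituted for veracity_pagerank without changing the graph construction.</p>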
      <p><sup>1</sup> http://budburst.org/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bishr</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mantelas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>A trust and reputation model for filtering and classifying knowledge about urban growth</article-title>
          .
          <source>GeoJournal</source>
          <volume>72</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>229</fpage>
          -
          <lpage>237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Cechanowicz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutwin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brownell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Effects of gamification on participation and data quality in a real-world market research domain</article-title>
          .
          <source>In: Proceedings of the First International Conference on Gameful Design, Research, and Applications (Gamification '13)</source>
          . ACM, New York, NY, USA, pp.
          <fpage>58</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Coleman</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Georgiadou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Labonte</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Volunteered geographic information: The nature and motivation of produsers</article-title>
          .
          <source>In: International Journal of Spatial Data Infrastructures Research</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>332</fpage>
          -
          <lpage>358</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Cramer</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahmet</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rost</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Holmquist</surname>
            ,
            <given-names>L. E.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Gamification &amp; Locationsharing: some emerging social conflicts</article-title>
          .
          <source>Gamification Workshop at CHI '11</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Feick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Roche</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Understanding the Value of VGI</article-title>
          . In: Crowdsourcing geographic knowledge, Springer Netherlands, pp.
          <fpage>15</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Flanagin</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Metzger</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>The credibility of volunteered geographic information</article-title>
          .
          <source>In: GeoJournal</source>
          ,
          <volume>72</volume>
          (
          <issue>3-4</issue>
          ), pp.
          <fpage>137</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Goodchild</surname>
            ,
            <given-names>M. F.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Citizens as sensors: the world of volunteered geography</article-title>
          .
          <source>GeoJournal</source>
          ,
          <volume>69</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>211</fpage>
          -
          <lpage>221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Goodchild</surname>
            ,
            <given-names>M. F.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Commentary: Whither VGI?</article-title>
          <source>GeoJournal</source>
          ,
          <volume>72</volume>
          , pp.
          <fpage>239</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Goodchild</surname>
            ,
            <given-names>M. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glennon</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Crowdsourcing geographic information for disaster response: a research frontier</article-title>
          .
          <source>International Journal of Digital Earth</source>
          <volume>3</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>231</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Matyas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matyas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiefer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlieder</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitarai</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kamata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Designing Location-based Mobile Games with a Purpose - Collecting Geospatial Data with CityExplorer</article-title>
          .
          <source>In: ACM International Conference Proceeding Series</source>
          , Vol.
          <volume>352</volume>
          , Proceedings of the 2008 International Conference on Advances in Computer Entertainment Technology, Yokohama, Japan
          , pp.
          <fpage>244</fpage>
          -
          <lpage>247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Matyas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiefer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlieder</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kleyer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Wisdom about the crowd: Assuring geospatial data quality collected in location-based games</article-title>
          .
          <source>In: Entertainment Computing-ICEC 2011</source>
          . Springer Berlin Heidelberg, pp.
          <fpage>331</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motwani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winograd</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>The PageRank citation ranking: bringing order to the Web</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Schlieder</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yanenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Spatio-temporal Proximity and Social Distance: a Confirmation Framework for Social Reporting</article-title>
          .
          <source>In: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks (LBSN'10)</source>
          . ACM, New York, NY, USA, pp.
          <fpage>60</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Yanenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlieder</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Enhancing the Quality of Volunteered Geographic Information: A Constraint-Based Approach</article-title>
          .
          <source>In: Bridging the Geographic Information Sciences, Lecture Notes in Geoinformation and Cartography, Part</source>
          <volume>8</volume>
          , pp.
          <fpage>429</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Yanenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlieder</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Game Principles for Enhancing the Quality of User-generated Data Collections</article-title>
          .
          <source>In: AGILE'14 Workshop on Geogames and Geoplay</source>
          . Castellón, Spain, 3 June 2014.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>