<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Incentives, Motivation, Participation, Games: Human Computation for Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Katharina Siorpaes</string-name>
          <email>katharina.siorpaes@sti2.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Simperl</string-name>
          <email>elena.simperl@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AIFB, Karlsruhe Institute of Technology</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>STI, University of Innsbruck</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Various tasks in publishing and maintaining Linked Data require human contribution at several ends. In this paper, we discuss the role of human intelligence in data interlinking and argue that it calls for incentive models, motivation mechanisms, and the application of the human computation paradigm to the interlinking process. To conclude, we give examples of human computation for interlinking by summarizing “games with a purpose” that address tasks in data interlinking.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>The Web of Data, also described as “a web of things in the world, described by data on
the Web” [1], is the result of publishing Linked Data on the Web. Such
Linked Data, forming a global data space, enables more comprehensive answers to queries
over aggregated data. Machine-readable data with explicitly defined meaning can be
consumed by machines to provide improved access to various information
sources, leading to an enhanced user experience.</p>
      <p>Along with the four Linked Data principles, the basic process of publishing data has
been defined in three main steps [1]. Even though substantial tool support for these
steps has already been developed, the process still requires human contribution at
several ends. The same holds for several other knowledge acquisition tasks, such as
finding free tags that describe images, certain problems in ontology
alignment, or domain modeling [2].</p>
      <p>As human intervention is required in many areas of knowledge acquisition, this
labor must be motivated by extrinsic or intrinsic incentives. Several Web 2.0
applications, such as del.icio.us, Amazon Mechanical Turk, or Wikipedia,
demonstrate the successful implementation of various motivation mechanisms.</p>
      <p>The idea of Human Computation [4] is to solve tasks that are trivial for humans but
cannot be solved by computer programs by channeling human labor. CAPTCHAs are one
example [4]; “games with a purpose” [3] are another line of work.</p>
      <p>However, even if a game addressing a knowledge acquisition task is available, this by
no means guarantees community involvement and the generation of large
amounts of data. Designing “games with a purpose” is tricky, as gaming fun,
data acquisition, and data quality must all be considered and well balanced.</p>
      <p>We argue that it is necessary to address incentive models and motivation
mechanisms to involve human users in the interlinking process, not only when
publishing but also with respect to maintenance of Linked Data.</p>
      <p>In this paper, we first delineate why human contribution is required at several points
in the Linked Data lifecycle. We then describe games, as a form of incentive and
motivation mechanism, that already address interlinking tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Human Intelligence in Data Interlinking</title>
      <p>The main tasks that have to be performed in order to publish data as Linked Data are
(i) to assign consistent URIs to the published data, (ii) to generate links, and (iii) to
publish metadata that allows further exploration and discovery of relevant datasets.
Methods and tools, such as D2R or Virtuoso, have been developed to automate steps (i)
and (iii). We therefore argue that link generation is the major and most
challenging problem that requires human attention.</p>
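      <p>To illustrate why steps (i) and (iii) are readily automated while step (ii) is not, the following Python sketch mints URIs from labels and emits dataset metadata mechanically; nothing in it decides which resources should be linked. The slug scheme, the <monospace>/resource/</monospace> path, and the function names <monospace>mint_uri</monospace> and <monospace>describe_dataset</monospace> are hypothetical, and the metadata only borrows VoID and Dublin Core terms as prefixed names; it does not reproduce what D2R or Virtuoso generate.</p>
      <preformat>
```python
import re


def mint_uri(base: str, label: str) -> str:
    """Step (i): derive a consistent URI from a label (hypothetical slug scheme)."""
    # Collapse every run of characters other than ASCII letters and digits into "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")
    return base.rstrip("/") + "/resource/" + slug


def describe_dataset(dataset_uri: str, title: str, example: str) -> list:
    """Step (iii): dataset metadata as (subject, predicate, object) tuples."""
    return [
        (dataset_uri, "rdf:type", "void:Dataset"),
        (dataset_uri, "dcterms:title", title),
        (dataset_uri, "void:exampleResource", example),
    ]


if __name__ == "__main__":
    city = mint_uri("http://example.org/", "Innsbruck, Austria")
    print(city)  # http://example.org/resource/Innsbruck_Austria
    for triple in describe_dataset("http://example.org/void", "Example dataset", city):
        print(triple)
    # Step (ii) is absent on purpose: which target to link to, and with which
    # predicate, is exactly the decision that requires a matcher or a human.
```
      </preformat>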
      <p>The main problem that arises is finding the matching concepts in the
datasets to be interlinked and naming the relationships between the interlinked
concepts (using defined relations such as owl:sameAs, rdfs:seeAlso, foaf:birthPlace,
foaf:homeTown, or others).</p>
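      <p>A link is an RDF triple whose predicate names the relationship. The following sketch, assuming the standard OWL and RDFS namespace URIs and, for brevity, a hypothetical whitelist containing only owl:sameAs and rdfs:seeAlso, builds a link only when a supported relationship has been named; <monospace>make_link</monospace>, <monospace>expand</monospace>, and <monospace>ALLOWED</monospace> are illustrative names.</p>
      <preformat>
```python
# Standard namespace URIs for the two prefixes used in this sketch.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Hypothetical whitelist of relationship names a linking tool might accept.
ALLOWED = {"owl:sameAs", "rdfs:seeAlso"}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'owl:sameAs' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def make_link(source: str, target: str, predicate: str) -> tuple:
    """Return a (subject, predicate, object) link; refuse unsupported relationships."""
    if predicate not in ALLOWED:
        raise ValueError("unsupported link predicate: " + predicate)
    return (source, expand(predicate), target)


if __name__ == "__main__":
    link = make_link(
        "http://example.org/resource/Innsbruck",
        "http://dbpedia.org/resource/Innsbruck",
        "owl:sameAs",
    )
    print(link[1])  # http://www.w3.org/2002/07/owl#sameAs
```
      </preformat>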
      <p>Several approaches exist for semantically linking data. RDF links can, for instance, be set
manually, supported by a set of tools including URI search and recommendation
engines such as Uriqr, Sindice, or MOAT<sup>1</sup>.</p>
      <p>While currently available interlinking algorithms yield good results for textual
resources, the question arises whether the quality of the links can be increased. In our
view, this is mainly possible by harnessing human effort.
The paradigm of Human Computation and the lessons learnt from Web 2.0, where users
collaboratively create content and metadata, should be applied to Linked Data. In this
case, annotations are based on semantic links, that is, RDF properties.
“User Contributed Interlinking” [5] denotes this way of interlinking
resources on the Web of Data. The crucial steps in interlinking are identifying a target
dataset, identifying a link target, and choosing a link predicate. While the first step is
technical in nature and can easily be automated, the second step is essentially the
annotation of resources, and both it and the third step require sophisticated methods or
human intelligence. These final two steps are therefore the main focus of methods for
interlinking.</p>
      <sec id="sec-2-1">
        <title>1 Identify target dataset</title>
        <p>The choice of the target dataset requires knowledge of the available datasets, such as
DBpedia, Geonames, or Freebase, and of their domain and focus. In many cases, these
datasets also overlap. However, reliable and comprehensive descriptions of their scope
and purpose are available for all of them, which allows for a higher degree of automation.<fn id="fn1"><label>1</label><p>http://virtuoso.openlinksw.com, http://www.w3.org/TR/skos-reference/, http://dev.uriqr.com/,
http://www.sindice.com/, http://moat-project.org</p></fn></p>
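        <p>Since the scope of the datasets is described, choosing a target dataset can be approximated by comparing the keywords of a resource with these descriptions. The sketch below ranks datasets by the Jaccard overlap of word sets; the one-line descriptions are invented placeholders, not the official metadata of DBpedia, Geonames, or Freebase, and word overlap is a deliberately simple stand-in for real dataset selection.</p>
        <preformat>
```python
import re

# Placeholder scope descriptions, invented for illustration (not official metadata).
DESCRIPTIONS = {
    "DBpedia": "encyclopedic facts extracted from wikipedia about people places organisations works",
    "Geonames": "geographical places cities countries mountains rivers coordinates population",
    "Freebase": "community curated facts about people films music books organisations",
}


def words(text: str) -> set:
    """Lower-cased set of alphabetic words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def rank_datasets(keywords: str) -> list:
    """Rank candidate target datasets by overlap with the keywords, best first."""
    query = words(keywords)
    scores = [(name, jaccard(query, words(text))) for name, text in DESCRIPTIONS.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank_datasets("mountains rivers coordinates")[0])  # ('Geonames', 0.375)
```
        </preformat>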
      </sec>
      <sec id="sec-2-2">
        <title>2 Identify link target</title>
        <p>Many datasets are large, so identifying the relevant link target can require
machine support. Even then, finding the right link target can be tricky.</p>
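        <p>One simple form of machine support for this step is to shortlist candidate resources by label similarity and leave the final choice to a person. The sketch uses the standard-library <monospace>difflib.SequenceMatcher</monospace>; the threshold of 0.6, the example URIs, and the function name <monospace>rank_targets</monospace> are assumptions, and plain string similarity stands in for the richer matching that actual interlinking tools perform.</p>
        <preformat>
```python
from difflib import SequenceMatcher


def rank_targets(label: str, candidates: dict, threshold: float = 0.6) -> list:
    """Shortlist (uri, score) pairs whose label similarity reaches the threshold.

    `candidates` maps URIs of the target dataset to their labels; the result is
    sorted best first, and a person still makes the final choice.
    """
    shortlist = []
    for uri, candidate_label in candidates.items():
        score = SequenceMatcher(None, label.lower(), candidate_label.lower()).ratio()
        if score >= threshold:
            shortlist.append((uri, round(score, 3)))
    return sorted(shortlist, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    targets = {
        "http://dbpedia.org/resource/Innsbruck": "Innsbruck",
        "http://dbpedia.org/resource/Innsbruck_Airport": "Innsbruck Airport",
        "http://dbpedia.org/resource/Karlsruhe": "Karlsruhe",
    }
    for uri, score in rank_targets("Innsbruck", targets):
        print(score, uri)
```
        </preformat>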
      </sec>
      <sec id="sec-2-3">
        <title>3 Select link predicate</title>
        <p>The final step is the most challenging one, as it determines the type of relationship
that exists between two nodes.</p>
        <p>When discussing manual labor in interlinking, just as in annotation, the type of
content that is interlinked must be considered. For interlinking methods, we are aware
of one survey, published by Scharffe and Euzenat (2009)<sup>2</sup>, that also investigated the
degree of automation that interlinking tools support. The authors analyzed six tools
and interviewed their developers. All tools except one were classified as
semi-automatic, requiring human intervention, and the one automatic method only
worked for a specific domain. In multimedia interlinking [8], the degree of automation
is lower. However, this is tightly coupled to the quality and availability of annotations
of the multimedia content.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Human Computation for Linked Data</title>
      <p>In the previous section, we briefly summarized why human involvement is required
for publishing, maintaining, and consuming Linked Data. One form of human
computation that also serves as an incentive is games, which hide the abstract task behind an
entertaining user interface (and story). In this section, we briefly summarize three
games that use or produce Linked Data.</p>
      <p>The GuessWhat?! game [7] is a multi-player online game that leverages the “games
with a purpose” paradigm and uses Linked Open Data as a data source in order to build
formal vocabularies (or domain ontologies). In the game, players are confronted with
class expressions, such as fruit AND yellow AND grows on tree, that are automatically
generated from Linked Open Data. The players have to invent a suitable class name
(banana or lemon, for example) as fast as possible. The player with the highest
number of plausible class labels wins the game.</p>
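      <p>The core of such a round might be sketched as follows: a class expression is a conjunction of features, its extension is the set of entities having all of them, and a guess counts as plausible if it names such an entity. The toy facts, this plausibility rule, and the scoring are our own simplifying assumptions, not the actual GuessWhat?! implementation, which derives its expressions from Linked Open Data.</p>
      <preformat>
```python
from collections import Counter

# Toy stand-in for facts mined from Linked Open Data: entity -> set of features.
FACTS = {
    "banana": {"fruit", "yellow", "grows on tree"},
    "lemon": {"fruit", "yellow", "grows on tree"},
    "strawberry": {"fruit", "red"},
    "canary": {"bird", "yellow"},
}


def extension(expression: set) -> set:
    """Entities that have every feature of a conjunctive class expression."""
    return {entity for entity, features in FACTS.items() if expression.issubset(features)}


def score_round(expression: set, guesses: dict) -> Counter:
    """Count each player's plausible labels (assumption: plausible = in the extension)."""
    valid = extension(expression)
    scores = Counter()
    for player, labels in guesses.items():
        scores[player] = len(set(labels).intersection(valid))
    return scores


if __name__ == "__main__":
    fruit_query = {"fruit", "yellow", "grows on tree"}  # fruit AND yellow AND grows on tree
    result = score_round(fruit_query, {"alice": ["banana", "lemon"], "bob": ["canary"]})
    print(result.most_common(1))  # [('alice', 2)]
```
      </preformat>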
      <p>SpotTheLink<sup>3</sup> [6], the latest release of the OntoGame framework, is a game that
allows players to define mappings between ontologies as part of a collaborative
game experience. In the game, players have to agree on the type of relationship
between two concepts (or entities). The background is that a multitude of approaches
to match, merge, and integrate ontologies and to interlink RDF data sets have been
proposed. While advances in this area cannot be contested, it is equally true that full
automation of the ontology-alignment process is far from feasible, and human
intervention is often indispensable, mainly for bootstrapping the underlying methods
and for validating and enhancing their results.<fn id="fn2"><label>2</label><p>http://melinda.inrialpes.fr/systems.html</p></fn><fn id="fn3"><label>3</label><p>http://www.ontogame.org</p></fn></p>
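      <p>The agreement mechanism can be sketched as an output-agreement check in the style of games with a purpose [3]: two players independently choose a relationship for the same pair of concepts, only matching answers are recorded, and a mapping is accepted once enough rounds agree. The relation names, the vote threshold, and the class <monospace>AgreementLog</monospace> are assumptions for illustration, not the actual rules of SpotTheLink.</p>
      <preformat>
```python
from collections import Counter, defaultdict

# Assumed answer options; the actual game's choices may differ.
RELATIONS = {"same as", "more specific", "more general", "unrelated"}


class AgreementLog:
    """Output-agreement bookkeeping per concept pair (hypothetical design)."""

    def __init__(self, min_votes: int = 2):
        self.min_votes = min_votes
        self.votes = defaultdict(Counter)  # (concept, concept) -> Counter of agreed relations

    def play(self, pair: tuple, answer_a: str, answer_b: str) -> bool:
        """Record one round; only an answer given by both players counts as evidence."""
        for answer in (answer_a, answer_b):
            if answer not in RELATIONS:
                raise ValueError("unknown relation: " + answer)
        if answer_a != answer_b:
            return False
        self.votes[pair][answer_a] += 1
        return True

    def accepted(self, pair: tuple):
        """Return the relation agreed on at least `min_votes` times, otherwise None."""
        agreed = self.votes.get(pair)
        if not agreed:
            return None
        relation, count = agreed.most_common(1)[0]
        return relation if count >= self.min_votes else None


if __name__ == "__main__":
    log = AgreementLog(min_votes=2)
    pair = ("onto1:Car", "onto2:Automobile")
    log.play(pair, "same as", "same as")
    log.play(pair, "same as", "more specific")  # disagreement: not recorded
    log.play(pair, "same as", "same as")
    print(log.accepted(pair))  # same as
```
      </preformat>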
      <p>TubeLink<sup>4</sup> (Figures 1 and 2) is another game of the OntoGame series that will be
published in early 2011. The idea is to use data on the Web to bootstrap the process of
video annotation. In the game, players have to choose suitable tags describing the
contents of videos. These tags serve as cues that help methods running in the background
to choose appropriate data from the Linked Open Data cloud. Depending on the situation,
players’ input is used either to choose an appropriate dataset or to select instances
of that dataset. All this complexity, however, is hidden from the player.<fn id="fn4"><label>4</label><p>TubeLink is a part of the INSEMTIVES OntoGame series and will be published soon. Check
back at www.insemtives.eu or www.ontogame.org.</p></fn></p>
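      <p>How player tags might drive dataset and instance selection can be sketched as a two-stage lookup: a vote over the tags first picks a dataset, and the tags are then matched against instance labels of that dataset. The tag-to-dataset table, the placeholder example.org URIs, and the majority rule are hypothetical; the background methods of TubeLink are not described at this level of detail.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical routing table: the dataset a tag points to.
TAG_TO_DATASET = {"mountain": "Geonames", "city": "Geonames", "band": "DBpedia", "song": "DBpedia"}

# Hypothetical instance labels per dataset (placeholder URIs, not real identifiers).
INSTANCES = {
    "Geonames": {
        "http://example.org/geo/Innsbruck": "innsbruck",
        "http://example.org/geo/Nordkette": "nordkette",
    },
    "DBpedia": {
        "http://example.org/dbp/Innsbruck": "innsbruck",
        "http://example.org/dbp/Queen_band": "queen",
    },
}


def choose_dataset(tags: list):
    """Stage 1: majority vote over the datasets that the players' tags point to."""
    votes = Counter(TAG_TO_DATASET[t] for t in tags if t in TAG_TO_DATASET)
    return votes.most_common(1)[0][0] if votes else None


def choose_instances(tags: list) -> list:
    """Stage 2: instances of the chosen dataset whose label matches one of the tags."""
    dataset = choose_dataset(tags)
    if dataset is None:
        return []
    wanted = {t.lower() for t in tags}
    return sorted(uri for uri, label in INSTANCES[dataset].items() if label in wanted)


if __name__ == "__main__":
    player_tags = ["mountain", "city", "Innsbruck", "Nordkette"]
    print(choose_dataset(player_tags))  # Geonames
    print(choose_instances(player_tags))
```
      </preformat>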
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>Publishing, maintaining, and consuming Linked Data, and thus contributing to a Web of
Data, involves several tasks that partly depend on human intelligence and
intervention. We discussed that many methods for interlinking are semi-automatic and
thus require user intervention. Therefore, it is necessary to address incentive models
and motivation mechanisms to involve human users in the interlinking process,
including publishing, maintenance, and consumption. “Games with a purpose” are one
example of incentives that apply the paradigm of human computation. We described
three example games that address Linked (Open) Data in different ways.</p>
      <p>Acknowledgments. This work has been funded by the EU FP7 project
INSEMTIVES – Incentives for Semantics (www.insemtives.eu).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Katharina</given-names>
            <surname>Siorpaes</surname>
          </string-name>
          and
          <string-name>
            <given-names>Elena</given-names>
            <surname>Simperl</surname>
          </string-name>
          :
          <article-title>Human Intelligence in the Process of Semantic Content Creation</article-title>
          .
          <source>World Wide Web Journal (WWW)</source>
          <volume>13</volume>
          (
          <issue>1-2</issue>
          ), March
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
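          <string-name>
            <given-names>Luis</given-names>
            <surname>von Ahn</surname>
          </string-name>
          and
          <string-name>
            <given-names>Laura</given-names>
            <surname>Dabbish</surname>
          </string-name>
          :
          <article-title>Designing games with a purpose</article-title>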
          .
          <source>Communications of the ACM</source>
          <volume>51</volume>
          (
          <issue>8</issue>
          ),
          <fpage>58</fpage>
          -
          <lpage>67</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Luis</given-names>
            <surname>von Ahn</surname>
          </string-name>
          :
          <article-title>Human Computation</article-title>
          .
          <source>K-CAP '07: Proceedings of the 4th International Conference on Knowledge Capture</source>
          , ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          , Raphael Troncy, Tobias Bürger, and Yves Raimond:
          <article-title>Interlinking Multimedia: How to Apply Linked Data Principles to Multimedia Fragments</article-title>
          . In:
          <source>Proceedings of Linked Data on the Web (LDOW2009), co-located with the 18th International World Wide Web Conference (WWW2009)</source>
          , Madrid, Spain,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Thaler</surname>
          </string-name>
          , Elena Simperl, Katharina Siorpaes:
          <article-title>SpotTheLink: Playful Alignment of Ontologies</article-title>
          .
          <source>26th Symposium On Applied Computing (SAC'11)</source>
          , TaiChung, Taiwan, March 21-25,
          <year>2011</year>
          (upcoming).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Markotschi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Johanna</given-names>
            <surname>Völker</surname>
          </string-name>
          :
          <article-title>GuessWhat?! - Human Intelligence for Mining Linked Data</article-title>
          .
          <source>Proceedings of the Workshop on Knowledge Injection into and Extraction from Linked Data (KIELD) at the International Conference on Knowledge Engineering and Knowledge Management (EKAW)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>