<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Crowd-sourced knowledge graph extension: a belief revision based approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Artem Revenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Albin Ahmeti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
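        <contrib contrib-type="author">
          <string-name>Martin Schauer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marta Sabou</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Semantic Web Company</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technical University of Vienna</institution>
        </aff>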
      </contrib-group>
      <abstract>
        <p>Knowledge graphs are gaining popularity as key ingredients of many advanced applications. Many applications need common-sense knowledge that is not domain specific and can therefore be provided by non-experts. In this paper we introduce a novel crowd-sourcing approach that allows crowdworkers to provide their update in a simple, intuitive form without knowing what is already contained in the graph. The approach is rooted in belief revision theory and is capable of analyzing the user input, identifying where it complies with the existing structure, and singling out new suggestions. While providing the update and upon submission, crowdworkers obtain intuitive color-coded feedback on their input w.r.t. consistency and discrepancies with the existing knowledge. This feedback enables the educational aspect of the approach. The approach guarantees the consistency of the crowd-sourced knowledge when it is integrated into the knowledge graph.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Knowledge graphs (KG) are a novel kind of data structure
that enables the creation of intelligent applications such as
advanced search engines, recommender systems and
question answering systems. Recently, knowledge graphs were
defined as “a set of interconnected typed entities and their
attributes”
        <xref ref-type="bibr" rid="ref2">(Pan et al. 2017)</xref>
        , where an ontology defines the
vocabulary of the graph. For example, box 4 in Figure 1
shows an example knowledge graph including entities such
as bat or cat, which can be of type Mammal. Lines between
the entities denote the type relation (cat is of type Mammal)
or other relations. Some representative examples of
knowledge graphs include DBpedia<sup>1</sup>, which is a structured
representation of Wikipedia data, and EuroVoc<sup>2</sup>, a multi-lingual
thesaurus covering all activities of the European Union.
      </p>
      <p><sup>1</sup> dbpedia.org; <sup>2</sup> eurovoc.europa.eu</p>
      <p>Copyright © 2018 for this paper by its authors. Copying permitted
for private and academic purposes.</p>
      <p>A critical problem in the life-cycle of a KG is
extending it and keeping it up to date. This is a costly and
time-consuming task that is hard to achieve within the boundaries
of one organization. Therefore, in this paper, we investigate
the following research question:</p>
      <p>RQ1: How to use crowd-sourcing to extend a large KG?</p>
      <p>Involving crowds in the extension of knowledge
structures creates opportunities to educate them in the
subject domain covered by the knowledge structure.
Therefore, an additional research question addressed is:</p>
      <p>RQ2: How to educate crowdworkers about the subject
domain of the KG while they are extending it?</p>
      <p>We address these research questions as part of the
European PROFIT project<sup>3</sup>, in which a platform to promote
financial awareness and stability is being developed. We are
designing a web-based system that collects extensions to a large
knowledge graph from the crowd of citizens who use this
platform<sup>4</sup>. To address RQ2, the tool gives users
feedback about discrepancies between their view of the domain
and the existing knowledge graph of the domain.</p>
      <p>
        Our approach to ensure this capability is the use of belief
revision theory
        <xref ref-type="bibr" rid="ref1">(Gärdenfors 2003)</xref>
        . Accordingly, the
problem is translated into a belief revision problem in which the
existing knowledge graph is mapped to the world W and
the model created by the user is mapped to the update U.
This enables the analysis of the differences and
distances between the user-provided update and the existing
knowledge graph (the world).
      </p>
      <p>
        A novelty of the proposed work is the use of Semantic Web
technologies to formally represent the knowledge structure
that is extended. This enables the system to automatically
reason over user suggestions and judge their correctness,
which is a prerequisite both for providing feedback to users (thus
educating them) and for integrating this knowledge into
the KG in a way that keeps it correct (i.e., consistent). The
use of belief revision theory to inform the reasoning
mechanisms is another novelty. As an important consequence, the
tool allows for an additional implicit voting mechanism that
compares the overlapping parts of the users’ updates.
Overall, the tool illustrates the use of Semantic Web reasoning
capabilities to support a Human Computation task, a research
line which has only been weakly covered so far
        <xref ref-type="bibr" rid="ref4 ref5">(Sabou et al.
2018a)</xref>
        .
      </p>
      <p>In the rest of the paper we detail the problem setting and
sketch the general workflow followed by the tool,
highlighting the role and benefits of using belief revision.</p>
      <p><sup>3</sup> platform.projectprofit.eu; <sup>4</sup> A demo of the system is available:
research.semantic-web.com/crowd-sourcing/</p>
    </sec>
    <sec id="sec-2">
      <title>Problem Setting</title>
      <p>A core component of the knowledge graph that is to be
extended is an ontology O. As ontologies are often encoded in
the OWL<sup>5</sup> knowledge representation language, we
define the ontology using OWL terminology. The
ontology holds definitions of classes and relations between
them. Let A and B be two classes and let a be an instance.
The statement a ∈ A is interpreted as “a is of type A”
and is called a class assertion. Let R ⊆ A × B be a
relation between the two classes. For a ∈ A, b ∈ B one may
assert R(a, b), i.e. a and b are in relation R; this is called
a relation assertion. Moreover, every instance can have
attributes whose values are constants (integers, strings, dates,
etc.). Statements about class, relation or attribute assertions
are atomic knowledge structures that we refer to as triples.</p>
      <p>The ontology O is pre-defined and fixed for the
crowd-sourcing process, i.e. the users cannot suggest new classes
or new relations. Nevertheless, users can suggest new class
assertions, new relation assertions, and new attribute values.</p>
      <p>The basis of our ontology is the Simple Knowledge
Organization System (SKOS)<sup>6</sup>. Instances are called concepts in SKOS
terminology. SKOS allows for defining a thesaurus with the
hierarchical relations broader (skos:broader) and narrower
(skos:narrower). Moreover, in a SKOS thesaurus every
instance may have several labels that denote synonyms
of that instance. These labels are important in several
advanced applications, where they support tasks such as finding
instance mentions in text or disambiguation. In the
developed crowd-sourcing application the users can suggest
new instance labels as well.</p>
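      <p>The notions above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the data model of the described system: the Triple type, the helper functions and the example values (animal, feline) are hypothetical, while rdf:type, skos:broader and skos:altLabel are standard RDF and SKOS property names.</p>
      <preformat><![CDATA[
```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic statement: subject, predicate, object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


# Standard property names, written as plain strings for simplicity.
RDF_TYPE = "rdf:type"
SKOS_BROADER = "skos:broader"
SKOS_ALT_LABEL = "skos:altLabel"


def class_assertion(instance, cls):
    """a is of type A."""
    return Triple(instance, RDF_TYPE, cls)


def relation_assertion(a, relation, b):
    """R(a, b): a and b are in relation R."""
    return Triple(a, relation, b)


def label_assertion(instance, label):
    """An additional label (synonym) for an instance."""
    return Triple(instance, SKOS_ALT_LABEL, label)


# A user update U is then simply a set of triples (example values are made up).
update = {
    class_assertion("cat", "Mammal"),
    relation_assertion("cat", SKOS_BROADER, "animal"),
    label_assertion("cat", "feline"),
}
```
]]></preformat>
      <p>Representing both the world W and an update U as sets of such triples makes the later comparisons (overlap, novelty, contradiction) plain set operations.</p>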
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>The typical workflow of our approach consists of the
following phases, as illustrated in Fig. 1:</p>
      <p><bold>Collect.</bold> The user inputs their update U (box 1 in Fig. 1). The
proposed tool allows users to provide input without
referring to the existing knowledge graph, i.e. the user is not
forced into any particular vision of the subject domain.
Users are encouraged to convey their input in a free form,
starting from an empty canvas and creating new triples.
In order to enable such freedom and flexibility it is
necessary to (1) identify and resolve inconsistencies between U
and W and (2) compute overlaps, contradictions and
novelties w.r.t. the existing knowledge. This is performed in
the analysis phase, described next.</p>
      <sec id="sec-3-1">
        <p><bold>Analyze and Provide Feedback.</bold> The user’s update U is analyzed
against the world W (box 2 in Fig. 1) in order
to identify new triple suggestions and update the trust
thresholds of these triples, as discussed in more
detail in the section on inconsistency detection and management.
The user’s input is analyzed in real time and all
inconsistencies in the provided knowledge are highlighted.
The user is not able to finalize the input until all
intrinsic inconsistencies in U are resolved. Each
inconsistency comes with a description for the user’s convenience. Upon
submitting the input, for educational purposes, the user
obtains color-coded feedback on the submission in terms
of new (blue), confirming (green) and contradicting (red)
triples. This consistency-checking
mechanism is employed during both the collect and the
integrate phases of the workflow.</p>
        <p><sup>5</sup> www.w3.org/OWL; <sup>6</sup> www.w3.org/2004/02/skos</p>
        <p><bold>Vote.</bold> The users vote on triples suggested by other users (box
3 in Fig. 1). Voting mechanisms are introduced as an
answer to RQ2, since they initiate interaction and opinion
exchange with other users and/or experts in the field. Two
types of voting are implemented. First, on a dedicated
page every authorized user can vote explicitly. A user
can vote only once on each triple contributed by others, but
can change the vote (from upvote to downvote and
vice versa) or withdraw it. Users cannot upvote or downvote
their own triples. Second, if different users suggest
the same new triple, an implicit voting mechanism
is activated. When the difference between upvotes and
downvotes reaches the trust threshold, the triple is
accepted and the integrate phase is activated.</p>
        <p><bold>Integrate.</bold> The new and verified crowd-sourced knowledge
is integrated into the world W (box 4 in Fig. 1).</p>
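        <p>The voting rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the tool: the class name TripleVotes and its methods are hypothetical, and counting a repeated suggestion of the same triple as one upvote is an assumption about how the implicit voting works.</p>
        <preformat><![CDATA[
```python
class TripleVotes:
    """Vote bookkeeping for one suggested triple (hypothetical sketch)."""

    def __init__(self, author, threshold):
        self.author = author        # user who suggested the triple
        self.threshold = threshold  # trust threshold t of the triple
        self.votes = {}             # user -> +1 (upvote) or -1 (downvote)

    def vote(self, user, value):
        """Cast or change an explicit vote; one vote per user and triple."""
        if user == self.author:
            raise ValueError("users cannot vote on their own triples")
        if value not in (1, -1):
            raise ValueError("a vote is +1 or -1")
        self.votes[user] = value    # voting again replaces the earlier vote

    def withdraw(self, user):
        """Remove the user's vote, if any."""
        self.votes.pop(user, None)

    def record_same_suggestion(self, user):
        """Implicit vote: another user suggested the same triple.

        Assumption: this is counted as a single upvote.
        """
        self.vote(user, 1)

    @property
    def score(self):
        """Difference between upvotes and downvotes."""
        return sum(self.votes.values())

    @property
    def accepted(self):
        """True once the score reaches the trust threshold."""
        return self.score >= self.threshold
```
]]></preformat>
        <p>Storing one entry per user keeps the "vote only once" rule trivial: changing a vote overwrites the entry and withdrawing deletes it.</p>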
      </sec>
      <sec id="sec-3-2">
        <title>Inconsistency detection and management</title>
        <p>Core to our approach is identifying differences between the
existing (W) and newly contributed (U) knowledge and
assessing whether inconsistencies arise, as these should be
avoided. An inconsistency is defined as a violation of
axioms. Since the ontology is defined using SKOS, we take the
SKOS axioms into account<sup>7</sup>. Of all the axioms, the following
two could be violated by the user input:</p>
        <p>1. “Disjointness of skos:related and
skos:broaderTransitive. This specification
treats the hierarchical and associative relations as
fundamentally distinct in nature. Therefore a clash between
hierarchical and associative links is not consistent with
the SKOS data model.” In other words, if instance a is
skos:broader of b then the two instances cannot be
skos:related.</p>
        <p>2. “Cycles in the Hierarchical Relation
(skos:broaderTransitive and Reflexivity)”. For
example, a skos:broader b and b skos:broader
a. We prohibit this kind of hierarchical cycle in our
application.</p>
        <p>Furthermore, we introduce two additional axioms; an update
cannot be submitted unless it is free from these
two types of inconsistencies:</p>
        <p>3. In U there should not be any disconnected instances. We
introduce this requirement to avoid abandoned instances.</p>
        <sec id="sec-3-2-1">
          <p>4. Every new instance in U should have a broader instance.
This condition requires every new instance to be
integrated into the hierarchical structure.</p>
          <p><sup>7</sup> www.w3.org/TR/skos-reference/#semantic-relations</p>
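          <p>A check for the four axioms might look as follows in Python. This is a sketch under assumptions, not the reasoning mechanism of the tool: triples are plain (subject, predicate, object) tuples, "s skos:broader o" is read in the SKOS sense that o is the broader instance, an instance counts as connected if it takes part in at least one skos:broader or skos:related triple, and the function names are hypothetical.</p>
          <preformat><![CDATA[
```python
from collections import defaultdict

BROADER = "skos:broader"
RELATED = "skos:related"


def broader_closure(triples):
    """Map each instance with a broader instance to all of its direct and
    transitive broader instances."""
    direct = defaultdict(set)
    for s, p, o in triples:
        if p == BROADER:
            direct[s].add(o)
    closure = {}
    for start in list(direct):
        seen, stack = set(), list(direct[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(direct.get(node, ()))
        closure[start] = seen
    return closure


def check_update(triples, instances, new_instances):
    """Return a sorted list of (axiom number, offending instance) pairs."""
    closure = broader_closure(triples)
    found = set()
    connected = set()
    for s, p, o in triples:
        if p in (BROADER, RELATED):
            connected.update((s, o))
        # Axiom 1: skos:related must not link hierarchically related instances.
        if p == RELATED and (o in closure.get(s, ()) or s in closure.get(o, ())):
            found.add((1, s))
    # Axiom 2: no instance may be (transitively) broader than itself.
    for node, ancestors in closure.items():
        if node in ancestors:
            found.add((2, node))
    # Axiom 3: no disconnected instances.
    for node in instances:
        if node not in connected:
            found.add((3, node))
    # Axiom 4: every new instance needs a broader instance.
    for node in new_instances:
        if node not in closure:
            found.add((4, node))
    return sorted(found)
```
]]></preformat>
          <p>Calling check_update on the triples of U alone would report violations within the update itself, whereas running only the checks for axioms 1 and 2 on the union of W and U corresponds to looking for clashes with the existing knowledge.</p>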
          <p>We distinguish between two sources of inconsistencies:</p>
          <p><bold>Intrinsic inconsistency:</bold> an inconsistency in the update
itself; any of the four inconsistency types identified above
may appear here.</p>
          <p><bold>General inconsistency:</bold> an inconsistency that is
present only in the union of W and U and appears
neither in W alone nor in U alone; only violations of axioms
1 and 2 may appear as general inconsistencies.</p>
          <p>For the sake of identifying the discrepancies between W
and U, only the general inconsistencies are taken into
account. As follows from the definitions of axioms 1 and 2,
it is always possible to identify the triples in U that cause
these inconsistencies; these triples form the set of
contradicting triples T<sub>contra</sub>. The set of confirming triples T<sub>conf</sub>
contains the triples contained in both W and U. The set of
new triples T<sub>new</sub> contains all the triples that are contained in
U but not in W.</p>
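          <p>The three sets can be computed with ordinary set operations, as the following Python sketch suggests. It rests on simplifying assumptions that are not taken from the paper: W is assumed to be consistent, each triple of U is tested against W on its own (clashes that only arise from several update triples together are not detected), and all function names are hypothetical.</p>
          <preformat><![CDATA[
```python
BROADER = "skos:broader"
RELATED = "skos:related"


def ancestors(triples, start):
    """All instances reachable from start by following skos:broader links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, p, o in triples:
            if p == BROADER and s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def consistent(triples):
    """True if the triples violate neither axiom 1 nor axiom 2."""
    for s, p, o in triples:
        # Axiom 2: a hierarchical cycle through s.
        if p == BROADER and s in ancestors(triples, s):
            return False
        # Axiom 1: skos:related between hierarchically related instances.
        if p == RELATED and (o in ancestors(triples, s) or s in ancestors(triples, o)):
            return False
    return True


def classify(update, world):
    """Split the update U into new, confirming and contradicting triples."""
    confirming = update.intersection(world)
    new = update.difference(world)
    contradicting = {t for t in new if not consistent(world.union({t}))}
    return new, confirming, contradicting


world = {("cat", BROADER, "mammal"), ("bat", BROADER, "mammal")}
update = {
    ("cat", BROADER, "mammal"),   # already in W
    ("mammal", BROADER, "cat"),   # would close a hierarchical cycle
    ("cat", RELATED, "bat"),      # not in W and harmless
}
new, confirming, contradicting = classify(update, world)
```
]]></preformat>
          <p>In this example the triple repeated from W is confirming, the cycle-closing triple is contradicting, and both triples absent from W count as new, following the definition of T<sub>new</sub> as all triples in U but not in W.</p>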
          <p>The new, confirming, and contradicting sets of triples
enable us to give the user feedback on their input w.r.t. the existing
knowledge and to quantify the correspondence between the
update and the world. Moreover, we can relate the updates of
different users and enable implicit voting between updates
when their sets of new triples overlap. We are now in a
position to compute a distance between U and W and to introduce
a threshold for accepting the new triples.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Threshold</title>
        <p>The threshold t, denoting the trust level of a triple, depends
on the number of contradicting triples |T<sub>contra</sub>| and
confirming triples |T<sub>conf</sub>|. In order to encourage users to provide
larger inputs and to avoid updates with only new facts, we
introduce a “penalty” p: if the user uses fewer than p triples from
the existing knowledge graph, the user’s threshold is
increased. Moreover, each contradicting triple indicates a
deviation from the existing knowledge, hence the triples from
the update need to obtain additional support from other users
to get accepted. Finally, in order to prevent any update from being
accepted automatically, we add 1 to the resulting threshold.
The resulting formula is:
          <disp-formula id="eq1">
            <label>(1)</label>
            <tex-math><![CDATA[t = \max\bigl(0,\ p - |T_{\mathrm{conf}} \cup T_{\mathrm{contra}}|\bigr) + 2\,|T_{\mathrm{contra}}| + 1]]></tex-math>
          </disp-formula>
        </p>
        <p>Example 1. Let p = 5, |T<sub>conf</sub>| = 3 and |T<sub>contra</sub>| = 1, i.e. the
user has provided 3 confirming triples and 1 contradicting triple.
Then t = max(0, 5 − (3 + 1)) + 2 · 1 + 1 = 4, i.e. at least
4 upvotes are needed to accept the new triples.</p>
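        <p>Formula (1) translates directly into a short Python function. The sketch assumes that the confirming and contradicting sets are disjoint, so that the size of their union is the sum of their sizes, as in Example 1; the function and parameter names are hypothetical.</p>
        <preformat><![CDATA[
```python
def trust_threshold(n_conf, n_contra, penalty):
    """Trust threshold t of formula (1).

    n_conf   -- number of confirming triples
    n_contra -- number of contradicting triples
    penalty  -- the penalty parameter p
    Assumes the two sets are disjoint, so their union has
    n_conf + n_contra elements.
    """
    shortfall = max(0, penalty - (n_conf + n_contra))
    return shortfall + 2 * n_contra + 1


# Example 1: p = 5, three confirming triples, one contradicting triple.
example_threshold = trust_threshold(n_conf=3, n_contra=1, penalty=5)
```
]]></preformat>
        <p>With these inputs the function returns 4, matching Example 1; an update with at least p confirming triples and no contradictions gets the minimum threshold of 1.</p>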
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Future Work</title>
      <p>
        In the future we plan to improve the usability and
personalization of the tool by enabling users to start with a
pre-filled canvas. The pre-filled canvas may contain the triples
of most interest to the crowd-sourcing process or to the user.
To that end, we will reuse principles outlined in
        <xref ref-type="bibr" rid="ref6">(Wohlgenannt, Sabou, and Hanika 2016)</xref>
        and
        <xref ref-type="bibr" rid="ref4 ref5">(Sabou et al. 2018b)</xref>
        .
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Gärdenfors</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>Belief revision</article-title>
          , volume
          <volume>29</volume>
          . Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vetere</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gomez-Perez</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2017</year>
          .
          <source>Exploiting Linked Data and Knowledge Graphs in Large Organisations</source>
          . Springer, 1st edition.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Sabou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bozzon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Qarout</surname>
            ,
            <given-names>R. K.</given-names>
          </string-name>
          <year>2018a</year>
          .
          <article-title>Semantic Web and Human Computation: the Status of an Emerging Field</article-title>
          .
          <source>Semantic Web</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Sabou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Winkler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Biffl</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Penzerstadler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2018b</year>
          .
          <article-title>Verifying conceptual domain models with human computation: A case study in software engineering</article-title>
          . In
          <source>The Sixth AAAI Conference on Human Computation and Crowdsourcing</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Wohlgenannt</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sabou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Hanika</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Crowd-based ontology engineering with the uComp Protégé plugin</article-title>
          .
          <source>Semantic Web</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <fpage>379</fpage>
          -
          <lpage>398</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>