=Paper= {{Paper |id=Vol-2173/paper4 |storemode=property |title=Crowd-Sourced Knowledge Graph Extension: A Belief Revision Based Approach |pdfUrl=https://ceur-ws.org/Vol-2173/paper4.pdf |volume=Vol-2173 |authors=Artem Revenko,Marta Sabou,Albin Ahmeti,Martin Schauer |dblpUrl=https://dblp.org/rec/conf/hcomp/RevenkoSAS18 }} ==Crowd-Sourced Knowledge Graph Extension: A Belief Revision Based Approach== https://ceur-ws.org/Vol-2173/paper4.pdf
                                 Crowd-sourced knowledge graph extension:
                                     a belief revision based approach

Artem Revenko and Albin Ahmeti and Martin Schauer                                    Marta Sabou
                          Semantic Web Company                               Technical University of Vienna




Abstract

  Knowledge graphs are gaining popularity as key ingredients of many advanced applications.
  Many applications need common-sense knowledge that is not domain specific and can therefore
  be provided by non-experts. In this paper we introduce a novel crowd-sourcing approach that
  allows the crowdworkers to provide their update in a simple, intuitive form without knowing
  what is already contained in the graph. The approach is rooted in belief revision theory and
  is capable of analyzing the user input, identifying the compliance with the existing
  structure and singling out new suggestions. When providing the update and upon submission
  the crowdworkers obtain intuitive color-coded feedback on their input w.r.t. consistency and
  discrepancies with the existing knowledge. This feedback enables the educational aspect of
  the approach. The approach guarantees the consistency of the crowd-sourced knowledge when it
  is being integrated into the knowledge graph.

Introduction

Knowledge graphs (KG) are a novel kind of data structure that enables the creation of
intelligent applications such as advanced search engines, recommender systems and question
answering systems. Recently, knowledge graphs were defined as “a set of interconnected typed
entities and their attributes” (Pan et al. 2017), where an ontology defines the vocabulary of
the graph. For example, box 4 in Figure 1 shows an example knowledge graph including entities
such as bat or cat, which can be of type Mammal. Lines between the entities denote the type
relation (cat is of type Mammal) or other relations. Some representative examples of knowledge
graphs include DBpedia [1], which is a structured representation of Wikipedia data, or
EuroVoc [2], a multi-lingual thesaurus of all activities of the European Union.

   A critical problem in the life-cycle of a KG is extending and keeping it up-to-date. This
is a costly and time-consuming task that is hard to achieve within the boundaries of one
organization. Therefore, in this paper, we investigate the following research question:

   RQ1: How to use crowd-sourcing to extend a large KG?

   Involving crowds in the extension of knowledge structures leads to opportunities in terms
of educating them in the subject domain covered by the knowledge structure. Therefore, an
additional research question addressed is:

   RQ2: How to educate crowdworkers about the subject domain of the KG while they are
   extending it?

   We address these research questions as part of the European PROFIT project [3], where a
platform to promote financial awareness and stability is developed, and we are designing a
web-based system which collects extensions to a large knowledge graph from a crowd of citizens
who use this platform [4]. To address RQ2, the tool provides the users feedback about
discrepancies between their vision of the domain and the existing knowledge graph of the
domain.

   Our approach to ensure this capability is the use of belief revision theory (Gärdenfors
2003). Accordingly, the problem is translated into the belief revision problem where the
existing knowledge graph is “mapped” to the world W, and the model created by the user is
“mapped” to the update U. Therefore, we enable the analysis of the differences and distances
between the user provided update and the existing knowledge graph (world).

   Novelty in the proposed work is the use of Semantic Web technologies to formally represent
the knowledge structure that is extended. This enables the system to automatically reason upon
user suggestions to judge their correctness, which is a pre-requisite to providing feedback to
users (thus educating them) as well as to integrating this knowledge in the KG in a way that
it remains correct (i.e., consistent). The use of belief revision theory to inform the
reasoning mechanisms is another novelty. As an important consequence the tool allows for an
additional implicit voting mechanism by comparing the overlapping parts in the users’ updates.
Overall, the tool illustrates the use of Semantic Web reasoning capabilities to support a
Human Computation task, a research line which has only been weakly covered so far (Sabou et
al. 2018a).

   In the rest of the paper we detail the problem setting and sketch the general workflow
followed by the tool, highlighting the role and benefits of using belief revision.

Copyright © 2018 for this paper by its authors. Copying permitted for private and academic
purposes.
   [1] dbpedia.org
   [2] eurovoc.europa.eu
   [3] platform.projectprofit.eu
   [4] A demo of the system is available: research.semantic-web.com/crowd-sourcing/

Problem Setting

A core component of the knowledge graph that is to be extended is an ontology O. As ontologies
are often encoded in terms of the OWL [5] knowledge representation language, we define the
ontology by relying on the OWL terminology. The ontology holds definitions of classes and
relations between them. Let A and B be two classes. Let a be an instance. The statement a ∈ A
is interpreted as a is of type class A and is called a class assertion. Let R ⊆ A × B be a
relation between the two classes. For a ∈ A, b ∈ B one may assert R(a, b), i.e. a and b are in
relation R; this is called a relation assertion. Moreover, every instance can have attributes
whose values are constants (integers, strings, dates, etc.). Statements about class, relation
or attribute assertions are atomic knowledge structures that we refer to as triples.

   The ontology O is pre-defined and fixed for the crowd-sourcing process, i.e. the users
cannot suggest new classes or new relations. Nevertheless, users can suggest new class
assertions, new relation assertions and new attribute values.

   The basis of our ontology is the Simple Knowledge Organization System (SKOS) [6]. Instances
are called concepts in SKOS notation. SKOS allows for defining a thesaurus with the
hierarchical relations broader (skos:broader) and narrower (skos:narrower). Moreover, in a
SKOS thesaurus every instance may have different labels which denote synonyms of that
instance. These labels are important in several advanced applications where they support tasks
such as finding instance mentions in text or disambiguation. In the developed crowd-sourced
application the users can provide suggestions on new instance labels as well.

Approach

The typical workflow of our approach consists of the following phases, as illustrated in
Fig. 1:

Collect The user inputs their update U (box 1 in Fig. 1). The proposed tool allows users to
  provide input without referring to the existing knowledge graph, i.e. the user is not forced
  into any particular vision of the subject domain. Users are encouraged to convey their input
  in a free form, starting from an empty canvas and creating new triples. In order to enable
  such freedom and flexibility it is necessary to (1) identify and resolve inconsistencies
  between U and W and (2) compute overlaps, contradictions and novelties w.r.t. the existing
  knowledge. This is performed in the analysis phase, described next.

Analyze and Provide Feedback The user’s update U is analyzed against the world W (box 2 in
  Fig. 1) in order to identify new triple suggestions and update the trust thresholds of these
  triples, as we will discuss in more detail in the next section on inconsistency detection.
  The user’s input is analyzed in real time and all the inconsistencies in the provided
  knowledge are highlighted. Users are not able to finalize their input until they resolve all
  the intrinsic inconsistencies in U. Each inconsistency features a description for user
  convenience. Upon submitting the input, for educational purposes, the user obtains
  color-coded feedback on the submission in terms of new (blue), confirming (green) and
  contradicting (red) triples in the input. This consistency checking mechanism is employed
  during both the collect and the integrate phases of the workflow.

Vote The users vote on triples suggested by other users (box 3 in Fig. 1). Voting mechanisms
  are introduced as an answer to RQ2 since they initiate interaction and opinion exchange with
  other users and/or experts in the field. Two types of voting are implemented. First, on a
  dedicated page every authorized user can vote explicitly. The user can vote on triples
  contributed by others only once. The user can change the vote (from upvote to downvote and
  vice versa) or withdraw the vote. Second, if different users suggest the same new triple
  then an implicit voting mechanism is activated. When the difference between upvotes and
  downvotes reaches the trust threshold, the triple becomes accepted and the integrate phase
  is activated.
  The users cannot upvote or downvote their own triples.

Integrate The new and verified crowd-sourced knowledge is integrated into the world W (box 4
  in Fig. 1).

Inconsistency detection and management

Core to our approach is identifying differences between the existing (W) and newly contributed
(U) knowledge and assessing whether inconsistencies arise, as these should be avoided. An
inconsistency is defined as a violation of axioms. Since the ontology is defined using SKOS,
we take SKOS axioms into account [7]. Of all axioms the following two could be violated by the
user input:

1. “Disjointness of skos:related and skos:broaderTransitive. This specification treats the
   hierarchical and associative relations as fundamentally distinct in nature. Therefore a
   clash between hierarchical and associative links is not consistent with the SKOS data
   model.” In other words, if instance a is skos:broader of b then the two instances cannot
   be skos:related.

2. “Cycles in the Hierarchical Relation (skos:broaderTransitive and Reflexivity)”. For
   example, a skos:broader b and b skos:broader a. We prohibit this kind of hierarchical
   cycle in our application.

Furthermore we introduce two additional axioms, and an update cannot be submitted unless it is
free from these two types of inconsistencies:

3. In U there should not be any disconnected instances. We introduce this requirement to
   avoid abandoned instances.

   [5] www.w3.org/OWL
   [6] www.w3.org/2004/02/skos
   [7] www.w3.org/TR/skos-reference/#semantic-relations
Figure 1: Crowd-sourcing workflow. User 1 and User 2 submit their updates (collect). Let the threshold needed to accept each
new suggestion be 2 (analyze). Both updates contain two new suggestions that extend the world. One suggestion overlaps
in the two updates (S1 := Mammal → bat) and is therefore implicitly upvoted (vote). User 3 upvotes the same suggestion S1
explicitly through the user interface (vote); S1 thus has 2 upvotes, reaches the threshold and is added to the world (integrate).
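
The scenario in Figure 1 can be replayed with a small sketch. The Python below is not the tool's implementation: the `Suggestion` class, its method names, and the rule that every additional submitter of an identical triple counts as one implicit upvote are assumptions chosen to match the caption and the voting rules of the Vote phase.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    """A new triple awaiting acceptance (hypothetical structure, not the tool's API)."""
    triple: str                                  # e.g. "Mammal -> bat"
    threshold: int                               # trust threshold t for this triple
    authors: set = field(default_factory=set)    # users who submitted the triple
    votes: dict = field(default_factory=dict)    # voter -> +1 (upvote) or -1 (downvote)

    def submit(self, user: str) -> None:
        # Assumption: every submitter after the first adds one implicit upvote.
        self.authors.add(user)

    def vote(self, user: str, value: int) -> None:
        # Authors may not vote on their own triples. One entry is stored per
        # voter, so a later call changes the earlier vote instead of adding one.
        if user in self.authors:
            raise ValueError("users cannot vote on their own triples")
        if value not in (1, -1):
            raise ValueError("value must be +1 or -1")
        self.votes[user] = value

    def withdraw(self, user: str) -> None:
        # Remove an explicit vote if the user has cast one.
        self.votes.pop(user, None)

    def score(self) -> int:
        # Upvotes minus downvotes, implicit upvotes included.
        implicit = max(0, len(self.authors) - 1)
        return implicit + sum(self.votes.values())

    def accepted(self) -> bool:
        return self.score() >= self.threshold


s1 = Suggestion(triple="Mammal -> bat", threshold=2)
s1.submit("user1")
s1.submit("user2")      # overlap between two updates: one implicit upvote
s1.vote("user3", +1)    # explicit upvote through the user interface
print(s1.score(), s1.accepted())  # 2 True
```

Keeping one dictionary entry per voter is one simple way to enforce "vote only once, but allow changing or withdrawing the vote"; other designs are equally possible.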


4. Every new instance in U should have a broader instance. This condition requires every new
   instance to be integrated into the hierarchical structure.

  We distinguish between two sources of inconsistencies:
• intrinsic inconsistency, an inconsistency in the update itself; any of the four identified
  inconsistency types above may appear here;
• general inconsistency, an inconsistency that is only present in the union of W and U and
  appears neither in W alone nor in U alone; only violations of axioms 1 and 2 may appear as
  general inconsistencies.

   For the sake of identifying the discrepancies between W and U only the general
inconsistencies are taken into account. As follows from the definitions of axioms 1 and 2, it
is always possible to identify the triples in U that cause these inconsistencies; these
triples form the set of contradicting triples Tcontra. The set of confirming triples Tconf
contains the triples contained in both W and U. The set of new triples Tnew contains all the
triples that are contained in U but not in W.

   The new, confirming, and contradicting sets of triples enable us to give users feedback on
their input w.r.t. existing knowledge and quantify the correspondence between the update and
the world. Moreover, we can relate the updates of different users and enable implicit voting
between updates in case the sets of new triples overlap. Now we are in a position to compute a
distance between U and W and introduce a threshold for accepting the new triples.

Threshold

The threshold t, denoting the trust level of a triple, depends on the number of contradicting
triples |Tcontra| and confirming triples |Tconf|. In order to encourage users to provide
larger input and avoid updates with only new facts we introduce a “penalty” p; if the user
uses fewer than p triples from the existing knowledge graph then the user’s threshold is
increased. Moreover, each contradicting triple indicates a deviation from the existing
knowledge, hence the triples from the update need to obtain additional support from other
users to get accepted. Finally, in order to prevent any update from being accepted
automatically we add 1 to the resulting threshold. The resulting formula:

   t = max(0, p − |Tconf ∪ Tcontra|) + 2 ∗ |Tcontra| + 1     (1)

Example 1 Let p = 5, |Tconf| = 3 and |Tcontra| = 1, i.e. the user has provided 3 confirming
triples and 1 contradicting triple. Then t = max(0, 5 − (3 + 1)) + 2 ∗ 1 + 1 = 4, i.e. at
least 4 upvotes are needed to accept the new triples.

Future Work

In the future we plan to improve the usability and personalization of the tool by enabling
users to start with a pre-filled canvas. The pre-filled canvas may contain the triples of most
interest to the crowd-sourcing process or to the user. To that end, we will reuse principles
outlined in (Wohlgenannt, Sabou, and Hanika 2016) and (Sabou et al. 2018b).
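
As an illustration of the sets Tconf and Tnew and of Equation (1) from the Threshold section, the following Python sketch reproduces Example 1. It is a minimal sketch under stated assumptions: the function names `classify` and `trust_threshold`, the tuple encoding of triples and the toy instances are hypothetical, and the contradicting set Tcontra is supplied by hand because detecting violations of axioms 1 and 2 is not implemented here.

```python
def classify(update: set, world: set) -> tuple:
    """Return (T_conf, T_new): triples of U also in W, and triples of U not in W."""
    return update & world, update - world


def trust_threshold(p: int, t_conf: set, t_contra: set) -> int:
    """Equation (1): t = max(0, p - |T_conf ∪ T_contra|) + 2 * |T_contra| + 1."""
    return max(0, p - len(t_conf | t_contra)) + 2 * len(t_contra) + 1


# Toy world W and update U; the instance names are illustrative only.
world = {
    ("cat", "skos:broader", "Mammal"),
    ("dog", "skos:broader", "Mammal"),
    ("Mammal", "skos:broader", "Animal"),
}
update = {
    ("cat", "skos:broader", "Mammal"),      # confirming
    ("dog", "skos:broader", "Mammal"),      # confirming
    ("Mammal", "skos:broader", "Animal"),   # confirming
    ("Animal", "skos:broader", "cat"),      # contradicting: closes a cycle with W
    ("bat", "skos:broader", "Mammal"),      # new
}

t_conf, t_new = classify(update, world)

# Assumption: T_contra is given, not computed; this triple violates axiom 2
# only in the union of W and U (a general inconsistency).
t_contra = {("Animal", "skos:broader", "cat")}

print(len(t_conf), len(t_contra))            # 3 1
print(trust_threshold(5, t_conf, t_contra))  # 4, as in Example 1
```

Note that, following the definition in the text, `t_new` is simply U minus W and therefore also contains the contradicting triple.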
References

Gärdenfors, P. 2003. Belief revision, volume 29. Cambridge University Press.
Pan, J. Z.; Vetere, G.; Gomez-Perez, J. M.; and Wu, H. 2017. Exploiting Linked Data and
Knowledge Graphs in Large Organisations. Springer, 1st edition.
Sabou, M.; Aroyo, L.; Bozzon, A.; and Qarout, R. K. 2018a. Semantic Web and Human Computation:
the Status of an Emerging Field. Semantic Web 9(3):1–12.
Sabou, M.; Winkler, D.; Biffl, S.; and Penzenstadler, P. 2018b. Verifying conceptual domain
models with human computation: A case study in software engineering. In The sixth AAAI
Conference on Human Computation and Crowdsourcing.
Wohlgenannt, G.; Sabou, M.; and Hanika, F. 2016. Crowd-based ontology engineering with the
uComp Protégé plugin. Semantic Web 7(4):379–398.