=Paper=
{{Paper
|id=Vol-2173/paper4
|storemode=property
|title=Crowd-Sourced Knowledge Graph Extension: A Belief Revision Based Approach
|pdfUrl=https://ceur-ws.org/Vol-2173/paper4.pdf
|volume=Vol-2173
|authors=Artem Revenko,Marta Sabou,Albin Ahmeti,Martin Schauer
|dblpUrl=https://dblp.org/rec/conf/hcomp/RevenkoSAS18
}}
==Crowd-Sourced Knowledge Graph Extension: A Belief Revision Based Approach==
Artem Revenko, Albin Ahmeti and Martin Schauer (Semantic Web Company); Marta Sabou (Technical University of Vienna)

Copyright © 2018 for this paper by its authors. Copying permitted for private and academic purposes.

===Abstract===
Knowledge graphs are gaining popularity as key ingredients of many advanced applications. Many applications need common-sense knowledge that is not domain specific and can therefore be provided by non-experts. In this paper we introduce a novel crowd-sourcing approach that allows crowdworkers to provide their update in a simple, intuitive form without knowing what is already contained in the graph. The approach is rooted in belief revision theory and is capable of analyzing the user input, identifying its compliance with the existing structure and singling out new suggestions. While providing the update and upon submission, the crowdworkers obtain intuitive color-coded feedback on their input with respect to consistency and discrepancies with the existing knowledge. This feedback enables the educational aspect of the approach. The approach guarantees the consistency of the crowd-sourced knowledge when it is integrated into the knowledge graph.

===Introduction===
Knowledge graphs (KG) are a novel kind of data structure that enables the creation of intelligent applications such as advanced search engines, recommender systems and question answering systems. Recently, knowledge graphs were defined as “a set of interconnected typed entities and their attributes” (Pan et al. 2017), where an ontology defines the vocabulary of the graph. For example, box 4 in Figure 1 shows an example knowledge graph including entities such as bat or cat, which can be of type Mammal. Lines between the entities denote the type relation (cat is of type Mammal) or other relations. Representative examples of knowledge graphs include DBpedia (dbpedia.org), which is a structured representation of Wikipedia data, and EuroVoc (eurovoc.europa.eu), a multi-lingual thesaurus of all activities of the European Union.

A critical problem in the life-cycle of a KG is extending it and keeping it up to date. This is a costly and time-consuming task that is hard to achieve within the boundaries of one organization. Therefore, in this paper, we investigate the following research question:

RQ1: How to use crowd-sourcing to extend a large KG?

Involving crowds in the extension of knowledge structures creates opportunities to educate them in the subject domain covered by the knowledge structure. Therefore, an additional research question addressed is:

RQ2: How to educate crowdworkers about the subject domain of the KG while they are extending it?

We address these research questions as part of the European PROFIT project (platform.projectprofit.eu), where a platform to promote financial awareness and stability is developed, and we are designing a web-based system which collects extensions to a large knowledge graph from a crowd of citizens who use this platform. A demo of the system is available: research.semantic-web.com/crowd-sourcing/. To address RQ2, the tool gives the users feedback about discrepancies between their vision of the domain and the existing knowledge graph of the domain.

Our approach to ensure this capability is the use of belief revision theory (Gärdenfors 2003). Accordingly, the problem is translated into a belief revision problem where the existing knowledge graph is “mapped” to the world W, and the model created by the user is “mapped” to the update U. This enables the analysis of the differences and distances between the user-provided update and the existing knowledge graph (world).

One novelty of the proposed work is the use of Semantic Web technologies to formally represent the knowledge structure that is extended. This enables the system to automatically reason upon user suggestions to judge their correctness, which is a prerequisite to providing feedback to users (thus educating them) as well as to integrating this knowledge in the KG in a way that it remains correct (i.e., consistent). The use of belief revision theory to inform the reasoning mechanisms is another novelty. As an important consequence, the tool allows for an additional implicit voting mechanism by comparing the overlapping parts in the users’ updates. Overall, the tool illustrates the use of Semantic Web reasoning capabilities to support a Human Computation task, a research line which has only been weakly covered so far (Sabou et al. 2018a).

In the rest of the paper we detail the problem setting and sketch the general workflow followed by the tool, highlighting the role and benefits of using belief revision.

===Problem Setting===
A core component of the knowledge graph that is to be extended is an ontology O. As ontologies are often encoded in the OWL knowledge representation language (www.w3.org/OWL), we define the ontology by relying on the OWL terminology. The ontology holds definitions of classes and relations between them. Let A and B be two classes. Let a be an instance. The statement a ∈ A is interpreted as “a is of type class A” and is called a class assertion. Let R ⊆ A × B be a relation between the two classes. For a ∈ A, b ∈ B one may assert R(a, b), i.e. a and b are in relation R; this is called a relation assertion. Moreover, every instance can have attributes whose values are constants (integers, strings, dates, etc.). Statements about class, relation or attribute assertions are atomic knowledge structures that we refer to as triples.

The ontology O is pre-defined and fixed for the crowd-sourcing process, i.e. the users cannot suggest new classes or new relations. Nevertheless, users can suggest new class assertions, new relation assertions and new attribute values.

The basis of our ontology is the Simple Knowledge Organization System (SKOS, www.w3.org/2004/02/skos). Instances are called concepts in SKOS notation. SKOS allows for defining a thesaurus with the hierarchical relations broader (skos:broader) and narrower (skos:narrower). Moreover, in a SKOS thesaurus every instance may have different labels which denote synonyms of that instance. These labels are important in several advanced applications where they support tasks such as finding instance mentions in text or disambiguation. In the developed crowd-sourcing application the users can suggest new instance labels as well.

===Approach===
The typical workflow of our approach consists of the following phases, as illustrated in Fig. 1:

Collect. The user inputs their update U (box 1 in Fig. 1). The proposed tool allows users to provide input without referring to the existing knowledge graph, i.e. the user is not forced into any particular vision of the subject domain. Users are encouraged to convey their input in a free form, starting from an empty canvas and creating new triples. In order to enable such freedom and flexibility it is necessary to (1) identify and resolve inconsistencies between U and W and (2) compute overlaps, contradictions and novelties with respect to the existing knowledge. This is performed in the analysis phase, described next.

Analyze and Provide Feedback. The user’s update U is analyzed against the world W (box 2 in Fig. 1) in order to identify new triple suggestions and update the trust thresholds of these triples, as discussed in more detail in the section on inconsistency detection. The user’s input is analyzed in real time and all the inconsistencies in the provided knowledge are highlighted. The user is not able to finalize the input until all the intrinsic inconsistencies in U are resolved. Each inconsistency comes with a description for user convenience. Upon submitting the input, for educational purposes, the user obtains color-coded feedback on the submission in terms of new (blue), confirming (green) and contradicting (red) triples. This consistency checking mechanism is employed during both the collect and the integrate phases of the workflow.

Vote. The users vote on triples suggested by other users (box 3 in Fig. 1). Voting mechanisms are introduced as an answer to RQ2 since they initiate interaction and opinion exchange with other users and/or experts in the field. Two types of voting are implemented. First, on a dedicated page every authorized user can vote explicitly. The user can vote on triples contributed by others only once. The user can change the vote (from upvote to downvote and vice versa) or withdraw the vote. Second, if different users suggest the same new triple then an implicit voting mechanism is activated. When the difference between upvotes and downvotes reaches the trust threshold, the triple is accepted and the integrate phase is activated. The users cannot upvote or downvote their own triples.

Integrate. The new and verified crowd-sourced knowledge is integrated into the world W (box 4 in Fig. 1).

Figure 1: Crowd-sourcing workflow. User 1 and User 2 submit their updates (collect). Let the threshold needed to accept each new suggestion be 2 (analyze). Both updates contain two new suggestions that extend the world. One suggestion is overlapping in the updates (S1 := Mammal → bat), so it is implicitly upvoted (vote). User 3 upvotes the same suggestion S1 explicitly through the user interface (vote), therefore S1 gets 2 upvotes, reaches the threshold and is added to the world (integrate).

===Inconsistency detection and management===
Core to our approach is identifying differences between the existing (W) and newly contributed (U) knowledge and assessing whether inconsistencies arise, as these should be avoided. An inconsistency is defined as a violation of axioms. Since the ontology is defined using SKOS, we take SKOS axioms into account (www.w3.org/TR/skos-reference/#semantic-relations). Of all axioms the following two could be violated by the user input:

1. “Disjointness of skos:related and skos:broaderTransitive. This specification treats the hierarchical and associative relations as fundamentally distinct in nature. Therefore a clash between hierarchical and associative links is not consistent with the SKOS data model.” In other words, if instance a is skos:broader of b then the two instances cannot be skos:related.

2. “Cycles in the Hierarchical Relation (skos:broaderTransitive and Reflexivity)”. For example, a skos:broader b and b skos:broader a. We prohibit this kind of hierarchical cycle in our application.

Furthermore, we introduce two additional axioms, and an update cannot be submitted unless it is free from these two types of inconsistencies:

3. In U there should not be any disconnected instances. We introduce this requirement to avoid abandoned instances.

4. Every new instance in U should have a broader instance. This condition requires every new instance to be integrated into the hierarchical structure.

We distinguish between two sources of inconsistencies:

* intrinsic inconsistency, an inconsistency in the update itself; any of the four inconsistency types identified above may appear here;
* general inconsistency, an inconsistency that is only present in the union of W and U and appears neither in W alone nor in U alone; only violations of axioms 1 and 2 may appear as general inconsistencies.

For the sake of identifying the discrepancies between W and U only the general inconsistencies are taken into account. As follows from the definitions of axioms 1 and 2, it is always possible to identify the triples in U that cause these inconsistencies; these triples form the set of contradicting triples <math>T_{contra}</math>. The set of confirming triples <math>T_{conf}</math> contains the triples contained in both W and U. The set of new triples <math>T_{new}</math> contains all the triples that are contained in U but not in W.

The new, confirming, and contradicting sets of triples enable us to give the user feedback on their input with respect to existing knowledge and to quantify the correspondence between the update and the world. Moreover, we can relate the updates of different users and enable implicit voting between updates in case the sets of new triples overlap. We are now in a position to compute a distance between U and W and introduce a threshold for accepting the new triples.

===Threshold===
The threshold t, denoting the trust level of a triple, depends on the number of contradicting triples <math>|T_{contra}|</math> and confirming triples <math>|T_{conf}|</math>. In order to encourage users to provide larger input and avoid updates with only new facts we introduce a “penalty” p; if the user uses fewer than p triples from the existing knowledge graph then the user’s threshold is increased. Moreover, each contradicting triple indicates a deviation from the existing knowledge, hence the triples from the update need to obtain additional support from other users to be accepted. Finally, in order to prevent any update from being accepted automatically we add 1 to the resulting threshold. The resulting formula:

<math>t = \max(0,\, p - |T_{conf} \cup T_{contra}|) + 2 \cdot |T_{contra}| + 1 \qquad (1)</math>

Example 1. Let p = 5, <math>|T_{conf}| = 3</math> and <math>|T_{contra}| = 1</math>, i.e. the user has provided 3 confirming triples and 1 contradicting triple. Then <math>t = \max(0,\, 5 - (3 + 1)) + 2 \cdot 1 + 1 = 4</math>, i.e. at least 4 upvotes are needed to accept the new triples.

===Future Work===
In the future we plan to improve the usability and personalization of the tool by enabling users to start with a pre-filled canvas. The pre-filled canvas may contain the triples of most interest to the crowd-sourcing process or to the user. To that end, we will reuse principles outlined in (Wohlgenannt, Sabou, and Hanika 2016) and (Sabou et al. 2018b).

===References===
* Gärdenfors, P. 2003. Belief revision, volume 29. Cambridge University Press.
* Pan, J. Z.; Vetere, G.; Gomez-Perez, J. M.; and Wu, H. 2017. Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer, 1st edition.
* Sabou, M.; Aroyo, L.; Bozzon, A.; and Qarout, R. K. 2018a. Semantic Web and Human Computation: the Status of an Emerging Field. Semantic Web 9(3):1–12.
* Sabou, M.; Winkler, D.; Biffl, S.; and Penzerstadler, P. 2018b. Verifying conceptual domain models with human computation: A case study in software engineering. In The sixth AAAI Conference on Human Computation and Crowdsourcing.
* Wohlgenannt, G.; Sabou, M.; and Hanika, F. 2016. Crowd-based ontology engineering with the uComp Protégé plugin. Semantic Web 7(4):379–398.
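
As an illustration of the threshold formula (1), the following minimal Python sketch computes t from a penalty p and the sets of confirming and contradicting triples. The function name `trust_threshold`, the representation of triples as tuples, and the sample triples are assumptions made for this example only; they are not taken from the described tool, and detecting which triples are contradicting (axioms 1 and 2) is assumed to have happened beforehand.

```python
def trust_threshold(p, confirming, contradicting):
    """Compute t = max(0, p - |T_conf ∪ T_contra|) + 2 * |T_contra| + 1.

    p             -- the "penalty": number of existing triples a user is
                     expected to reuse before the threshold stops growing
    confirming    -- triples found in both the update U and the world W
    contradicting -- triples of U that cause a general inconsistency
    """
    t_conf = set(confirming)
    t_contra = set(contradicting)
    # Triples of the update that refer to existing knowledge.
    reused = len(t_conf | t_contra)
    # Penalty term: positive only if fewer than p such triples were used.
    penalty = max(0, p - reused)
    # Each contradicting triple adds 2; the final +1 prevents automatic acceptance.
    return penalty + 2 * len(t_contra) + 1


# Hypothetical triples reproducing Example 1 (p = 5, 3 confirming, 1 contradicting).
example_confirming = [
    ("cat", "skos:broader", "Mammal"),
    ("bat", "skos:broader", "Mammal"),
    ("Mammal", "skos:broader", "Animal"),
]
example_contradicting = [("cat", "skos:related", "Mammal")]

if __name__ == "__main__":
    print(trust_threshold(5, example_confirming, example_contradicting))
```

With the inputs of Example 1 the sketch yields the same value as the worked example, t = 4. An update that reuses no existing triples at all gets the full penalty, here t = p + 1.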