=Paper=
{{Paper
|id=Vol-1409/paper-01
|storemode=property
|title=Fixing the Domain and Range of Properties in Linked Data by Context Disambiguation
|pdfUrl=https://ceur-ws.org/Vol-1409/paper-01.pdf
|volume=Vol-1409
|dblpUrl=https://dblp.org/rec/conf/www/TononCDC15
}}
==Fixing the Domain and Range of Properties in Linked Data by Context Disambiguation==
Alberto Tonon, eXascale Infolab, University of Fribourg, Switzerland (alberto@exascale.info)
Michele Catasta, EPFL, Lausanne, Switzerland (michele.catasta@epfl.ch)
Gianluca Demartini, Information School, University of Sheffield, United Kingdom (g.demartini@sheffield.ac.uk)
Philippe Cudré-Mauroux, eXascale Infolab, University of Fribourg, Switzerland (phil@exascale.info)
ABSTRACT
The amount of Linked Open Data available on the Web is rapidly growing. The quality of the provided data, however, is generally-speaking not fundamentally improving, hampering its wide-scale deployment for many real-world applications. A key data quality aspect for Linked Open Data can be expressed in terms of its adherence to an underlying well-defined schema or ontology, which serves both as a documentation for the end-users as well as a fixed reference for automated processing over the data. In this paper, we first report on an analysis of the schema adherence of domains and ranges for Linked Open Data. We then propose new techniques to improve the correctness of domains and ranges by i) identifying the cases in which a property is used in the data with several different semantics, and ii) resolving them by updating the underlying schema and/or by modifying the data without compromising its retro-compatibility. We experimentally show the validity of our methods through an empirical evaluation over DBpedia by creating expert judgements of the proposed fixes over a sample of the data.

Categories and Subject Descriptors
H.4.m [Information Systems]: Miscellaneous

General Terms
Experimentation, Algorithms

Keywords
Linked Open Data, Schema adherence, Data quality

1. INTRODUCTION
Linked Open Data (LOD) is rapidly growing in terms of the number of available datasets, moving from 295 available datasets in 2011 to 1’014 datasets in 2014.^1 As we report in Section 2, LOD quality has already been analyzed from different angles; one key LOD quality issue is the fact that the data does not always adhere to its corresponding schema, as we discuss in more detail in Section 3. That is, factual statements (i.e., RDF triples) do not always follow the definitions given in the related RDF Schemas or ontologies. Having a schema which the published data adheres to allows for better parsing, automated processing, reasoning, or anomaly detection over the data. Also, it serves as a de facto documentation for the end-users querying the LOD datasets, fostering an easier deployment of Linked Data in practice.
To mitigate the issues related to the non-conformity of the data, statistical methods for inducing the schema over the data have been proposed. Völker et al. [9], for example, extract OWL-EL axioms from the data and use statistics to compute confidence values on the axioms. Similar statistics were also used in order to detect inconsistencies in the data [8].
In this work, we focus on one particular issue of LOD schema adherence: the proper definition of the properties' domains and ranges in LOD. More precisely, we propose (see Section 4) a new data-driven technique that amends both the schema and the instance data in order to assign better domains and ranges to properties; this goal is achieved by detecting the cases through which a property is used for different purposes (i.e., with different semantics) and by disambiguating its different uses by dynamically creating new sub-properties extending the original property. Thus, our approach modifies both the schema (new sub-properties are created) and the data (occurrences of the original property that were used with some given semantics are replaced with the newly created sub-property). One of the interesting properties of our approach is that the modified data is retro-compatible, that is, a query made over the original version of the data can be posed as is over the amended version. We evaluate our methods in Section 5 by first comparing how much data they can fix when adjusting different parameters, and then by asking Semantic Web experts to judge the quality of the modifications suggested by our approach.

^1 http://lod-cloud.net
2. RELATED WORK
One of the most comprehensive pieces of work describing LOD is the article by Schmachtenberg et al. [7], in which the adoption of best practices for various aspects, from creation
to publication, of the 2014 LOD is analyzed. Such practices, ultimately, are meant to preserve the quality of a large body of data as LOD—a task that is even more daunting, considering the inherently distributed nature of LOD.
Data quality is a thoroughly-studied area in the context of companies [6], because of its importance in economic terms. Recently, LOD has also undergone similar scrutiny: in [4], the authors show that the Web of Data is by no means a perfect world of consistent and valid facts. Linked Data has multiple dimensions of shortcomings, ranging from simple syntactical errors over logical inconsistencies to complex semantic errors and wrong facts. For instance, Töpper et al. [8] statistically infer the domain and range of properties in order to detect inconsistencies in DBpedia. Similarly, Paulheim and Bizer [5] propose a data-driven approach that exploits statistical distributions of properties and types for enhancing the quality of incomplete and noisy Linked Data sets, specifically for adding missing type statements and identifying faulty statements. Differently from us, they leverage the number of instances of a certain type appearing in the property's subject and object position in order to infer the type of an entity, while we use data as evidence to detect properties used with different semantics.
There is also a vast literature ([9, 3, 2, 1]) that introduces statistical schema induction and enrichment (based on association rule mining, logic programming, etc.) as a means to generate ontologies from RDF data. Such methods can for example extract OWL axioms and then use probabilities to come up with confidence scores, thus building what can be considered a "probabilistic ontology" that can emerge from the messiness and dynamicity of Linked Data. In this work, we focus on the analysis of property usage with the goal of fixing Linked Data and improving its quality.

3. MOTIVATION AND BASIC IDEAS
The motivation that led us to the research we are presenting is summarized in Table 1. Its upper part reports the top-5 properties in DBpedia^2 and Freebase^3 by number of domain violations. The table reports on the number of times the properties appear with a wrong domain, together with their Wrong Domain Rate (WDR), that is, the ratio between the number of times the property is used with a wrong domain and its total number of uses. Analogously, the lower part of the table reports on the top-5 properties by number of range violations and their Wrong Range Rate (WRR).^4 We observe that the absolute number of occurrences of wrong domains/ranges in Freebase is two orders of magnitude greater than that of DBpedia. This cannot be explained only by the different number of entities contained in the two knowledge bases, since the number of topics covered by Freebase is only one order of magnitude greater than that of DBpedia (approximately 47.43 and 4.58 million topics, respectively, according to their Web pages). We deduce that in Freebase the data adheres to the schema less than in DBpedia. This is also suggested by the fact that the top-3 most frequent properties defined in the DBpedia ontology, namely dpo:birthPlace, dpo:birthYear, and dpo:birthDate, have WDR and WRR smaller than 0.01, while the top-3 most used properties in Freebase, namely fb:type.object.type, fb:type.type.instance, and fb:type.object.key, have an average WDR of 0.30 and an average WRR of 0.87. This disparity can in part be explained by the fact that the Freebase ontology is a forest of trees rather than a tree with a single root node (as in DBpedia). Thus, while one could expect that each entity in the dataset should descend from 'object', this is not the case when looking at the data. In addition, we noticed that in DBpedia, out of the 1’368 properties actually used in the data, 1’109 have a domain declaration in the ontology and 1’181 have a range declaration. Conversely, Freebase specifies domain and range of 65’019 properties, but only 18’841 properties are used in the data.
In this paper we argue that a number of occurrences of wrong domains or ranges are due to the fact that the same property is used in different contexts, thus with different semantics. The property dpo:gender, for example, whose domain is not specified in the DBpedia ontology, is used both to indicate the gender of a given person and the gender of a school (that is, whether it accepts only boys, girls, or both). Hence, dpo:gender appears both in the context of dpo:GivenName and of dpo:School. While this can make sense in spoken language, we believe that the two cases should be distinct in a knowledge base. However, we cannot make a general rule out of this sole example as, for instance, we have that foaf:name (whose domain is not defined in the DBpedia ontology) is attached to 25 direct subtypes of owl:Thing out of 33; these types include dpo:Agent (the parent of dpo:Person and dpo:Organization), dpo:Event, and dpo:Place. In this case, it does not make sense to claim that all these occurrences represent different contexts in which the property appears, since the right domain for this case is indeed owl:Thing, as specified by the FOAF Vocabulary Specification.^5 Moreover, in this case creating a new property for each subtype would lead to an overcomplicated schema. Finally, the fact that foaf:name is not attached to all the subtypes of owl:Thing suggests that the property is optional. What follows describes the intuition given by this example in terms of statistics computed on the knowledge base. In addition, we also present algorithms to identify the use of properties in different contexts.

^2 We used the English version of DBpedia 2014 (http://dbpedia.org/Downloads2014).
^3 We used a dump downloaded on March 30th 2014 (http://freebase.com).
^4 When computing WDR and WRR we take into account the type hierarchy. That is, if a property has 'Actor' as range and is used in an RDF triple where the object is an 'American Actor', we consider it as correct, as 'American Actor' is a subtype of 'Actor'.
^5 http://xmlns.com/foaf/spec/
Table 1: Top-5 properties by absolute number of domain violations (top) and range violations (bottom), with their domain/range violation rate (the truncated properties are fb:dataworld.gardening_hint.last_referenced_by and fb:common.topic.topic_equivalent_webpage).

DBpedia property              #Wrong Dom.   WDR    Freebase property                   #Wrong Dom.   WDR
dpo:years                         641’528   1.00   fb:type.object.type                  99’119’559   0.61
dpo:currentMember                 260’412   1.00   fb:type.object.name                  41’708’548   1.00
dpo:class                         255’280   0.95   fb:type.object.key                   35’276’872   0.29
dpo:managerClub                    47’324   1.00   fb:type.object.permission             7’816’632   1.00
dpo:address                        36’449   0.90   fb:[...].last_referenced_by           3’371’713   1.00

DBpedia property              #Wrong Rng.   WRR    Freebase property                   #Wrong Rng.   WRR
dpo:starring                      298’713   0.95   fb:type.type.instance                96’764’915   0.61
dpo:associatedMusicalArtist        70’307   0.64   fb:[...].topic_equivalent_webpage    53’338’833   1.00
dpo:instrument                     60’385   1.00   fb:type.permission.controls           7’816’632   1.00
dpo:city                           55’697   0.55   fb:common.document.source_uri         4’578’671   1.00
dpo:hometown                       47’165   0.52   fb:[...].last_referenced_by           3’342’789   0.99
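To make the violation rates in Table 1 concrete, the following minimal sketch (Python; the data structures, names, and toy values are illustrative and not the implementation used for the paper) computes the Wrong Domain Rate of a property, counting an occurrence as correct when the subject's type is the declared domain or one of its subtypes, as prescribed in the note on WDR/WRR above.

# Minimal sketch (hypothetical data structures): Wrong Domain Rate of a property.
# An occurrence counts as correct when the subject is an instance of the declared
# domain or of one of its subtypes.
def subtypes_of(t, children):
    """All types reachable from t in the type hierarchy, including t itself."""
    result, stack = set(), [t]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children.get(current, []))
    return result

def wrong_domain_rate(prop, triples, entity_types, declared_domain, children):
    """triples: iterable of (s, p, o); entity_types: entity -> set of types."""
    allowed = subtypes_of(declared_domain, children)
    uses = wrong = 0
    for s, p, _o in triples:
        if p != prop:
            continue
        uses += 1
        if not (entity_types.get(s, set()) & allowed):
            wrong += 1
    return wrong / uses if uses else 0.0

# Toy example: a property declared with domain dpo:Place.
children = {"owl:Thing": ["dpo:Place", "dpo:Person"], "dpo:Place": ["dpo:City"]}
entity_types = {"ex:Berlin": {"dpo:City"}, "ex:Alice": {"dpo:Person"}}
triples = [("ex:Berlin", "dpo:address", "..."), ("ex:Alice", "dpo:address", "...")]
print(wrong_domain_rate("dpo:address", triples, entity_types, "dpo:Place", children))  # 0.5

The Wrong Range Rate is computed analogously on the object position of the triples.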
4. DETECTING AND CORRECTING MULTI-CONTEXT PROPERTIES
In this section, we describe in detail the algorithm we propose, namely, LeRiXt (LEft and RIght conteXT). For the sake of presentation, we first describe a simpler version of the method we call LeXt (LEft conteXT) that uses the types of the entities appearing as subjects of the property in order to identify properties that are used in different contexts (multi-context properties). We then present the full algorithm as an extension of this simpler version. For the description of the algorithm, we make use of the notation defined in Table 2.

Table 2: Notation used for describing LeRiXt.
Symbol     Meaning
KB         the knowledge base composed of triples (s, p, o) s.t. s ∈ E ∪ T, p ∈ P, o ∈ E ∪ L ∪ T, with E the set of all entities, P the set of all properties, T the set of all entity types, and L the set of all literals.
⊤          the root of the type hierarchy.
e, t       an entity and an entity type, respectively.
e a t      (e, a, t) ∈ KB, that is, e is an instance of t.
p          a property.
t^L        an entity type t on the left side of a property.
t^R        an entity type t on the right side of a property.
Par(t)     the parent of a type t in the type hierarchy.
Ch(t)      the set of children of a type t in the type hierarchy.
Cov(p')    the coverage of a sub-property p' of a property p, that is, the rate of occurrences of p covered by p'.

4.1 Statistical Tools
LeXt makes use of two main statistics: Pr(t^L | p), that is, the conditional probability of finding an entity of type t as the subject of a triple having p as predicate (i.e., finding t "to the Left" of p), and the probability Pr(p | t^L), that is, the probability of seeing a property p given a triple whose subject is an instance of t. Equation 1 formally defines those two probabilities.

  Pr(t^L | p) = |{(s, p', o) ∈ KB | s a t, p = p'}| / |{(s, p', o) ∈ KB | p = p'}|
                                                                                        (1)
  Pr(p | t^L) = |{(s, p', o) ∈ KB | s a t, p = p'}| / |{(s, p', o) ∈ KB | s ∈ t}|

As one can imagine, Pr(t^L | p) = 1 indicates that t is a suitable domain for p; however, t can be very generic. In particular, Pr(⊤^L | p) = 1 for every property p, where ⊤ is the root of the type hierarchy. Conversely, Pr(p | t^L) measures how common a property is among the instances of a certain type: Pr(p | t^L) = 1 suggests that the property is mandatory for t's instances.
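Both statistics can be derived directly from the triples. The sketch below (Python; the data layout is illustrative, not the authors' code) mirrors Equation 1, assuming that entity_types already lists, for every entity, its types closed under the type hierarchy.

# Sketch of the two statistics of Equation 1 over an in-memory list of triples.
def pr_t_given_p(t, p, triples, entity_types):
    """Pr(t^L | p): fraction of p's occurrences whose subject is an instance of t."""
    subjects = [s for s, pp, _o in triples if pp == p]
    if not subjects:
        return 0.0
    return sum(1 for s in subjects if t in entity_types.get(s, set())) / len(subjects)

def pr_p_given_t(p, t, triples, entity_types):
    """Pr(p | t^L): fraction of triples with a subject of type t that use p."""
    props = [pp for s, pp, _o in triples if t in entity_types.get(s, set())]
    if not props:
        return 0.0
    return sum(1 for pp in props if pp == p) / len(props)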
In addition, whenever we have strong indicators that a property is mandatory for many children t_i of a given type t, that is, Pr(p | t_i^L) is close to 1 for all the t_i's, we can deduce that t is a reasonable domain for p and that all the t_i are using p as an inherited (possibly optional) property. For example, if in DBpedia we consider the property foaf:name and we analyze Pr(p | t_i^L) for all t_i ∈ Ch(owl:Thing), we see that the probability is greater than 0 in 25 cases out of 33 and is greater than 0.50 in 18 cases, suggesting that all the t_i's do not constitute uses of the property in other contexts but rather that the property is used in the more general context identified by owl:Thing.
Computationally, we only need to maintain one value for each property p and for each type t, that is, the number #(p ∧ t^L) of triples having as subject an instance of t and p as predicate. In fact, if we assume that whenever there is a triple stating that (e, a, t) ∈ KB there is also a triple (e, a, t') ∈ KB for each ancestor t' of t in the type hierarchy, we have that

  ∀p ∈ P.  |{(s, p', o) ∈ KB | p = p'}| = #(p ∧ ⊤^L),
  ∀t ∈ T.  |{(s, p', o) ∈ KB | s ∈ t}| = Σ_{p' ∈ P} #(p' ∧ t^L).

The computation of all the #(p ∧ t^L) can be done with one map/reduce job similar to the well-known word-count example often used to show how the paradigm works; thus, it can be efficiently computed in a distributed environment, allowing the algorithms we propose to scale to large amounts of data. Another interesting property implied by the type subsumptions of the underlying type hierarchy is that if t_1 ∈ Ch(t_0) then Pr(t_1^L | p) ≤ Pr(t_0^L | p). Assuming the same premises, however, nothing can be said about Pr(p | t_0^L) and Pr(p | t_1^L).
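The word-count-like aggregation mentioned above can be sketched as a single pass over the triples; a map/reduce framework would simply shard the same map and reduce steps (the names below are illustrative, not the authors' implementation).

from collections import Counter

# For every triple (s, p, o), emit one (p, t) pair per (materialized) type t of
# the subject s, then count the pairs: counts[(p, t)] is #(p ∧ t^L).
def count_p_and_t(triples, entity_types):
    counts = Counter()
    for s, p, _o in triples:
        for t in entity_types.get(s, ()):   # types closed under the hierarchy
            counts[(p, t)] += 1
    return counts

# With these counts, Pr(t^L | p) = counts[(p, t)] / counts[(p, "owl:Thing")], and
# the denominator of Pr(p | t^L) is the sum of counts[(p', t)] over all p'.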
4.2 LeXt
As previously anticipated, LeXt detects multi-context properties by exploiting the types of the entities found on the left-hand side of the property taken into consideration. Specifically, given a property p, the algorithm makes a depth-first search of the type hierarchy starting from the root to find all cases for which there is enough evidence that the property is used with a different context. Practically, at each step, a type t—the current root of the tree—is analyzed and all the t_i ∈ Ch(t) having Pr(t_i^L | p) greater than a certain threshold λ are considered. If there is no such child, or if we are in a case similar to that of the foaf:name example described previously, a new sub-property t_p of p is created with t as domain; otherwise the method is recursively called on each t_i. Finally, cases analogous to the foaf:name example are detected by using the entropy of the probabilities Pr(p | t_i^L) with t_i ∈ Ch(t), which captures the intuition presented while introducing the above mentioned statistics. Since, in general, Σ_{t_i ∈ Ch(t)} Pr(p | t_i^L) ≠ 1, we normalize each probability by dividing it by Z = Σ_{t_i ∈ Ch(t)} Pr(p | t_i^L) and we compute the entropy H using Equation 2.

  H(p | Ch(t)) = − Σ_{t_i ∈ Ch(t)} (Pr(p | t_i) / Z) · log2(Pr(p | t_i) / Z)        (2)

Algorithm 1 formally describes the full process. In the pseudo-code, a context of the input property is encoded with a triple (p', dom(p'), coverage), where p' is a property identifying the context, dom(p') is its domain, and coverage ≥ λ is the rate of the occurrences of p covered by the context, denoted by Cov(p'). If the coverage is one, p is used in just one context (see Line 5). In Line 8, a new property p' is created and its domain is set to curr_root, while in Line 9, p' is declared to be a sub-property of p: this makes the data retro-compatible under the assumption that the clients can resolve sub-properties. Ideally, after the execution of the algorithm, all the triples referring to the identified meanings should be updated. The algorithm can also be used to obtain hints on how to improve the knowledge base.

Algorithm 1 LeXt
Require: 0 ≤ λ ≤ 1 strictness threshold, η ≥ 0 entropy threshold.
Require: curr_root ∈ T the current root, p ∈ P.
Require: acc a list containing all the meanings found so far.
Ensure: acc updated with all the meanings of p.
 1: p_given_t ← [ Pr(p | t_c^D) | t_c ∈ Ch(curr_root) ]
 2: H ← Entropy(p_given_t)
 3: candidates ← { t_c | t_c ∈ Ch(curr_root) ∧ Pr(t_c^D | p) ≥ λ }
 4: if H ≥ η ∨ candidates = ∅ then
 5:   if Pr(curr_root^D | p) = 1 then
 6:     acc ← (p, curr_root, 1) : acc
 7:   else
 8:     p' ← new_property(p, curr_root)
 9:     KB ← KB ∪ { (p', rdfs:subPropertyOf, p) }
10:     acc ← (p', curr_root, Pr(curr_root^D | p)) : acc
11:   end if
12: else
13:   for c ∈ candidates do
14:     LeXt(λ, η, c, acc)
15:   end for
16: end if

Figure 1: Execution of LeXt on dpo:manager. (The original figure shows the explored fragment of the DBpedia type hierarchy, from owl:Thing down through SportsSeason, SportsTeamSeason and SoccerClubSeason on one branch and Agent, Organisation and SportsTeam on the other, with each type annotated with Pr(t | m) and the entropy H reported at the expanded nodes.)

The execution steps of the algorithm on dpo:manager (m, for short) with λ = 0.4 and η = 1 are depicted in Figure 1. The entity types are organized according to the DBpedia type hierarchy and each type t is subscripted by Pr(t | m). As can be observed, during the first step the children of owl:Thing are analyzed: the entropy constraint is satisfied and two nodes satisfy the Pr(t | m) constraint. The exploration of the dpo:SportsSeason branch ends when dpo:SoccerClubSeason is reached. The triple (SoccerClubSeason_manager, dpo:SoccerClubSeason, 0.55) is returned. The new property is a sub-property of dpo:manager that covers 55% of the occurrences. Finally, the algorithm goes down the other branch until the entropy constraint is violated and returns the context (SportsTeam_manager, dpo:SportsTeam, 0.45).
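For illustration, Algorithm 1 can be transcribed almost directly into code. The sketch below (Python) is a hedged re-implementation, not the authors' code: the children, pr_t_given_p, pr_p_given_t and new_property callables stand in for the type hierarchy, the statistics of Section 4.1 and the sub-property factory, and the insertion of the rdfs:subPropertyOf triple into the knowledge base is only hinted at in a comment.

import math

def lext(p, curr_root, lam, eta, acc, children, pr_t_given_p, pr_p_given_t, new_property):
    kids = children(curr_root)
    p_given_t = [pr_p_given_t(p, t) for t in kids]
    z = sum(p_given_t) or 1.0                        # normalization constant Z
    entropy = -sum((x / z) * math.log2(x / z) for x in p_given_t if x > 0)
    candidates = [t for t in kids if pr_t_given_p(t, p) >= lam]
    if entropy >= eta or not candidates:
        coverage = pr_t_given_p(curr_root, p)
        if coverage == 1.0:
            acc.append((p, curr_root, 1.0))           # p is used in a single context
        else:
            sub = new_property(p, curr_root)          # e.g. "SoccerClubSeason_manager"
            # here one would also add (sub, rdfs:subPropertyOf, p) to the KB
            acc.append((sub, curr_root, coverage))
    else:
        for t in candidates:
            lext(p, t, lam, eta, acc, children, pr_t_given_p, pr_p_given_t, new_property)
    return acc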
4.3 Discussion
The threshold λ sets a condition on the minimum degree of evidence we need to state that we have identified a new meaning for p, expressed in terms of the Pr(t | p) probability. This threshold is of key importance in practice. On the one hand, low thresholds require little evidence and thus foster the creation of new properties, possibly over-populating the schema. On the other hand, high thresholds almost never accept a new meaning of a property, thus inferring coarser domains. In particular, with λ = 1 the exact domain of p is inferred (which in several cases can turn out to be ⊤). In Section 5 we show how the algorithm behaves with varying levels of strictness.
The presented algorithm has a number of limitations. In particular, it does not explicitly cover the cases in which one type has more than one parent, thus multi-inheriting from several other types. In that case, an entity type can be processed several times (at most once per parent). We leave to future work studying whether simply making sure that each node is processed once is enough to cover that case.

4.4 ReXt and LeRiXt
It is straightforward to define a variant of LeXt that considers property ranges instead of property domains by using Pr(t^R | p) and Pr(p | t^R). We call this method ReXt. In our implementation we only consider object properties, that is, properties that connect an entity to another entity (rather than, for example, to a literal, since literal values are not entities and thus are not in the type hierarchy).
Generalizing LeXt to identify multi-context properties based on both domains and ranges is a more complicated task. The solution we propose is called LeRiXt and consists in using two copies of the type hierarchy, one for the domains and one for the ranges. At each step there is a "current domain" t_d and a "current range" t_r whose children are analyzed (thus the algorithm takes one more parameter than LeXt). Instead of using the condition Pr(t^D | p) ≥ λ to select the candidate types to explore, we use Pr(t_i^D ∧ t_j^R | p) ≥ λ for each t_i ∈ Ch(t_d), t_j ∈ Ch(t_r), and we recursively call LeRiXt for each pair of types satisfying the constraint (see Line 14 of Algorithm 1).
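The only change with respect to LeXt is therefore the candidate-selection step, which now ranges over pairs of types. A minimal sketch follows (Python; pr_pair_given_p is an assumed helper returning Pr(t_i^D ∧ t_j^R | p), for instance from counts kept per (p, domain type, range type) triple).

# Candidate pairs for one step of LeRiXt: children of the current domain and of
# the current range whose joint probability given p reaches the threshold λ.
def lerixt_candidates(p, curr_domain, curr_range, lam, children, pr_pair_given_p):
    return [(t_d, t_r)
            for t_d in children(curr_domain)
            for t_r in children(curr_range)
            if pr_pair_given_p(t_d, t_r, p) >= lam]

LeRiXt then recurses on each returned (t_d, t_r) pair instead of on a single child type.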
5. EXPERIMENTS
We empirically evaluate the three methods described in Section 4, namely, LeXt, ReXt, and LeRiXt, first by studying how they behave when varying the threshold λ, and then by measuring the precision of the modifications they suggest. The LOD dataset we selected for our evaluation is DBpedia 2014, since its entity types are organized in a well-specified tree, contrary to Freebase, whose type system is a forest. As we anticipated in Section 4.4, we consider only object properties when the range is used to identify multi-context properties by using ReXt and LeRiXt.
Figure 2: Average coverage and number of new properties with varying values of the threshold λ. (The original figure contains one column of plots per method (LeXt, ReXt, and LeRiXt), showing the average coverage (top row) and the number of new properties (bottom row) as a function of the threshold value.)

The numbers of properties we take into consideration when running LeXt and the other two algorithms are 1’368 and 643, respectively. Finally, during our experimentation we fix the η threshold to 1. This value was chosen based on the analysis of the entropy stopping criterion on a small subset of properties.
The impact of λ on the output of the algorithms is studied in terms of average property coverage and number of generated sub-properties. Recall that in Section 4.2 we defined the coverage of a sub-property. Here we measure the property coverage, defined as the overall rate of occurrences of a certain property p that is covered by its sub-properties, that is, the sum of Cov(p') over all sub-properties p' generated for p.
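In code, the property coverage simply aggregates the coverage values of the contexts returned by the algorithms. The sketch below (Python) assumes the (property, domain, coverage) triples produced by the LeXt sketch given earlier, where the entry for the original property itself does not count as a generated sub-property.

# Property coverage: sum of Cov(p') over the sub-properties p' generated for p.
def property_coverage(contexts, p):
    return sum(cov for prop, _dom, cov in contexts if prop != p)

def average_coverage(contexts_by_property):
    covs = [property_coverage(ctxs, p) for p, ctxs in contexts_by_property.items()]
    return sum(covs) / len(covs) if covs else 0.0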
In the upper part of Figure 2 the average over the property coverage is shown for various λ. We notice that, as expected, lower values of λ lead to a high coverage since many new properties covering small parts of the data are created. As the value of the threshold increases, fewer and fewer properties are created, reaching the minimum at λ = 1. Interestingly, we observe that the average coverage curve is M-shaped with a local minimum at λ = 0.5. That is the consequence of the fact that with λ ≥ 0.5 the new properties are required to cover at least half of the occurrences of the original property, leaving no space for other contexts, thus, at most one new context can be identified for each property. Finally, at λ = 1 the average coverage drops to 0 since no sub-property can cover all the instances of the original property.
In order to evaluate the output produced by the methods, 3 authors and 2 external experts evaluated the output of the algorithms computed on a sample of fifty randomly selected DBpedia properties using λ = 0.1 and η = 1. To decide whether the context separation proposed by the algorithm is correct or not, we built a web application showing to the judges the clickable URI of the original property together with the types of the entities it appears with. The judges had then to express their opinion on every generated sub-property.
The judgments were aggregated by majority vote and then precision was computed by dividing the number of positive judgments by the number of all judgments. LeXt, ReXt, and LeRiXt achieved a precision of 96.50%, 91.40%, and 87.00%, respectively.
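Under one reading of this aggregation (five yes/no votes per generated sub-property, a majority vote per sub-property, then precision over the aggregated votes), the computation amounts to the following sketch (Python; illustrative only).

from collections import Counter

def precision(judgments_per_item):
    """judgments_per_item: one list of boolean votes per generated sub-property."""
    majorities = [Counter(votes).most_common(1)[0][0] for votes in judgments_per_item]
    return sum(majorities) / len(majorities)

print(precision([[True, True, False], [True, True, True], [False, False, True]]))  # ~0.67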
We note that this result was obtained with just one configuration of the parameters—we leave a deeper evaluation of the algorithm as future work.
In practice, we envision our algorithms to be used as a decision-support tool for LOD curators rather than a fully automatic system to fix LOD datasets.

6. CONCLUSIONS
In this paper, we tackled the problem of extracting and then amending domain and range information from LOD. The main idea behind our work stems from the observation that many properties are misused at the instance level or used in several, distinct contexts. The three algorithms we proposed, namely, LeXt, ReXt, and LeRiXt, exploit statistics about the types of the entities appearing as subject and object in the triples involving the property analyzed in order to identify the various cases in which a multi-context property is used. Once a particular context is identified, a new sub-property is derived such that occurrences of the original property can be substituted using the newly generated sub-property. Our methods can also be used to provide insight into the knowledge base analyzed and how it should be revised in subsequent iterations. We evaluated our methods by studying their behavior with different parameter settings and by asking Semantic Web experts to evaluate the generated sub-properties.
The algorithms we propose require the entities contained in the dataset to be typed with types organized in a tree-structured type hierarchy. As future work, we plan to run a deeper evaluation of our techniques, and to design a method that overcomes the limitation presented above by considering the case in which the entity types are organized in a Directed Acyclic Graph, thus supporting multiple inheritance.

Acknowledgments
This work was supported by the Haslerstiftung in the context of the Smart World 11005 (Mem0r1es) project and by the Swiss National Science Foundation under grant number PP00P2 128459.

7. REFERENCES
[1] L. Bühmann and J. Lehmann. Universal OWL axiom enrichment for large knowledge bases. LNCS, 7603 LNAI:57–71, 2012.
[2] C. d'Amato, N. Fanizzi, and F. Esposito. Inductive learning for the semantic web: What does it buy? Semantic Web, 1(1):53–59, 2010.
[3] G. A. Grimnes, P. Edwards, and A. Preece. Learning meta-descriptions of the FOAF network. In The Semantic Web – ISWC 2004, pages 152–165. Springer, 2004.
[4] M. Knuth and H. Sack. Data Cleansing Consolidation with PatchR. In ESWC, volume 8798 of LNCS, pages 231–235. Springer, 2014.
[5] H. Paulheim and C. Bizer. Improving the Quality of Linked Data Using Statistical Distributions. I. J. Semantic Web Inf. Syst., 10(2):63–86, Jan. 2014.
[6] L. L. Pipino, Y. W. Lee, and R. Y. Wang. Data quality assessment. Communications of the ACM, 45(4):211, 2002.
[7] M. Schmachtenberg, C. Bizer, and H. Paulheim. Adoption of the linked data best practices in different topical domains. In ISWC, pages 245–260, 2014.
[8] G. Töpper, M. Knuth, and H. Sack. DBpedia ontology enrichment for inconsistency detection. In I-SEMANTICS, page 33, 2012.
[9] J. Völker and M. Niepert. Statistical schema induction. LNCS, 6643 LNCS:124–138, 2011.