<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Diagnosis and Repair Framework for DL-Lite KBs A</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michalis Chortis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgos Flouris</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ICS-FORTH</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Several logical formalisms have been proposed in the literature for expressing structural and semantic integrity constraints of Linked Open Data (LOD). Still, the integrity of the datasets published in the LOD cloud needs to be improved, as published data often violate such constraints, jeopardising the value of applications consuming linked data in an automatic way. In this work, we propose a novel, fully automatic framework for detecting and repairing violations of integrity constraints, by considering both explicit and implicit ontological knowledge. Our framework relies on the ontology language DL-LiteA for expressing several useful types of constraints, while maintaining good computational properties. The experimental evaluation shows that our framework is scalable for large datasets and numbers of invalidities exhibited in reality by reference linked datasets (e.g., DBpedia).</p>
      </abstract>
      <kwd-group>
        <kwd>Repairing</kwd>
        <kwd>diagnosis</kwd>
        <kwd>DL-LiteA</kwd>
        <kwd>integrity constraints</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Linked Open Data (LOD) published on the Web of Data are often associated
with various structural (e.g., primary key) and semantic (e.g., disjointness)
integrity constraints. These constraints are usually expressed in ontological [
        <xref ref-type="bibr" rid="ref19 ref22">19, 22</xref>
        ]
or database [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] logic frameworks. However, LOD sources do not impose such
constraints a priori, when data are created, so violations of integrity constraints
must be detected and repaired a posteriori. As have been reported in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ],
reference LOD sources, such as DBpedia1 or LinkedGeoData2, exhibit millions of
violations (this is also veri ed by our own experiments { see Table 3).
      </p>
      <p>
        In most of the cases, LOD are manually repaired by their curators or by
their consuming applications, using, at best, diagnosis approaches or tools (e.g.,
[
        <xref ref-type="bibr" rid="ref16 ref19 ref22">16, 19, 22</xref>
        ], Stardog3, QuOnto [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] etc.) for detecting violations of various types
of integrity constraints. Obviously, the manual repair of millions of violations
is a time-consuming and error-prone task, a fact that seriously limits the data
quality of the available LOD sources. Thus, a major challenge is to automatically
      </p>
    </sec>
    <sec id="sec-2">
      <title>1 http://dbpedia.org</title>
    </sec>
    <sec id="sec-3">
      <title>2 http://linkedgeodata.org</title>
    </sec>
    <sec id="sec-4">
      <title>3 http://stardog.com/</title>
      <p>detect and repair violations of both structural and semantic integrity constraints,
especially when ontology reasoning is involved (i.e., detect and repair violations
of constraints like disjointness, functional constraints etc., taking into account
logical inference and its interaction with those constraints).</p>
      <p>
        In this work, we propose a novel automatic framework for assisting
curators in the arduous task of enforcing integrity constraints in large datasets. We
provide an e cient methodology for detecting invalidities (diagnosis ), as well
as for automatically resolving them (repairing ), in a manner that has minimal
impact in terms of lost knowledge on the Knowledge Base (KB), according to
the principles set out in earlier works [
        <xref ref-type="bibr" rid="ref2 ref8">2, 8</xref>
        ].
      </p>
      <p>
        We consider detecting and repairing of invalidities attributed to constraints
of a purely logical nature (e.g., class disjointness). Constraints are expressed in
the language DL-LiteA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which belongs to the DL-Lite family of ontology
languages that forms the foundation of the popular OWL 2 QL4 language. The
choice of DL-LiteA was motivated by the fact that it is arguably rich enough to
capture several useful types of integrity constraints that are used in practice in
LOD datasets, and their interaction with implicit knowledge, while at the same
time supporting e cient query answering [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>The main contributions of our work are the following:
{ We propose a framework for detecting and automatically repairing
invalidities, for constraints that are expressed in DL-LiteA, namely: concept/property
disjointness constraints, property domain/range disjointness constraints and
functional constraints. Diagnosis of invalidities related to both explicit and
inferred constraints can be performed in linear time with respect to the
dataset size, whereas repairing can be performed in polynomial time with
respect to the number of invalidities.
{ We have implemented an operational repairing system for real-world
applications. Our implementation is modular, allowing each component to be
implemented in a manner independent to the other components. This way,
we managed to reuse o -the-shelf, state-of-the-art tools for many of the
components, such as reasoning, storage, query answering, etc.
{ We have experimentally evaluated the scalability and performance of our
algorithms, using real and synthetic datasets. The main conclusion drawn is
that our framework can scale for very large datasets, such as DBpedia, as
well as for large numbers (millions) of invalidities.</p>
      <p>The rest of the paper is structured as follows: in Section 2, we motivate the use
of the DL-LiteA language for this problem and explain its features; in Section 3,
we describe our framework and explain how we address the problems of detecting
and resolving invalidities; Section 4 describes our algorithms for diagnosis and
repairing; in Section 5, we describe our experimental evaluation and report on
the main conclusions drawn; nally, Section 6 compares our contributions to the
related work and Section 7 concludes.</p>
    </sec>
    <sec id="sec-5">
      <title>4 http://www.w3.org/TR/owl2-profiles/#OWL_2_QL</title>
      <sec id="sec-5-1">
        <title>Preliminaries</title>
        <p>
          In DL-LiteA [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], concept expressions, hereafter expressed by the letter C, and
role expressions, denoting binary relations between concepts and hereafter
expressed by the letter R, are formed according to the following syntax, where A
denotes an atomic concept and P denotes an atomic role:
        </p>
        <p>C ! A j 9R R ! P j P
A DL-LiteA TBox consists of axioms of the following form:</p>
        <p>C1 v C2 C1 v :C2 R1 v R2 R1 v :R2 (funct R)
A DL-LiteA ABox is a nite set of assertions of the following form:</p>
        <p>A(x) P (x; y)</p>
        <p>
          In order to guarantee good complexity results for reasoning tasks like
consistency checking, DL-LiteA imposes a limitation in the TBox, namely that a
functional role cannot be specialized by using it in the right-hand side of a role
inclusion assertion. This means that if a DL-LiteA TBox contains an axiom of
the form R0 v R, then it cannot contain (funct R) or (funct R ) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Note that
DL-LiteA assertions can be also expressed in OWL syntax.
        </p>
        <p>DL-LiteA follows the standard reasoning semantics of DLs [4{6]. A DL-LiteA
KB K = hT ; Ai is called inconsistent i T [ A is inconsistent (in the standard
logical sense). It is called consistent otherwise.</p>
        <p>
          With respect to performance, DL-LiteA has the important property of
FOLReducibility [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which essentially means that one can reduce the process of
inconsistency checking and query answering to the evaluation of First-Order Logic
(FOL) queries over the ABox, considered as a database; this makes both tasks
tractable (in LogSpace with respect to the data) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
3
3.1
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Diagnosis and Repair</title>
        <p>
          Constraints in DL-LiteA
For the purposes of diagnosis and repair, we can distinguish three di erent types
of DL-LiteA TBox axioms, namely positive inclusions (of the form C1 v C2,
R1 v R2), negative inclusions (of the form C1 v :C2, R1 v :R2) and
functionality assertions (of the form funct R). This distinction is important for diagnosis
and repair due to the fact that the ABox is viewed under the Open World
Assumption (OWA), which is considered for Description Logics and the ontology
languages of the Semantic Web in general (such as OWL { but see [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] for an
e ort to understand OWL under the Closed World Assumption, and the NRL
language5 for a similar analysis). Due to the OWA, a TBox consisting of positive
inclusions only can never lead to an inconsistent KB; therefore, the only
interesting (from the diagnosis perspective) constraints are the negative inclusions and
the functionality assertions. In the following, the term constraint will be used to
refer to negative TBox inclusions and functionality assertions.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 http://www.semanticdesktop.org/ontologies/2007/08/15/nrl</title>
      <p>Despite that, positive inclusions are still relevant for the diagnosis process,
because they may generate inferred information that should be taken into
account. As an example, assume that the TBox contains the constraint A1 v :A3
and the axiom A2 v A3 (where A1; A2; A3 are atomic concepts), and suppose
that the ABox contains both A1(x) and A2(x) for some x. Even though no
constraint is explicitly violated, the combination of the ABox contents with the
aforementioned TBox would lead to inferring both A3(x) and :A3(x), i.e., an
invalidity. Note that the positive inclusion A2 v A3, albeit not violated itself,
plays a critical role in creating this invalidity.</p>
      <p>Rather than capturing such invalidities via the obvious method of computing
the closure of the ABox, it is more e cient to identify the constraints implied
by the explicitly declared constraints and the positive inclusions in the TBox. In
our example, we could identify that the constraint A1 v :A2 is a consequence of
the two explicit axioms in the TBox, so the presence of A1(x) and A2(x) violates
this implicit constraint.</p>
      <p>
        This process amounts to computing all explicit and implicit constraints of
the TBox (denoted by cln(T )) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], i.e., the set of all the functionality assertions
and the explicit and implicit negative inclusions present in the TBox. In fact,
it has been proven that, in order to check the consistency of a DL-LiteA KB,
one has to take into account only the constraints in cln(T ) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. More formally, a
DL-LiteA KB K = hT ; Ai is inconsistent i there is a constraint c 2 cln(T ) and
a pair of assertions a1; a2 2 A such that the DL-LiteA KB K0 = hfcg; fa1; a2gi is
inconsistent [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the following, the triple (a1; a2; c) will be called an invalidity
of K. It is obvious by the above result that in order to render a KB consistent,
for each invalidity (a1; a2; c), one of a1; a2 has to be removed from the ABox.
Example 1. Consider the following DL-LiteA KB K = hT ; Ai:
      </p>
      <p>T = f(funct P1), A1 v :A2, 9P2 v A1g</p>
      <p>
        A = fA1(x1); A2(x1); P2(x1; y1); P1(x3; y2); P1(x3; y3); P1(x3; y4)g
The closure of negative inclusions and functionality assertions of T (cln(T )),
computed in the way that was presented in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], is the following:
cln(T ) = f(funct P1), A1 v :A2, 9P2 v :A2g
From the computed closure, we can easily deduce that (A1(x1); A2(x1); A1 v
:A2) is one of the invalidities in the KB. tu
3.2
      </p>
      <sec id="sec-6-1">
        <title>Approach for Diagnosis and Repair</title>
        <p>
          Diagnosis amounts to identifying the invalidities, i.e., the data assertions and
the (possibly implicit) constraint that are involved in an invalidity. Using the
property of FOL-Reducibility, the identi cation of invalidities in a DL-LiteA
KB can be reduced to the execution of adequately de ned FOL queries over a
database [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] { see also Table 1. Exploiting this property, diagnosis is performed
by simply executing the queries corresponding to the constraints in cln(T ), to
get all the invalidities of the KB under question.
        </p>
        <p>Repairing is based on the aforementioned property that restoring consistency
requires eliminating all invalidities from a KB via removing either one of the two
data assertions that take part in each invalidity; formally:
De nition 1. Given a DL-LiteA KB K = hT ; Ai a repairing delta of K is a
selection of data assertions RD, such that K0 = hT ; A n RDi is consistent. A
repairing delta RD is called minimal i there is no repairing delta RD0, such
that RD0 RD.
tu</p>
        <p>
          The notion of minimality is important, as many authors have proposed the
identi cation of minimal repairing deltas (under di erent forms of minimality)
as one of the main concerns during repairing [
          <xref ref-type="bibr" rid="ref2 ref8">2, 8</xref>
          ]; as is obvious by De nition 1,
minimal repairing deltas correspond to subset repairs in the terminology of [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>Identifying the minimal repairing delta(s) is not trivial. The computation
of such delta(s) is based on the fact that constraints expressed in DL-LiteA
allow the presence of interrelated invalidities, i.e., data assertions being involved
in more than one invalidities. This implies that potential resolutions of such
invalidities coincide, and that there exist resolutions which resolve more than
one invalidity at the same time.</p>
        <p>To help in the process of identifying the minimal repairing delta, the
diagnosed invalidities are organized into an interdependency graph, which is used to
identify assertions involved in multiple invalidities. Formally:
De nition 2. The interdependency graph of a DL-LiteA KB K = hT ; Ai is an
undirected labelled graph IG(K) = (V; E) such that V = fa j (a1; a2; c) is an
invalidity of K and a = a1 or a = a2g and E = f(a1; a2; c) j (a1; a2; c) is an
invalidity of Kg.
tu</p>
        <p>
          The use of the interdependency graph as a structure to represent the
invalidities that are diagnosed in the KB gives the ability to get a better grasp of
the form and complexity of the invalidities and their interrelationships, as well
as to use methods and tools that come from graph theory in order to facilitate
the repairing process. Note that an interdependency graph is di erent from a
con ict-graph [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], as the interdependency graph does not contain every assertion
in the ABox, having an obvious impact in the algorithm time-cost.
        </p>
        <p>
          In terms of the interdependency graph, resolving an invalidity amounts to
removing one of the two vertices that are connected by the edge representing this
invalidity. Therefore, a minimal repairing delta is essentially the minimal vertex
cover of the corresponding interdependency graph, which reduces the problem
of repairing to the well-known problem of vertex cover [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This fact forms
the basis of our algorithms presented in the next section.
4
4.1
        </p>
        <sec id="sec-6-1-1">
          <title>Algorithms for Diagnosis and Repairing</title>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>Diagnosis Algorithm</title>
        <p>The diagnosis algorithm is used to detect all the invalidities in a KB, and provide
them as output in the form of an interdependency graph. The steps needed to
perform diagnosis are illustrated in Algorithm 1.
Algorithm 1 Diagnosis(K)</p>
        <p>
          The diagnosis algorithm starts by computing the closure cln(T ) of negative
inclusions and functionality assertions of the TBox (line 2 of Algorithm 1), in
order to get the full set of constraints that need to be checked over the ABox.
Each of the constraints in cln(T ) is then transformed to a FOL query (line 4)
using prede ned patterns, as de ned in Table 1 (see also [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]), whose answers
determine the invalidities. These queries are executed over the ABox in line 5
(Ansqc contains pairs ha1; a2i such that (a1; a2; c) is an invalidity). Note that
these FOL queries can be easily expressed as SPARQL queries over an ABox
stored in a triple store, so that o -the-shelf, optimized tools can be used for
query answering. The last step of the algorithm encodes the invalidities in the
form of an interdependency graph (lines 6-9) as speci ed in De nition 2.
        </p>
        <p>Constraint (c)
c = A1 v :A2 (c) = q(x)
c = A1 v :9P1 (or c = 9P1 v :A1) (c) = q(x)
c = A1 v :9P1 (or c = 9P1 v :A1) (c) = q(x)
c = 9P1 v :9P2 (c) = q(x)
c = 9P1 v :9P2 (c) = q(x)
c = 9P1 v :9P2 (c) = q(x)
A1(x); A2(x)
A1(x); P1(x; y)
A1(x); P1(y; x)
P1(x; y1); P2(x; y2)
P1(y1; x); P2(y2; x)</p>
        <p>P1(x; y1); P2(y2; x)</p>
        <p>Transformation ( (c))
c = P1 v :P2 (or c = P1 v :P2 ) (c) = q(x; y) P1(x; y); P2(x; y)
c = P1 v :P2 (c) = q(x; y) P1(x; y); P2(y; x)
c =(funct P ) (c) = q(x) P (x; y1); P (x; y2)
c =(funct P ) (c) = q(x) P (y1; x); P (y2; x)
Table 1: Transformation of DL-LiteA constraints to FOL queries.
The following example illustrates the diagnosis algorithm in action:
Example 2. Consider the KB K and the cln(T ) of Example 1. The corresponding
FOL queries to check for invalidities, according to Table 1 are:</p>
        <p>
          As already mentioned, computing cln(T ) (line 2) is in LogSpace with
respect to the data [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], whereas the remaining steps of the algorithm are linear
with respect to the invalidities and the constraints in cln(T ).
The repairing algorithm (Algorithm 2) takes as input the interdependency graph
and is responsible for automatically repairing the KB. As explained in
Section 3.2, the main idea behind the repairing algorithm is the computation of the
vertex cover of the interdependency graph.
        </p>
        <p>To do so, the repairing algorithm rst breaks the interdependency graph
IG(K) into the set of its connected components (line 2). Note that the
computation of the vertex cover for each of the connected components is independent
to the others, and can be parallelized for better performance.</p>
        <p>
          This computation (vertex cover) is performed in lines 3-5. Recall that vertex
cover is a well-known NP-complete problem [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], but many approximation
algorithms have been proposed, such as the 2-approximation algorithm [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], or
the approximation algorithm presented in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
Algorithm 2 Repair(IG(K); A)
Input: An interdependency graph IG(K) and a DL-LiteA ABox A
Output: K in a consistent state
1: repairing delta ;
2: CC ConnectedComponents(IG(K))
3: for all cc 2 CC do
4: repairing delta
5: end for
6: A A n repairing delta
        </p>
        <p>repairing delta [ GreedyV ertexCover(cc)</p>
        <p>
          For our implementation and experiments below, we chose (for e ciency) to
compute the vertex cover in a greedy manner, as presented in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] (but any
other algorithm for vertex cover could be used instead). Greedy means that,
in each step of the computation, the vertex that is chosen to be included in
the cover is the vertex with the highest degree (in other words, the invalid
data assertion that is part of the most invalidities). If there exist more than
one vertices with the same degree, one of those vertices is arbitrarily chosen;
this arbitrary choice avoids the need for complex and time-consuming selection
conditions and guarantees that a single vertex cover is returned by the algorithm.
This computation is performed in the GreedyVertexCover subroutine, which is
omitted for brevity.
        </p>
        <p>The output of lines 3-5 (repairing delta) contains the data assertions to be
removed from the dataset in order to render it valid. The actual repairing is
performed in line 6, through a single SPARQL-Update statement6 requesting
the deletion of all the assertions in the repairing delta.</p>
        <p>
          The correctness of our algorithms is guaranteed by our analysis in Section 3
and the results in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The computational complexity of Algorithm 2 is dominated
by the computation of the vertex cover, which is proven to achieve O(log n)
approximation of the optimal solution (where n is the number of vertices of the
graph), with a time complexity of O(n log n) [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>The following concludes the running example for our framework:
Example 3. Consider the interdependency graph of Figure 1. The repairing
algorithm will compute the following repairing delta:</p>
        <p>repairing delta = fA2(x1); P1(x3; y2); P1(x3; y3)g
After the application of the repairing delta, the ABox A is in the following state:</p>
        <p>A = fA1(x1); P2(x1; y1); P1(x3; y4)g
which can be easily veri ed to be a consistent KB with respect to T .
tu
6 http://www.w3.org/TR/2013/REC-sparql11-update-20130321/
5.1</p>
        <sec id="sec-6-2-1">
          <title>Experimental Evaluation</title>
        </sec>
      </sec>
      <sec id="sec-6-3">
        <title>Overview of Experimental Evaluation</title>
        <p>We have implemented our framework as a Java web application. More speci
cally, we have created a system that uses a triple store, which lies on a Virtuoso
Open-Source Edition Server7 version 07.10, as a storage for the ABox instances
and as an endpoint for query answering. For storing the set of constraints, we
used a main memory model, which, along with the communication with the
Virtuoso Server, are handled by the Apache Jena8 framework. Apache Jena is also
used for the various reasoning tasks (e.g., computation of cln(T )). The system
used for the experiments was an AMD Opteron 3280, 8-core CPU with 24GB
RAM (we allocated 8GB for the JVM), running Ubuntu Server 12.04.</p>
        <p>We have performed several experiments in order to measure the performance
and scalability of our framework, as well as to determine the decisive factors
for the performance of the di erent phases of the process. More speci cally, we
performed three sets of experiments: (i) the rst set veri ed that our framework
can handle millions of violations in real-world ABoxes that scale up to more
than 2 billion triples, considering hundreds of thousands of constraints; (ii) the
second set of experiments quanti ed the impact of ABox size on performance, by
using real Tboxes with constraints and synthetic ABoxes of varying sizes; and,
(iii) the third set quanti ed the impact of the number of invalid data assertions
on performance, by using real TBoxes with constraints and synthetic ABoxes
with varying number of invalid data assertions.</p>
        <p>In all of the above sets of experiments, we measured the time needed to
run the diagnosis algorithm and produce the interdependency graph (diagnosis
time), the time needed by the repairing algorithm to compute the repairing delta
(repair computation time) and the time needed to apply this repairing delta on
the dataset, using a SPARQL-Update query (repair application time). All of our
experiments were run in sets of 5 hot runs and the average times were taken.
5.2</p>
      </sec>
      <sec id="sec-6-4">
        <title>Real and Synthetic Datasets Used</title>
        <p>For the TBox, we used two versions (3.6, 3.9) of the DBpedia ontology, which
is a reference dataset for LOD, already containing di erent amounts and types
of constraints; this is illustrated in Table 2, which shows information on how
many functional and (concept/domain/range) disjointness constraints exist in
the original TBox, as well as how many of these exist in the closure of negative
inclusions (cln(T ))9, and how many queries need to be executed for diagnosis.
Property disjointness is the only type of constraint supported by DL-LiteA that</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7 http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/</title>
    </sec>
    <sec id="sec-8">
      <title>8 http://jena.apache.org/</title>
      <p>9 The big di erence in the amount of disjointness constraints between the original
DBpedia 3.9 and its closure is caused by the many positive inclusions and their
interaction with the negative inclusions during the computation of the closure.
TBox version FunctCioonnasltDr aisijnotinstness FCuonncsttiornaainltDsisinjoicnltnn(eTss) Queries
DBpedia 3.6 18 0 18 0 18
DBpedia 3.9 26 17 26 323.389 323.415
was not considered in our experiments, because we were unable to nd any real
TBoxes with this constraint (so it seems irrelevant for practical applications).</p>
      <p>For running our experiments, we used the above TBoxes, together with both
real and synthetic ABoxes. Real ABoxes were used to evaluate our system in
realistic conditions, whereas synthetic ABoxes allow controlling the important
factors for the performance of our algorithm, such as size and number of
invalidities, and the appropriate evaluation of their e ect on performance.</p>
      <p>Real ABoxes were taken by the two DBpedia versions corresponding to the
two aforementioned TBoxes, stored in a local Virtuoso instance. The DBpedia
3.6 ABox contains around 541 million triples, whereas the DBpedia 3.9 ABox
contains more than 2 billion triples.</p>
      <p>To generate synthetic ABoxes, we started from each TBox and an empty
ABox, and added data and property instances of the classes/properties of the
corresponding TBox, making sure to include some invalid pairs of assertions as
well (taking into account the constraints). For the rst set of generated ABoxes
we created a xed number of invalid data assertions (10K) and a varying ABox
size (500K-5M triples, with a step of 500K triples). The second set of ABoxes
had a xed size (10M triples) and a varying number of invalid data assertions
(50K-500K, with a step of 50K). The above two sets of ABoxes were used in the
second and third set of experiments respectively.
The rst set of experiments aimed at verifying the scalability of our framework
in real-world settings, with ABoxes of billions of triples and with large numbers
of constraints (up to hundreds of thousands). For this purpose, we used DBpedia
versions 3.6 and 3.9 (TBox and real ABox). For each version, we measured the
diagnosis time (identifying invalidities and creating the interdependency graph),
the repair computation time (computing the repairing delta), the repair
application time (applying the repairing delta) and the total time (sum of the above).
We also measured the number of invalid data assertions that appear in the
datasets, to see how well our framework scales with respect to that, as well as
the size of the repairing delta, to verify that a manual repair by the curator
would be infeasible in this context.</p>
      <p>The results of this set of experiments are illustrated in Table 3. In the table,
IDA denotes the number of invalid data assertions, Delta is the size of the
repairing delta (in triples), td denotes the diagnosis time, tr:c: the repair computation
time, tr:a: the repair application time and tt the total time needed for diagnosis
and repairing. All times are in milliseconds.</p>
      <p>The results show that our framework is scalable, for both large datasets and
big numbers of invalid data assertions, and that it can be applied in real-world
settings. It also proves that already deployed and massively used reference KBs,
such as DBpedia, don't have su cient mechanisms for preventing the
introduction of invalid data or for detecting and repairing such invalid data. Moreover,
our experiments illustrate that the number of invalid data assertions and the
size of the repairing delta would be prohibitive for manual repairing.</p>
      <p>Our second set of experiments evaluated the e ect of ABox size on
performance using synthetic ABoxes of varying sizes and a xed number of invalid data
assertions. The results of this set of experiments appear in Figure 2. Note that
some of the curves in the graphs are di cult to distinguish, either because they
are too close to the start of the x-axis (e.g., the repair computation time and
the repair application time in the left gure), or because they are too close with
another curve (e.g., the diagnosis time and the total time in the right gure).
Fig. 2: Performance for DBpedia 3.6 (left) and 3.9 (right) with 10K invalidities.</p>
      <p>From the results of this set of experiments, we conclude that diagnosis time
grows linearly with respect to the ABox size and that it is the dominating factor
of the total time, when the number of invalid data assertions is xed. This is
an important conclusion because it shows that, overall, our framework scales
linearly with respect to the dataset size.</p>
      <p>The third set of experiments evaluated the e ect of the number of invalidities
using ABoxes of xed size, but with a varying number of invalid data assertions.
The results of this set of experiments are illustrated in Figure 3.</p>
      <p>From these results, we can conclude that the number of invalid data
assertions has no immediate impact on the diagnosis time. On the contrary, it is
the main impact factor of the repair computation time. That was an expected
behaviour, as the repair computation is done by computing the vertex cover of
the interdependency graph. A bigger number of invalid data assertions leads to
a bigger graph and this leads to a more costly computation of the vertex cover.</p>
      <p>Another signi cant impact factor of the repair computation time is the
amount of interdependencies in the interdependency graph. We can see that
the repair computation time increases with a higher rate in the left graph of
Figure 3 than in the right one, which can be explained by the fact that the
DBpedia 3.6 TBox contains only functional constraints, which form cliques in the
interdependency graph (thus, more interdependencies), whereas the DBpedia 3.9
TBox contains mainly disjointness constraints, which cause less
interdependencies, therefore less \touching" edges in the interdependency graph.</p>
      <p>Moreover, the repair application time seems to be negligible in all of the
experiments. This is due to the fact that the repair application is performed
by executing a single SPARQL-Update query requesting the deletion of all the
triples in the repairing delta, which is very e cient due to the optimizations for
batch operations of Virtuoso.</p>
      <p>Fig. 3: Performance for DBpedia 3.6 (left) and 3.9 (right) with 10M triples.</p>
      <p>The last signi cant conclusion comes from the comparison of the times
measured for the two di erent DBpedia TBox versions. We see that the diagnosis
times for version 3.9 are two orders of magnitude higher compared to the
respective times of version 3.6. This is due to the fact that the closure of version 3.9
contains 323.415 constraints, whereas the closure of version 3.6 only 18; more
constraints require the generation of more queries to be executed by the diagnosis
algorithm, eventually causing this big di erence in the measurements.</p>
      <p>The following main conclusions can be distilled from our evaluation:
{ Diagnosis can be performed in linear time with respect to the ABox size.
{ Repair computation can be performed in polynomial time with respect to
the number of invalid data assertions that appear in the dataset.
{ Our implementation enjoys a decent performance in real-world settings with
large datasets, numbers of constraints and invalidities, being able to repair
the huge DBpedia 3.9 (&gt;2B triples) in about 10 hours, which is a reasonable
amount of time, given that repairing is expected to be an o ine process.</p>
      <p>
        It should be noted that the experimental evaluation of the only other work
in the literature that performs automated repairing of inconsistent DL-LiteA
KBs ([
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] { see Section 6), only considers datasets of size up to 30.000 triples,
whereas we consider datasets of up to 5 orders of magnitude larger; thus, the
results are not comparable.
6
      </p>
      <sec id="sec-8-1">
        <title>Related Work</title>
        <p>
          The problem of inconsistencies appearing in KBs can be tackled either by
providing the ability to query inconsistent data and get consistent answers (Consistent
Query Answering - CQA) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], or by actually repairing the KB, which leads to a
consistent version of it [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Both these approaches have attracted researchers'
attention, mostly in the context of relational databases and, lately, in the context
of linked data and ontology languages as well.
        </p>
        <p>
          In the context of relational databases, CQA has been studied in various
works dealing with di erent classes of conjunctive queries and denial constraints,
mainly key constraints (e.g., [
          <xref ref-type="bibr" rid="ref13 ref23">13, 23</xref>
          ]). These works underline the main
advantages of using First-Order query rewriting for the validation of integrity
constraints. Note that CQA techniques systematically drop all information involved
in a constraint violation, whereas repairing techniques, like ours, make explicit
decisions on what to keep and what to drop, in accordance with the principles
set out in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], that require preserving as much information as possible.
        </p>
        <p>
          Di erent semantics have been studied for the repairing of inconsistent
relational databases, considering di erent kinds of constraints. For example, [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
studied the problem of repairing by allowing only tuple deletions and, in this
way, resolving violations of denial constraints and inclusion dependencies, which
is a more expressive set of constraints than the one we consider in this work.
However, as proven in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], the unrestricted combination of those constraints leads
to intractability issues.
        </p>
        <p>
          In the context of linked data and the corresponding languages and
technologies, there has been research on the topic of using ontological languages to encode
integrity constraints (ICs) that must be checked over a dataset. In [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], the
authors present a way to integrate ICs in OWL and they show that IC validation
can be reduced to query answering, for integrity constraints that fall into the
SROI DL fragment. A similar approach has been followed in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. In [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], the
presented approach integrates constraints that come from the relational world
(primary-key, foreign-key) into RDF and provides a way to validate these
constraints. IC validation is also an important part of some of the current OWL
reasoners, such as Stardog10. The above approaches address, essentially, only
10 http://stardog.com/
the KB satis ability problem and do not consider detection and repairing of
invalidities.
        </p>
        <p>
          In the eld of diagnosis for DL-Lite KBs, there has been some work regarding
inconsistency checking. The DL-LiteA reasoner QuOnto [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] has the ability
to check the satis ability of a DL-LiteA KB. However, it does not detect the
invalid data assertions in the ABox, neither repairs it. A problem very similar
to repairing (but in a di erent setting) is addressed in the context of DL-Lite
KB evolution (e.g., [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]), where the objective is to identify the minimal
set of assertions to remove in order to render a DL-Lite KB consistent during
evolution.
        </p>
        <p>
          Recently, there has also been some research on CQA for inconsistent
knowledge bases expressed in Description Logic languages, using query rewriting
techniques. For example, [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] deals with di erent variants of inconsistency-tolerant
semantics to reach a good compromise between expressive power of the semantics
and computational complexity of inconsistency-tolerant query answering.
        </p>
        <p>
          Finally, [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] is (to our knowledge) the only work addressing the automatic
repairing of an inconsistent DL-LiteA KB, and thus the closest to our work. It
is based on the inconsistency-tolerant semantics studied in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and resolves each
invalidity by removing both data assertions that take part in it. On the contrary,
our repairing algorithm considers the removal of only one of two involved data
assertions. Thus, [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] removes more information than necessary from the original
KB. In addition, the work of [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] has only been evaluated with datasets that are
unrealistically small (up to 30.000 triples).
7
        </p>
      </sec>
      <sec id="sec-8-2">
        <title>Conclusion and Future Work</title>
        <p>We presented a novel, fully automatic and modular diagnosis and repairing
framework, which can be used on top of already deployed datasets to assist
the curators in the task of enforcing the validity of logical integrity constraints,
taking into account logical inference, in order to maintaining their consistency.
Our experimental evaluation showed that our framework is scalable for large
dataset sizes, often found in real reference linked datasets such as DBpedia.</p>
        <p>As future work, we will try to improve the scalability properties of our
algorithms, possibly using a parallel implementation relying on the MapReduce
model. In addition, we will consider di erent models of interaction with the
curator, to allow him to in uence the repairing process (e.g., via user guidelines
or preferences) without being overwhelmed with the complete set of invalidities;
the ultimate goal is to develop an interactive repairing process that will
combine the quality of manual curation with the e ciency of automatic repairing.
Another possible extension is to experiment with more LOD datasets, and
provide a comprehensive study of the number and types of violations that exist in
di erent popular datasets.</p>
        <p>Acknowledgement. This work was partially supported by the EU project
DIACHRON (ICT-2011.4.3, #601043), and by the State Scholarships Foundation.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Acciarri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
          </string-name>
          , R.:
          <article-title>QUONTO: querying ontologies</article-title>
          .
          <source>In: AAAI. Volume</source>
          <volume>5</volume>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Afrati</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Repair checking in inconsistent databases: algorithms and complexity</article-title>
          . In: ICDT. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chomicki</surname>
          </string-name>
          , J.:
          <article-title>Consistent query answers in inconsistent databases</article-title>
          .
          <source>In: PODS</source>
          . (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
          </string-name>
          , R.:
          <article-title>Linking data to ontologies: The Description Logic DL-LiteA</article-title>
          . In: OWLED. (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Tractable reasoning and e cient query answering in Description Logics: The DL-Lite family</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          .
          <volume>39</volume>
          (
          <issue>3</issue>
          ). (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>RodriguezMuro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
          </string-name>
          , R.:
          <article-title>Ontologies and databases: The DL-Lite approach</article-title>
          .
          <source>In: Reasoning Web</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kharlamov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nutt</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheleznyakov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Evolution of DL-Lite Knowledge Bases</article-title>
          . In: ISWC. (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chomicki</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcinkowski</surname>
          </string-name>
          , J.:
          <article-title>On the computational complexity of minimalchange integrity maintenance in relational databases</article-title>
          .
          <source>Inconsist. Toler</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chomicki</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcinkowski</surname>
          </string-name>
          , J.:
          <article-title>Minimal-change integrity maintenance using tuple deletions</article-title>
          .
          <source>Information and Computation</source>
          .
          <volume>197</volume>
          (
          <issue>1</issue>
          ). (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Deutsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>FOL modeling of integrity constraints (dependencies)</article-title>
          .
          <source>In: Encyclopedia of Database Systems</source>
          . Springer. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Garey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
          </string-name>
          , D.:
          <article-title>Computers and intractability</article-title>
          . Vol.
          <volume>174</volume>
          .
          <string-name>
            <surname>Freeman</surname>
          </string-name>
          . (
          <year>1979</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Greco</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greco</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zumpano</surname>
          </string-name>
          , E.:
          <article-title>A logical framework for querying and repairing inconsistent databases</article-title>
          .
          <source>IEEE TKDE</source>
          .
          <volume>15</volume>
          (
          <issue>6</issue>
          ). (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Grieco</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruzzi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Consistent query answering under key and exclusion dependencies: Algorithms and experiments</article-title>
          . In: CIKM. (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Karakostas</surname>
          </string-name>
          , G.:
          <article-title>A better approximation ratio for the vertex cover problem</article-title>
          .
          <source>In: Automata, Languages and Programming</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kontokostas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Westphal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cornelissen</surname>
          </string-name>
          , R.:
          <article-title>Databugger: a test-driven framework for debugging the web of data</article-title>
          . In:
          <string-name>
            <surname>WWW (Companion</surname>
            <given-names>Volume).</given-names>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lausen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>SPARQLing constraints for RDF</article-title>
          . In: EDBT. (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruzzi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Query rewriting for inconsistent DL-Lite ontologies</article-title>
          .
          <source>In: Web Reasoning and Rule Systems</source>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Masotti</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruzzi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Practical ABox cleaning in DL-Lite (progress report)</article-title>
          .
          <source>In: Description Logics</source>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Bridging the gap between OWL and relational databases</article-title>
          .
          <source>In: WWW</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Papadimitriou</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiglitz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Combinatorial optimization: algorithms and complexity</article-title>
          .
          <source>Courier Dover Publications</source>
          . (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhuang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Approximating model-based ABox revision in DL-Lite: Theory and practice</article-title>
          . In: AAAI. (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Integrity constraints in OWL</article-title>
          . In: AAAI. (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Wijsen</surname>
          </string-name>
          , J.:
          <article-title>Consistent query answering under primary keys: a characterization of tractable queries</article-title>
          .
          <source>In: ICDT</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>