=Paper=
{{Paper
|id=Vol-2980/paper383
|storemode=property
|title=Explanations for
              Non-validation in SHACL
|pdfUrl=https://ceur-ws.org/Vol-2980/paper383.pdf
|volume=Vol-2980
|authors=Shqiponja Ahmetaj, Robert David, Magdalena Ortiz, Axel Polleres, Bojken Shehu, Mantas Simkus
|dblpUrl=https://dblp.org/rec/conf/semweb/AhmetajDOPSS21
}}
==Explanations for
              Non-validation in SHACL==
<pdf width="1500px">https://ceur-ws.org/Vol-2980/paper383.pdf</pdf>
<pre>
    Explanations for Non-validation in SHACL?

  Shqiponja Ahmetaj2 , Robert David2,4 , Magdalena Ortiz1 , Axel Polleres2,3 ,
                  Bojken Shehu5 , and Mantas Šimkus1
                          1
                             Technical University of Vienna
                  2
                      Vienna University of Economics and Business
                      3
                        Complexity Science Hub Vienna, Austria
                              4
                                Semantic Web Company
                          5
                            Polytechnic University of Tirana

    The Shape Constraint Language (SHACL) is a recently standardized lan-
guage for expressing constraints on RDF graphs. It is the result of industrial and
academic efforts to provide solutions for checking the quality of RDF graphs and
for declaratively describing (parts of) their structure. We recommend [9] for an
introduction to SHACL and its close relative ShEx. Among other, the SHACL
standard provides a syntax for writing down constraints, as well as describes the
way RDF graphs should be validated w.r.t. to a given set of SHACL constraints.
However, some aspects of validation were not completely specified in the stan-
dard, like the semantics of validation for constraints with cyclic dependencies.
To address these shortcomings, several formalizations of SHACL grounded on
logic-based languages with clear semantics have recently emerged [7,2,11].
    In SHACL, the basic computational problem is to check whether a given
RDF graph G validates a SHACL document (C, T ), where C is a specification
of validation rules (constraints) and T is a specification of nodes to which the
validation rules should apply (targets). In order to make SHACL truly useful and
widely accepted, we need automated tools that implement not only validation,
which results in “yes” or “no” answers, but also support the users in their efforts
to understand the reasons why a given graph validates or not against a given
document. The SHACL specification stresses the importance of explaining vali-
dation outcomes and introduces the notion of validation reports for this purpose.
If a graph validates a document, the standard has clear guidance how the valida-
tion reports should look like. However, the situation is different when the graph
does not validate. The principles of validation reports in case of non-validation
are left largely open in the standard, which specifies little beyond requiring that
the node and constraint violated are indicated. It is not hard to see that, in
general, there may be a very large number of possible reasons for a specific vali-
dation target to fail, and it is far from obvious what should be presented to the
user in validation reports. This is precisely the topic of our study.

  Copyright c 2021 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0).
?
  Partially supported by the Vienna Business Agency and the Austrian Science Fund
  (FWF) projects P30360 and P30873. Axel Polleres’ work is supported by funding in
  the European Commission’s Horizon 2020 Research Programme under Grant Agree-
  ment Number 957402.
2       Ahmetaj et al.

    We present here an extended abstract of [1] and additionally introduce a pro-
totype system to automatically generate explanations. We advocate explanations
in the style of database repairs [3] as one concrete way to provide explanations
for the non-validation of SHACL constraints. This approach is closely related
to abductive reasoning, model-based diagnosis and counterfactuals, which have
received very significant attention in last decades and have been applied to a
range of similar problems requiring explanatory services (see, e.g., [8,12,5,6]).
Our main goal is to formalize the notion of explanations for SHACL, to define a
collection of reasoning tasks for exploring explanations, and to characterize their
computational complexity. The main contributions of [1] are as follows:
    ◦ To explain non-validation of a SHACL document (C, T ) by an RDF graph
G, we introduce the notion of a SHACL Explanation Problem (SEP). A solu-
tion to a SEP is a pair (A, D), where A and D are sets of facts to be added
and removed from G, respectively, so that the resulting graph does validate the
document (C, T ). We consider natural preference orders over explanations, and
study also explanations that are minimal w.r.t. set inclusion or w.r.t. cardinality.
    ◦ We define a collection of inference services, which are reminiscent of basic
reasoning problems in logic-based abduction [8], and study both combined and
data complexity of these tasks. We then turn our attention to non-recursive
SHACL constraints and show that, in general, reasoning does not become easier.
    ◦ Finally, we define SEPs with restricted explanation signatures, which allow,
e.g., to specify that some classes and properties are read-only. We study the
complexity of the problems for this generalized version of SEPs.


1   SHACL Validation

Let N, C, and P denote countably infinite, mutually disjoint sets of nodes, class
names, and property names, respectively. A data graph G (RDF graph) is a finite
set of atoms of the form B(c) and p(c, d), where B ∈ C, p ∈ P, and c, d ∈ N. The
set of nodes appearing in G is denoted with V (G). We assume a countably infinite
set S of shape names, disjoint from N∪C∪P. A shape atom is an expression of the
form s(a), where s ∈ S and a ∈ N. A path expression E is a regular expression
built using the usual operators ∗, ·, ∪ from symbols in P+ = P ∪ {p− | p ∈ P}. If
p ∈ P, then p− is the inverse property of p. A (complex) shape is an expression
φ obeying the syntax: φ, φ0 ::= > | s | B | c | φ ∧ φ0 | ¬φ |≥n E.φ | E = E 0 , where
s ∈ S, B ∈ C, c ∈ N, n is a positive integer, and E, E 0 are path expressions.
A (shape) constraint is an expression s ↔ φ where s ∈ S and φ is a possibly
complex shape. In SHACL, targets are used to prescribe that certain nodes of
the input data graph should validate certain shapes. W.l.o.g. we view targets as
shape atoms of the form s(a), where s ∈ S and a ∈ N. Ashape document is a
pair (C, T ), where (i) C is a set of constraints and (ii) T is a set of targets. The
evaluation of a shape expression φ is given by assigning nodes of the data graph
to (possibly multiple) shape names. A (shape) assignment for a data graph G is
a set I = G ∪ L, where L is a set of shape atoms such that a ∈ V (G) for each
s(a) ∈ I. The evaluation of a (complex) shape w.r.t. an assignment I is given in
                                Explanations for Non-validation in SHACL        3

terms of a function that maps a (complex) shape expression φ to a sets of nodes,
and a path expression E to a set of pairs of nodes. We refer to [1] for details
on the evaluation of the various operators in complex shapes. For validation we
consider the semantics proposed in [7]. An assignment I for G and a document
(C, T ) is a (supported) model of C if JφKI = sI for all s ↔ φ ∈ C. The data
graph G validates (C, T ) if there exists an assignment I = G ∪ L for G such that
(i) I is a model of C, and (ii) T ⊆ L.


2   Explaining Non-Validation in SHACL
In this section, we formalize the notion of explanations for non-validation of a
SHACL document by a data graph, illustrate it with an example, and present
some complexity results.
Definition 1. Let G be a data graph, let (C, T ) be a SHACL document, and let
the set of hypotheses H be a data graph disjoint from G. Then Ψ = (G, C, T, H)
is a SHACL Explanation Problem (SEP). An explanation for Ψ is a pair (A, D),
such that (a) D ⊆ G, A ⊆ H, and (b) (G \ D) ∪ A validates (C, T ).
Example 1. Consider a SEP Ψ = (G, C, T, H), where C contains the constraints
Teacher ↔ ∃teaches.> and Student ↔ ∃enrolledIn.Course ∧ ¬Teacher, T =
{Student(Ben), Teacher(Ann)}, H = {Course(C1 ), Course(C2 )}, and G contains
enrolledIn(Ben, C1 ), teaches(Ann, Ben), teaches(Ben, Ben), teaches(Ann, Li).
The constraints state that each Teacher must teach someone, and each Student
must be enrolled in some course and must not comply with the shape Teacher.
Note that Teacher and Student are shape names, enrolledIn is a property name,
and Course is a class name. The data graph G validates (C, {Teacher(Ann)}),
but does not validate (C, T ). A possible explanation for non-validation is that
G is missing the fact that C1 is a Course; it also contains the possibly erroneous
fact that teaches(Ben, Ben). Thus, validation is ensured by repairing G with the
explanation (A, D), where A = {Course(C1 )} and D = {teaches(Ben, Ben)}.
We consider preference relations over explanations, given by a reflexive and tran-
sitive relation  on the set of explanations. We consider two typical preference
orders: subset-minimal (⊆), and cardinality-minimal (≤) explanations.
Definition 2. Let Ψ = (G, C, T, H) be a SEP, let A, D be data graphs, let α be
an atom in G ∪ H, and let  be a preorder. We define six decision problems: 1)
-IsExpl checks whether (A, D) is a -explanation for Ψ , 2) -Exist checks
whether there exists a -explanation for Ψ 3) -NecAdd and 4) -NecDel
check whether α occurs in A or D, respectively, in every -explanation (A, D)
for Ψ , 5) -RelAdd and 6) -RelDel check whether α occurs in A or D,
respectively, in some -explanation (A, D) for Ψ .
We present the results for recursive SHACL and refer to [1] for the non-recursive
fragment and for SEPs with restricted explanation signature. We omit  from
the name of decision problems when  is empty, and write () when considering
the variants with and without . We use  as a place holder for both ⊆ and ≤.
4        Ahmetaj et al.

Theorem 1. The following results are true for both data and combined com-
plexity for SHACL with recursive constraints:

    – IsExpl, ()-Exist, RelAdd, RelDel are NP-complete,
    – -IsExpl is DP-complete,
    – (⊆)-NecAdd and (⊆)-NecDel are coNP-complete,
    – ≤-NecAdd, ≤-NecDel, ≤-RelAdd, ≤-RelDel are PkNP -complete,
    – ⊆-RelAdd, ⊆-RelDel are Σ2P -complete.


3     A Prototype Implementation of Explanations

To compute minimal explanations, we adapt an approach from databases and
logic programming (see [4] for details) that computes minimal repairs for Datalog
programs with negation. It was shown in [2] that the validation problem can be
encoded as an answer-set program (ASP) which computes a model to represent
validation. In case of non-validation, the idea is to add rules to advice additions
and removals of facts from the input graph in a way that the repaired data
graph conforms to the constraints. These suggestions come as true ta to advice
addition of a fact and as false fa to advice deletion of a fact. We use t∗ to
state that an atom is or becomes true, and t∗∗ to state that an atom is true in
the repair. Multiple suggestions can interact and can possibly get into conflict,
which is then resolved by the repair rules. In a nutshell, we rewrite a SEP as an
ASP program, whose models will identify minimal repairs of the input SEP. We
illustrate the idea with an example.

Example 2. Consider a SEP (G, C, T, H), where G = {enrollIn(Ben, C1 )}, C =
{Student ↔ ∃enrollIn.Course}, T = {Student(Ben)}, and H = {Course(C1 )}.
The repair program consists of the following rules and facts.
    First, it contains enrollIn(Ben, Y ) from G, and also the fact Student(Ben, t∗ ).
Second, it contains rules of the form enrollIn (X, Y, t∗ ) ← enrollIn (X, Y, ta ) ∨
enrollIn(X, Y ) that capture the intended meaning of annotations (similarly also
for Course). Roughly, they say that if an atom is true in the data or is advised to
be true, then it should be annotated with t∗ . Note that we use fresh predicates
enrollIn , and Course , which extend the arity of the original ones with 1. Third, it
contains the repair rules: enrollIn (X, Y, ta ) ← Student (X, t∗ ) ∧ ¬enrollIn(X, Y ),
and Course (Y, ta ) ← Student (X, t∗ )∧enrollIn (X, Y, t∗ )∧¬Course(X). The bod-
ies of these rules capture the possible ways to invalidate the target atom, and
the heads capture the possible ways to restore validation by advising to add
or remove atoms from the data. For simplicity, we discard in this example the
irrelevant atoms with fa . Finally, we have the interpretation rules of the form
enrollIn (X, Y, t∗∗ ) ← enrollIn (X, Y, t∗ ) (similarly for Course ), which add at the
end the atoms that should be true in the repair. The model of this program con-
tains Course (C1 , ta ), Course (C1 , t∗ ), Course (C1 , t∗∗ ), enrolledIn (Ben, C1 , t∗ ),
and enrolledIn (Ben, C1 , t∗∗ ). The annotations state that both enrollIn(Ben, C1 )
and Course(C1 ) should be in the repaired data graph.
                                   Explanations for Non-validation in SHACL            5

Implementation details The implementation is done by a Java program that
produces logic programs for Clingo based on an approach described in [10]. In
a nutshell, the process is done in three steps: 1) We convert RDF triples and
SHACL constraints into an ASP program that runs the SHACL validation and
produces the validation report as minimal models of the program. 2) In case of
non-validation, we add the annotated repair rules described above. Thus, the
program is extended to produce the repairs in addition to the validation report.
3) The extended logic program is then solved using Clingo and produces the
repair proposals from the stable models of the program. These repairs minimally
change the input data graph through additions and deletions of facts.

References
 1. Ahmetaj, S., David, R., Ortiz, M., Polleres, A., Shehu, B., Šimkus, M.: Reasoning
    about explanations for non-validation in SHACL. In: Proceedings of the 18th In-
    ternational Conference on Principles of Knowledge Representation and Reasoning,
    KR 2021. To appear.
 2. Andresel, M., Corman, J., Ortiz, M., Reutter, J.L., Savkovic, O., Šimkus, M.: Sta-
    ble model semantics for recursive SHACL. In: Proc. of The Web Conference 2020.
    p. 1570–1580. WWW ’20, ACM (2020). https://doi.org/10.1145/3366423.3380229
 3. Arenas, M., Bertossi, L.E., Chomicki, J.: Consistent query answers in in-
    consistent databases. In: Proc. of PODS. pp. 68–79. ACM Press (1999).
    https://doi.org/10.1145/303976.303983
 4. Bertossi, L.E.: Database Repairing and Consistent Query Answering. Synthe-
    sis Lectures on Data Management, Morgan & Claypool Publishers (2011).
    https://doi.org/10.2200/S00379ED1V01Y201108DTM020
 5. Calvanese, D., Ortiz, M., Simkus, M., Stefanoni, G.: Reasoning about explanations
    for negative query answers in DL-Lite. J. Artif. Intell. Res. 48, 635–669 (2013).
    https://doi.org/10.1613/jair.3870
 6. Ceylan, İ.İ., Lukasiewicz, T., Malizia, E., Molinaro, C., Vaicenavicius, A.: Expla-
    nations for negative query answers under existential rules. In: Proc. of KR 2020.
    pp. 223–232 (2020). https://doi.org/10.24963/kr.2020/23
 7. Corman, J., Reutter, J.L., Savkovic, O.: Semantics and validation of recursive
    SHACL. In: Proc. of ISWC’18. Springer (2018). https://doi.org/10.1007/978-3-030-
    00671-6 19
 8. Eiter, T., Gottlob, G.: The complexity of logic-based abduction. J. ACM 42(1),
    3–42 (1995). https://doi.org/10.1145/200836.200838
 9. Gayo, J.E.L., Prud’hommeaux, E., Boneva, I., Kontokostas, D.: Validating RDF
    Data. Synthesis Lectures on the Semantic Web: Theory and Technology (2017).
    https://doi.org/10.2200/S00786ED1V01Y201707WBE016
10. Labra-Gayo, J.E., Garcı́a-González, H., Fernández-Alvarez, D., Prud’hommeaux,
    E.: Challenges in rdf validation. In: Current Trends in Semantic Web Technologies:
    Theory and Practice, pp. 121–151. Springer (2019)
11. Leinberger, M., Seifer, P., Rienstra, T., Lämmel, R., Staab, S.: Deciding SHACL
    shape containment through description logics reasoning. In: Proc. of ISWC 2020.
    Lecture Notes in Computer Science, vol. 12506, pp. 366–383. Springer (2020).
    https://doi.org/10.1007/978-3-030-62419-4 21
12. Van Harmelen, F., Lifschitz, V., Porter, B.: Handbook of knowledge representation.
    Elsevier (2008)

</pre>