Reasoning about Explanations for
Non-validation in SHACL (Extended Abstract)?

   Shqiponja Ahmetaj2 , Robert David4 , Magdalena Ortiz1 , Axel Polleres2,3 ,
                   Bojken Shehu5 , and Mantas Šimkus1
                          1
                             Technical University of Vienna
                  2
                      Vienna University of Economics and Business
                      3
                        Complexity Science Hub Vienna, Austria
                              4
                                Semantic Web Company
                          5
                            Polytechnic University of Tirana

    The Shape Constraint Language (SHACL) is a recently standardized lan-
guage for expressing constraints on RDF graphs. It is the result of industrial and
academic efforts to provide solutions for checking the quality of RDF graphs and
for declaratively describing (parts of) their structure. We recommend [9] for an
introduction to SHACL and its close relative ShEx. Among others, the SHACL
standard provides a syntax for writing down constraints, as well as describes the
way RDF graphs should be validated w.r.t. a given set of SHACL constraints.
However, some aspects of validation were not completely specified in the stan-
dard, like the semantics of validation for constraints with cyclic dependencies.
To address these shortcomings, recently several formalizations of SHACL have
emerged, which describe it in logic-based languages with clear semantics. E.g.,
some works resort to first-order logic [6], while some use logic programming [2].
SHACL is closely related to expressive Description Logics (DLs). Such connec-
tions have already been observed in [10], where the authors reduce implication of
SHACL constraints to concept subsumption in DLs. The key difference between
SHACL and DLs is that SHACL makes the closed-world assumption (CWA),
while DLs use the open-world assumption (OWA). To understand the difference,
RDF graphs equipped with SHACL constraints can be thought of as DL knowl-
edge bases in which all roles and some concept names are closed predicates in the
context of DLs with closed predicates (see [8, 11]), i.e., where only some concept
names are allowed to be non-closed predicates.
    In SHACL, the basic computational problem is to check whether a given
RDF graph G validates a SHACL document (C, T ), where C is a specification
of validation rules (constraints) and T is a specification of nodes to which the
validation rules should apply (targets). In order to make SHACL truly useful and
widely accepted, we need automated tools that implement not only validation,
which results in “yes” or “no” answers, but also support the users in their efforts
  Copyright c 2021 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0).
?
  Partially supported by the Vienna Business Agency and the Austrian Science Fund
  (FWF) projects P30360 and P30873. Axel Polleres’ work is supported by funding in
  the European Commission’s Horizon 2020 Research Programme under Grant Agree-
  ment Number 957402.
2        Ahmetaj et al.

to understand the reasons why a given graph validates or not against a given
document. The SHACL specification stresses the importance of explaining vali-
dation outcomes and introduces the notion of validation reports for this purpose.
If a graph validates a document, the standard has clear guidance how the valida-
tion reports should look like. However, the situation is different when the graph
does not validate. The principles of validation reports in case of non-validation
are left largely open in the standard, which specifies little beyond requiring that
the node and constraint violated are indicated. It is not hard to see that, in
general, there may be a very large number of possible reasons for a specific vali-
dation target to fail, and it is far from obvious what should be presented to the
user in validation reports. This is precisely the topic of our study.
    In this work6 , we advocate explanations in the style of database repairs [3]
as one concrete way to provide explanations for the non-validation of SHACL
constraints. This approach is closely related to subareas of KR&R like abductive
reasoning, model-based diagnosis and counterfactuals, which have received very
significant attention in the last decades and applied to a range of similar problems
requiring explanatory services (see, e.g., [7, 12, 4, 5]).
    The main goal of this work is to formalize the notion of explanations for
SHACL, to define a collection of reasoning tasks for exploring explanations, and
to characterize their computational complexity. In a nutshell, the contributions
of this paper are as follows:
   ◦ To explain non-validation of a SHACL document (C, T ) by an RDF graph
G, we introduce the notion of a SHACL Explanation Problem (SEP). A solution
to a SEP is a pair (A, D) that describes a collection A of facts to be added to
G and a collection D of facts to be deleted from G, so that the resulting graph
does validate the document (C, T ). We consider natural preference orders over
explanations, and study also explanations that are minimal w.r.t. set inclusion
or w.r.t. cardinality. We illustrate the use of explanations with some examples.
    ◦ We define a collection of inference services for reasoning about explanations
for non-validation. We start with the basic tasks of recognizing whether a given
candidate is indeed a (preferred) explanation, and deciding whether a (preferred)
explanation exists. We also define the problems of checking whether a given atom
is relevant (resp., necessary) as an addition or as a deletion in some explanation
(resp., all explanations). These tasks are reminiscent of basic reasoning problems
in logic-based abduction [7].
   ◦ We study the computational complexity of the introduced reasoning tasks
and characterize both combined and data complexity. Our results range from
tractability to completeness for the second level of the polynomial hierarchy.
   ◦ After studying the general setting, we turn our attention to non-recursive
SHACL constraints. We show that with one exception, reasoning about expla-
nations in the presence of non-recursive constraints does not become easier in
terms of computational complexity. The exception is the problem of recognizing
an explanation, which becomes tractable in the absence of a preference order.
6
    This is an extended abstract of [1].
                                    Title Suppressed Due to Excessive Length           3

As a side result we show that SHACL validation in the presence of non-recursive
constraints is P-complete.
    ◦ Finally we consider a generalization of SEPs with restricted explanation
signatures. This useful feature allows, e.g., to specify that some classes and prop-
erties are read-only, prohibiting deletions from them during explanations. Also
for this setting, we establish a collection of complexity results, including the case
of non-recursive constraints.


References
 1. Ahmetaj, S., David, R., Ortiz, M., Polleres, A., Shehu, B., Šimkus, M.: Reasoning
    about explanations for non-validation in SHACL. In: Proceedings of the 18th In-
    ternational Conference on Principles of Knowledge Representation and Reasoning,
    KR 2021. To appear.
 2. Andresel, M., Corman, J., Ortiz, M., Reutter, J.L., Savkovic, O., Šimkus, M.: Sta-
    ble model semantics for recursive SHACL. In: Proc. of The Web Conference 2020.
    p. 1570–1580. WWW ’20, ACM (2020). https://doi.org/10.1145/3366423.3380229,
    https://doi.org/10.1145/3366423.3380229
 3. Arenas, M., Bertossi, L.E., Chomicki, J.: Consistent query answers in in-
    consistent databases. In: Proc. of PODS. pp. 68–79. ACM Press (1999).
    https://doi.org/10.1145/303976.303983, https://doi.org/10.1145/303976.303983
 4. Calvanese, D., Ortiz, M., Simkus, M., Stefanoni, G.: Reasoning about explanations
    for negative query answers in DL-Lite. J. Artif. Intell. Res. 48, 635–669 (2013).
    https://doi.org/10.1613/jair.3870, https://doi.org/10.1613/jair.3870
 5. Ceylan, İ.İ., Lukasiewicz, T., Malizia, E., Molinaro, C., Vaicenavicius,
    A.: Explanations for negative query answers under existential rules. In:
    Proc. of KR 2020. pp. 223–232 (2020). https://doi.org/10.24963/kr.2020/23,
    https://doi.org/10.24963/kr.2020/23
 6. Corman, J., Reutter, J.L., Savkovic, O.: Semantics and validation of recursive
    SHACL. In: Proc. of ISWC’18. Springer (2018). https://doi.org/10.1007/978-3-030-
    00671-6 19, https://doi.org/10.1007/978-3-030-00671-6 19
 7. Eiter, T., Gottlob, G.: The complexity of logic-based abduction.
    J.   ACM       42(1),    3–42     (1995).     https://doi.org/10.1145/200836.200838,
    https://doi.org/10.1145/200836.200838
 8. Franconi,      E.,    Ibáñez-Garcı́a,    Y.A.,       Seylan,   I.:    Query     an-
    swering     with    DBoxes       is     hard.      Electr.   Notes     Theor.   Com-
    put.         Sci.       (2011).          https://doi.org/10.1016/j.entcs.2011.10.007,
    http://dx.doi.org/10.1016/j.entcs.2011.10.007
 9. Gayo,      J.E.L.,    Prud’hommeaux,           E.,     Boneva,     I.,   Kontokostas,
    D.:    Validating    RDF       Data.     Synthesis     Lectures    on   the   Seman-
    tic Web: Theory and Technology, Morgan & Claypool Publish-
    ers       (2017).       https://doi.org/10.2200/S00786ED1V01Y201707WBE016,
    https://doi.org/10.2200/S00786ED1V01Y201707WBE016
10. Leinberger, M., Seifer, P., Rienstra, T., Lämmel, R., Staab, S.: Decid-
    ing SHACL shape containment through description logics reasoning. In:
    Proc. of ISWC 2020. Lecture Notes in Computer Science, vol. 12506,
    pp. 366–383. Springer (2020). https://doi.org/10.1007/978-3-030-62419-4 21,
    https://doi.org/10.1007/978-3-030-62419-4 21
4       Ahmetaj et al.

11. Lutz, C., Seylan, I., Wolter, F.: Ontology-based data access with closed
    predicates is inherently intractable(sometimes). IJCAI/AAAI (2013),
    http://www.aaai.org/ocs/index.php/IJCAI/IJCAI13/paper/view/6870
12. Van Harmelen, F., Lifschitz, V., Porter, B.: Handbook of knowledge representation.
    Elsevier (2008)