=Paper=
{{Paper
|id=Vol-3203/paper1
|storemode=property
|title=Well-founded Semantics for Recursive SHACL
|pdfUrl=https://ceur-ws.org/Vol-3203/paper1.pdf
|volume=Vol-3203
|authors=Adrian Chmurovic,Mantas Simkus
|dblpUrl=https://dblp.org/rec/conf/datalog/ChmurovicS22
}}
==Well-founded Semantics for Recursive SHACL==
<pdf width="1500px">https://ceur-ws.org/Vol-3203/paper1.pdf</pdf>
<pre>
Well-founded Semantics for Recursive SHACL
Adrian Chmurovič1 , Mantas Šimkus1,2
1
    Faculty of Informatics, TU Wien, Austria
2
    Department of Computing Science, Umeå University, Sweden


                                         Abstract
                                         W3C has recently introduced SHACL as a new standard for integrity constraints on RDF graphs. Unfor-
                                         tunately, the standard defines the semantics of non-recursive constraints only, which has spurred recent
                                         research efforts into finding a suitable, mathematically crisp semantics for constraints with cyclic depen-
                                         dencies. To this end, Corman et al. [1] introduced a semantics related to supported models known in logic
                                         programming, while Andreşel et al. [2] presented a semantics based on stable models known in Answer
                                         Set Programming (ASP). In this paper, we argue that recursive SHACL can be naturally equipped with a
                                         semantics inspired in the well-founded semantics for (recursive) DATALOG queries with default negation.
                                         This semantics is not only intuitive, but it is also computationally tractable, unlike the previous proposals.
                                         To draw a connection with the classic definition of well-founded semantics in logic programming, we
                                         provide a simple translation of recursive SHACL under the well-founded semantics into propositional
                                         logic programs under the well-founded semantics. This translation is not only of theoretical interest:
                                         It can be efficiently implemented using the functionality of standard RDF triple stores (specifically, by
                                         issuing a small number of SPARQL queries), followed by an efficient evaluation using an existing engine
                                         for computing the well-founded model of a propositional logic program.

                                         Keywords
                                         graph-structured data, integrity constraints, SHACL, RDF, well-founded semantics, answer set program-
                                         ming


1. Introduction
Graph-structured data is becoming increasingly prominent and popular, mainly because it does
not require the development of rigid database schemas. This is important in emerging scenarios
where data has complex structure, where it may be highly incomplete, or where its structure
evolves over time, which makes the development of a fixed database schema challenging. This
has spurred research into topics like Knowledge Graphs, Property Graphs, Graph Databases, RDF
graphs, etc. The flexible nature of graph-structured data does not mean we are uninterested in
the quality and the structural integrity of the stored data. In traditional relational databases, the
so-called integrity constraints provide means to enforce a strict structure and maintain quality of
stored information, and we would like to have similar tools for graph-structure data. However,
this is not easy: we need to strike a good balance between the power to assert some control over
the structure of data, and being relaxed enough to not negate the benefits of the graph-based
data model.

Datalog 2.0 2022: 4th International Workshop on the Resurgence of Datalog in Academia and Industry, September 05,
2022, Genova - Nervi, Italy
$ simkus@cs.umu.se (M. Šimkus)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)


                                                                                                           2
Adrian Chmurovič et al. CEUR Workshop Proceedings                                               2–13


   To address the above challenge, W3C has recently introduced SHACL1 as a new standard
for expressing constraints on RDF graphs (we refer the reader to [3] for an excellent tutorial
on SHACL). Constraints in SHACL are specified using validation rules, which have features
reminiscent of logic programming and concept expressions expressed in Description Logics.
Unfortunately, the SHACL standard only defines the semantics of non-recursive SHACL con-
straints, while the semantics of SHACL with cyclic dependencies, or recursion, was intentionally
left unspecified: it is currently up to the developers of SHACL validators to come up with
ad-hoc solutions to handle recursion. This has spurred recent research efforts into finding a
suitable, mathematically crisp semantics for recursive SHACL constraints. Corman et al. have
introduced a semantics that is related to the notion of supported models known in logic programs,
while Andreşel et al. have explored a semantics based on stable models known in Answer Set
Programming (ASP) [1, 2]. The supported model semantics of Corman et al. has two weaknesses:
(a) the problem of deciding validity is intractable, (b) it allows for unfounded (self-supported)
justifications of inferences, which sometimes leads to counter-intuitive validation results. The
stable model semantics does not suffer from (b) but still has problem (a): the intractability (specif-
ically, NP-hardness) of validation in the two semantics is caused by the possible simultaneous
presence of recursion and negation in constraints. NP-hardness already applies when the size of
input constraints is assumed to be fixed by a constant (i.e., in data complexity), which becomes
a major challenge for implementing validators for recursive SHACL as RDF graphs in practice
are often very big. Interestingly, in the case of the supported model semantics, NP-hardness
holds already for constraints with stratified negation. In this paper, we argue that recursive
SHACL can be naturally equipped with a semantics inspired in the well-founded semantics for
logic programs, a semantics that also plays a major role in (recursive) DATALOG queries with
default negation. This semantics is not only intuitive (e.g., it avoids problem (b)), but it is also
computationally tractable.
   The rest of this paper is organized as follows:
       • We conclude this introduction with a rather detailed motivating example to illustrate the
         basic ideas and the challenge addressed in this paper.
       • In Section 2, we recall the well-founded semantics for propositional logic programs based
         on the notion of unfounded sets, and provide an example on how the (unique) well-founded
         model of a program can be computed.
       • In Section 3, we recall the (abstract) syntax of SHACL and recall the (2-valued) supported
         model semantics of SHACL. This most basic semantics is illustrated with an example that
         shows an undesired self-justification cycle.
       • In Section 4, we present our well-founded semantics for SHACL, based on a notion of
         unfounded sets (adapted from the notion used for logic programs).
       • In Section 5, we describe a transformation that allows to reduce the validation problem
         to the task of computing the well-founded model of a propositional logic program. This
         transformation can be implemented efficiently and leads to a way of performing SHACL
         validation under the new semantics using some of the existing deductive database engines.
       • In Section 6, we conclude this paper, including a discussion of some of the challenges and
         future directions in the research on SHACL.
1
    https://www.w3.org/TR/shacl/


                                                  3
Adrian Chmurovič et al. CEUR Workshop Proceedings                                                   2–13


                                                                                          𝗁𝖺𝗌𝖲𝗎𝗉𝖾𝗋𝗂𝗈𝗋
                        𝗁𝖺𝗌𝖲𝗎𝗉𝖾𝗋𝗂𝗈𝗋            𝗂𝗌𝖬𝖾𝗇𝗍𝗈𝗋𝖮𝖿
         𝖤𝗆𝗉𝗅𝗈𝗒𝖾𝖾                                             𝖲𝗍𝗎𝖽𝖾𝗇𝗍
                                                                                   Drew
                 Alex
                                      Blake             Cameron

                                                                          𝗂𝗌𝖬𝖾𝗇𝗍𝗈𝗋𝖮𝖿
    𝗂𝗌𝖬𝖾𝗇𝗍𝗈𝗋𝖮𝖿


         ProfessorShape ← EmployeeShape ∧ (∃isMentorOf.StudentShape)                      (𝑃 )
         EmployeeShape ← Employee ∨ ∃hasSuperior.EmployeeShape                            (𝐸)
           StudentShape ← Student ∨ (EmployeeShape ∧ ¬ProfessorShape)                     (𝑆)

Figure 1: Example knowledge graph and SHACL constraints


Example 1. Consider the example knowledge graph (KG) in Figure 1, containing some information
about members of an educational institution. The KG mentions four persons called Alex, Blake,
Cameron and Drew, which correspond to the four nodes of the KG. Alex is (explicitly) stated to
be an employee in this institution, which is indicated by their membership in the class Employee.
Cameron is a (regular) student in this institution, which is indicated by their membership in the
class Student. Blake is a mentor of Cameron, which is indicated via the edge between them labeled
with the property isMentorOf. Blake has Alex as a superior in this institution, captured via an
edge labeled with hasSuperior. Furthermore, Cameron has Drew as another mentor. Finally, the
KG has two facts that are quite questionable: Alex is a mentor of Alex, and Drew is a superior of
Drew. The presence of these two facts is clearly unintended (and should be dealt with using an
appropriate validation machinery).
   The KG presented so far does not carry any semantic information on the expected relations
between the different kinds of objects that are present, it contains no schema information nor integrity
constraints. The SHACL standard specifies a language for expressing such information. Consider
the three expressions (P), (E) and (S) in Figure 1, which intuitively correspond to three validation
rules that allow to identify nodes of three types: nodes corresponding to professors, employees, and
students, respectively. Each of these three types has a name (i.e., ProfessorShape, EmployeeShape,
and StudentShape) that is associated to its definition using a (possibly quite complex) logic-
based expression. In the DATALOG terminology, we have ProfessorShape, EmployeeShape, and
StudentShape as monadic intensional predicates, while Employee, Student are unary extensional
predicates, and isMentorOf, hasSuperior are binary extensional predicates. Specifically, (P) tells us
that a person is considered to be a professor (has type ProfessorShape) if they are an employee (of
type EmployeeShape) and they mentor at least one student (an object with type StudentShape).
According to the rule (E), a person is an employee (has type EmployeeShape) if that is stated
explicitly (via membership in the class Employee), or the person has a superior that is in turn of
type EmployeeShape. The last rule (S) tells us that an object is of type StudentShape if the object
is explicitly stated to be an instance of the class Student, or the object is of type EmployeeShape
but not of type ProfessorShape. Intuitively, in our example, the employees of the institution that
are not professors can be seen as students who can, e.g., take courses.
   As a final step, we need to specify which nodes of the graph are expected to satisfy selected types.


                                                   4
Adrian Chmurovič et al. CEUR Workshop Proceedings                                                 2–13


This is done via a specification of target nodes. In the basic case, we can list pairs of nodes and the
expected type (shape) names. For example, consider the following target specification:

                  𝒯1 = {(Blake, ProfessorShape), (Cameron, StudentShape)}

Intuitively, our KG equipped with the validation rules (P), (E) and (S) validates the targets in
𝒯1 . Indeed, Cameron satisfies StudentShape because they are explicitly stated as an instance of
Student. Since Blake has Alex as a superior and Alex is explicitly stated to be an employee, we
have that Blake satisfies EmployeeShape. Given that Blake mentors Cameron, we infer that
Blake satisfies ProfessorShape. Unsurprisingly, we can have targets that we don’t expect to be
validated by our KG and the above validation rules (under a reasonable validation semantics): this
holds, e.g., for 𝒯 = {(Blake, StudentShape)} and 𝒯 = {(Cameron, ProfessorShape)}.
   In practice, we want more succinct representations of target specifications. The SHACL stan-
dard describes the so-called class and property (domain and range) targets. E.g., with 𝒯 =
{(Student, EmployeeShape)} we ask each instance of Student to satisfy EmployeeShape (this
requirement will clearly be violated in our KG).
   Let’s consider the 2-valued supported model semantics of Corman et at. [1]. Here we simply
try to assign to each node some set of shape names such that: (i) a node 𝑒 is a assigned a shape
name 𝑆 iff 𝑒 satisfies the shape expression associated to 𝑆, and (ii) the target specification is
respected. Unfortunately, under this semantics, the target 𝒯1 above cannot be validated by our
KG, which is certainly not desirable. The problem here is that Alex rules out the existence of any
shape assignment as required for validation. If we assign ProfessorShape to Alex, then Alex
must be a mentor of some object with StudentShape. Since Alex only mentors themselves, and
StudentShape is incompatible with ProfessorShape, our candidate shape assignment fails. If
we decide to not assign ProfessorShape to Alex, then Alex must not mentor any object with
StudentShape. However, Alex mentors themselves and obviously satisfies StudentShape. Thus
again the shape assignment has failed. Overall, the 2-valued supported model semantics has a
significant disadvantage: If we have some issue in the graph that bars the existence of a total shape
assignment, then the validation fails for all target specifications, even those without any connection
to the problematic part of the knowledge graph. The 2-valued stable model semantics in [2] has the
same problem, and thus [1, 2] also consider 3-valued versions of their semantics.
   Consider the knowledge graph obtained by dropping the node Alex, i.e., deleting the class mem-
bership and property assertions that involve Alex. Then a total shape assignment becomes possible:
simply assign StudentShape to Cameron and assign ProfessorShape and EmployeeShape to
Drew. In fact, this is the only assignment possible. However, this situation is questionable because
the knowledge graph states that Drew supervises themselves, which in turn causes an unfounded
inference: Drew is an employee because Drew is an employee. Thus this graph validates the target
specification 𝒯 = {(Drew, ProfessorShape)} despite the absence of evidence that Drew is an
employee in the organization. The two problems that were discussed in this example are addressed
by the well-founded semantics for SHACL that we discuss here.


                                                  5
Adrian Chmurovič et al. CEUR Workshop Proceedings                                               2–13


2. Preliminaries
We now recall the well-founded semantics for propositional logic programs with negation [4].
We assume a countably infinite set Npa of propositional atoms, also called positive literals. A
negative literal is an expression of the form ¬𝑎, where 𝑎 ∈ Npa . A literal is either a positive or a
negative literal. A rule 𝜌 is an expression of the form ℎ ← 𝐿1 , . . . , 𝐿𝑛 where ℎ is a propositional
atom, and 𝐿1 , . . . , 𝐿𝑛 are literals. The atom ℎ is called the head of 𝜌, denoted head(𝜌). We let
body(𝜌) = {𝐿1 , . . . , 𝐿𝑛 }. A program 𝑃 is any finite set of rules.
   A (3-valued) interpretation 𝐼 is any set of literals such that {𝑎, ¬𝑎} ̸⊆ 𝐼 for any 𝑎. Intuitively,
if 𝑎 is a propositional atom such that 𝑎 ∈ 𝐼, then 𝑎 is true in 𝐼. If ¬𝑎 ∈ 𝐼, then 𝑎 is false in 𝐼. If
neither 𝑎 ∈ 𝐼 nor ¬𝑎 ∈ 𝐼, then the truth value of 𝑎 is undefined in 𝐼.
   A set 𝑈 of propositional atoms is called an unfounded set w.r.t. an interpretation 𝐼 and a
program 𝑃 , if at least one of the following holds for all rules 𝜌 ∈ 𝑃 :
  (i) head(𝜌) ̸∈ 𝑈 ,
 (ii) there is a positive literal 𝑏 ∈ body(𝜌) such that 𝑏 ∈ 𝑈 or ¬𝑏 ∈ 𝐼, or
(iii) there is a negative literal ¬𝑏 ∈ body(𝜌) such that 𝑏 ∈ 𝐼.
For any program 𝑃 and interpretation 𝐼, there is a unique ⊆-maximal unfounded set w.r.t. 𝑃
and 𝐼, which we denote 𝑈𝑃 (𝐼). For a program 𝑃 , we define a pair 𝑇𝑃 and 𝑊𝑃 of operators that
map interpretations to interpretations as follows:

                            𝑇𝑃 (𝐼) = {head(𝜌) | body(𝜌) ⊆ 𝐼, 𝜌 ∈ 𝑃 }
                           𝑊𝑃 (𝐼) = 𝑇𝑃 (𝐼) ∪ {¬𝑎 | 𝑎 ∈ 𝑈𝑃 (𝐼)}

Intuitively, by applying the operator 𝑊𝑃 on an interpretation 𝐼, we augment 𝐼 with some new
literals: (i) first we include 𝑇𝑃 (𝐼) that contains immediate consequences of the rules in 𝑃 and
the literals in 𝐼, and then (ii) we add the negation of all atoms that can be safely assumed false
according to 𝑈𝑃 (𝐼). The operator 𝑊𝑃 is monotonic, i.e., 𝑊𝑃 (𝐼1 ) ⊆ 𝑊𝑃 (𝐼2 ) holds whenever
𝐼1 ⊆ 𝐼2 . Thus 𝑊𝑃 has the least fixpoint lfp(𝑊𝑃 ) which we call the well-founded model of
𝑃 , and denote it by WFS (𝑃 ). In more detail, we can compute WFS (𝑃 ) by considering the
sequence 𝐼0 , 𝐼1 , . . . with 𝐼0 = ∅ and 𝐼𝑖 = 𝑊𝑃 (𝐼𝑖−1 ) for 𝑖 > 0. Due to the monotonicity of 𝑊𝑃 ,
this sequence reaches a 𝑗 such that 𝐼𝑗+1 = 𝐼𝑗 . Then WFS (𝑃 ) = 𝐼𝑗 .

Example 2. Consider the program 𝑃 consisting of the following rules:

              𝑎 ← 𝑏, 𝑐         𝑏 ← 𝑎, 𝑑         𝑑←𝑒           𝑐 ← ¬𝑏           𝑒 ← ¬𝑑

Let us compute WFS (𝑃 ). We start with 𝐼0 = ∅. We have 𝑇𝑃 (𝐼0 ) = ∅ and 𝑈𝑃 (𝐼0 ) = {𝑎, 𝑏}. Thus
𝐼1 = 𝑊𝑃 (𝐼0 ) = {¬𝑎, ¬𝑏}. To compute 𝐼2 , first note that 𝑇𝑃 (𝐼1 ) = {𝑐} and 𝑈𝑃 (𝐼1 ) = {𝑎, 𝑏}.
Then 𝐼2 = 𝑊𝑃 (𝐼1 ) = {𝑐, ¬𝑎, ¬𝑏}. Observe that 𝑈𝑃 (𝐼2 ) = {𝑎, 𝑏} and 𝑇𝑃 (𝐼2 ) = {𝑐}, and thus
𝐼3 = 𝐼2 . Hence WFS (𝑃 ) = {𝑐, ¬𝑎, ¬𝑏}.


                                                  6
Adrian Chmurovič et al. CEUR Workshop Proceedings                                              2–13


3. SHACL Constraints
We recall the (abstract) syntax of SHACL constraints, mainly following the approach in [1].
We assume countably infinite, mutually disjoint sets Nnode , Nclass , Nprop , Nshape , Nvar of nodes,
class names, property names, shape names, and variables, respectively. If 𝑝 ∈ Nprop , then 𝑝− is
the inverse property of 𝑝. We let N±
                                   prop = Nprop ∪ {𝑝 | 𝑝 ∈ Nprop } and let (𝑝 ) = 𝑝 for any
                                                     −                            − −

𝑝 ∈ Nprop . Each 𝑟 ∈ Nprop is a basic property. A property path is an expression 𝐸 built using
                        ±

the grammar
                                  𝐸 ::= 𝑟 | 𝐸 ∘ 𝐸 | 𝐸 ∪ 𝐸 | 𝐸 *
where 𝑟 ∈ N±
           prop .
  A data graph is a pair 𝒢 = (Δ𝒢 , ·𝒢 ), where Δ𝒢 ⊆ Nnode is a set of nodes (called domain),
and ·𝒢 is a function that assigns to every class name 𝐴 ∈ Nclass some set of nodes 𝐴𝒢 ⊆ Δ𝒢 ,
and to every property name 𝑝 ∈ Nprop some binary relation 𝑝𝒢 ⊆ Δ𝒢 × Δ𝒢 . For a property
name 𝑝, we let (𝑝− )𝒢 = {(𝑜′ , 𝑜) | (𝑜, 𝑜′ ) ∈ 𝑝𝒢 }. Furthermore, the function ·𝒢 is extended to all
property paths as follows:

                                  (𝐸1 ∘ 𝐸2 )𝒢 = (𝐸1 )𝒢 ∘ (𝐸2 )𝒢
                                  (𝐸1 ∪ 𝐸2 )𝒢 = (𝐸1 )𝒢 ∪ (𝐸2 )𝒢
                                        (𝐸 * )𝒢 = (𝐸 𝒢 )*

SHACL constraints. A shape expression 𝜙 is an expression built according to the following
grammar:
                 𝜙 ::= 𝑐 | 𝐴 | ¬𝐴 | 𝑠 | ¬𝑠 | 𝜙 ∧ 𝜙 | 𝜙 ∨ 𝜙 | 𝑋 | ∃𝐸.𝜙
where 𝑐 ∈ Nnode , 𝐴 ∈ Nclass , 𝑠 ∈ Nshape , 𝑋 ∈ Nvar , and 𝐸 is a complex path. A shape
constraint is an expression of the form 𝑠 ← 𝜙, where 𝑠 ∈ Nshape and 𝜙 is a shape expression. In
the following, when considering a set 𝒞 of shape constraints, we always require that (a) 𝜙1 = 𝜙2
for any pair 𝑠 ← 𝜙1 and 𝑠 ← 𝜙2 of shape constraints in 𝒞, and (b) for every shape name 𝑠
that appears in 𝒞, there is 𝜙 such that 𝑠 ← 𝜙 ∈ 𝒞. The above just says that every shape name
that appears in 𝐶 has exactly one “definition”. Concretely, for such a constraint set 𝒞 and a
shape name 𝑠 that appears in 𝒞, we let 𝒞(𝑠) denote the unique 𝜙 such that 𝑠 ← 𝜙 ∈ 𝐶. A shape
expression is ground if it contains no variables. In what follows we only consider ground shape
expressions, with a small exception in Section 5 where variables are used as an auxiliary tool to
obtain a translation into logic programs. Note that the SHACL standard does not allow variables
in shape expressions. We also note that the syntax above does not support property pair equality,
property pair disjointness, cardinality constraints, and negation of complex expressions; these
features are described in the W3C standard, but we omit them here for presentation reasons
(they will be added and discussed in the extended version of this paper).
   A target is a pair (ℓ, 𝑠), where ℓ ∈ Nnode ∪ Nclass ∪ N±
                                                          prop and 𝑠 ∈ Nshape . A SHACL document
is a pair (𝒞, 𝒯 ), where 𝒞 is a set of shape constraints and 𝒯 is a set of targets. We will next
discuss the validation of a graph 𝒢 against a SHACL document (𝒞, 𝒯 ). For this we require that
the nodes that explicitly appear in 𝒞 or 𝒯 are also included in the domain Δ𝒢 .


                                                 7
Adrian Chmurovič et al. CEUR Workshop Proceedings                                                  2–13


                               𝑐𝒢,𝜇 = {𝑐}
                               𝑠𝒢,𝜇 = 𝜇(𝑠)
                              𝐴𝒢,𝜇 = 𝐴𝒢
                           (¬𝐴)𝒢,𝜇 = Δ𝒢 ∖ 𝐴𝒢
                           (¬𝑠)𝒢,𝜇 = Δ𝒢 ∖ 𝜇(𝑠)
                      (𝜙1 ∧ 𝜙2 )𝒢,𝜇 = (𝜙1 )𝒢,𝜇 ∩ (𝜙2 )𝒢,𝜇
                      (𝜙1 ∨ 𝜙2 )𝒢,𝜇 = (𝜙1 )𝒢,𝜇 ∪ (𝜙2 )𝒢,𝜇
                        (∃𝐸.𝜙)𝒢,𝜇 = {𝑒 ∈ Δ𝒢 | ∃𝑒′ : (𝑒, 𝑒′ ) ∈ 𝐸 𝒢 ∧ 𝑒′ ∈ 𝜙𝒢,𝜇 }

Figure 2: Evaluating complex shape expressions


Supported model semantics. We present next the basic 2-valued semantics of recursive
SHACL due to Corman et al. [1]. A shape assignment for 𝒢 is any function 𝜇 that maps every
shape name 𝑠 ∈ Nshape to a subset 𝜇(𝑠) ⊆ Δ𝒢 . For a data graph 𝒢 and a shape assignment 𝜇
for 𝒢, we define a function ·𝒢,𝜇 that maps every shape expression 𝜙 to a set 𝜙𝒢,𝜇 of nodes as
shown in Figure 2. We say a graph 𝒢 is valid against a SHACL document (𝒞, 𝒯 ), if there exists
a shape assignment 𝜇 such that:
   1. 𝜇(𝑠) = 𝜙𝒢,𝜇 for all constraints 𝑠 ← 𝜙 ∈ 𝒞, and
   2. for all (ℓ, 𝑠) ∈ 𝑇 we have:
          • if ℓ is a node 𝑎 ∈ Nnode , then 𝑎 ∈ 𝜇(𝑠),
          • if ℓ is a class name 𝐴, then 𝐴𝒢 ⊆ 𝜇(𝑠),
          • if ℓ is a property name 𝑝, then {𝑒 | (𝑒, 𝑒′ ) ∈ 𝑝𝒢 } ⊆ 𝜇(𝑠), and
          • if ℓ = 𝑝− for a property name 𝑝, then {𝑒′ | (𝑒, 𝑒′ ) ∈ 𝑝𝒢 } ⊆ 𝜇(𝑠).

Example 3. Consider a SHACL document (𝒞, 𝒯 ) consisting of the following:

     𝒞 = {𝑠1 ← ∃𝑝.¬𝐴,        𝑠2 ← ¬𝑠1 ,      𝑠3 ← 𝐵 ∧ ¬𝑠4 ,     𝑠4 ← 𝐵 ∧ ¬𝑠3 ,      𝑠5 ← ∃𝑟.𝑠5 }

                                  𝒯 = {(𝐴, 𝑠1 ), (𝐵, 𝑠2 ), (𝑎, 𝑠5 )}
Consider a data graph 𝒢 with Δ𝐺 = {𝑎, 𝑏} and

                     𝐴𝒢 = {𝑎}      𝐵 𝒢 = {𝑏}     𝑝𝒢 = {(𝑎, 𝑏)} 𝑟𝒢 = {(𝑎, 𝑎)}

It easy to see that 𝒢 is valid against (𝒞, 𝒯 ), which is witnessed, e.g., by the shape assignment
𝜇 such that 𝜇(𝑠1 ) = 𝜇(𝑠5 ) = {𝑎}, 𝜇(𝑠2 ) = 𝜇(𝑠3 ) = {𝑏}, 𝜇(𝑠4 ) = ∅. This example already
illustrates a drawback of the supported model semantics: 𝑎 is inferred to satisfy 𝑠5 because it has
an 𝑟-connection to a node that satisfies 𝑠5 . However, 𝑎 has only a singe 𝑟-connection to itself, i.e.,
we have self-justification.
   Consider the targets 𝒯1 = {(𝑏, 𝑠3 )}, 𝒯2 = {(𝑏, 𝑠4 )}, 𝒯3 = {(𝑏, 𝑠3 ), (𝑏, 𝑠4 )}. Observe that 𝒢 is
valid against (𝒞, 𝒯1 ) and (𝒞, 𝒯2 ) individually, but 𝒢 does not validate 𝒯3 as we cannot assign both 𝑠3


                                                   8
Adrian Chmurovič et al. CEUR Workshop Proceedings                                               2–13


                  ⌊𝑠⌋𝒢𝑆 = {𝑎 | 𝑠(𝑎) ∈ 𝑆}
                 ⌊¬𝑠⌋𝒢𝑆 = {𝑎 | ¬𝑠(𝑎) ∈ 𝑆}
                  ⌈𝑠⌉𝒢𝑆 = {𝑎 ∈ Δ𝒢 | ¬𝑠(𝑎) ̸∈ 𝑆}
                 ⌈¬𝑠⌉𝒢𝑆 = {𝑎 ∈ Δ𝒢 | 𝑠(𝑎) ̸∈ 𝑆}
                   [𝑐]𝒢𝑆 = {𝑐}                                            [·] ∈ {⌊·⌋, ⌈·⌉}
                  [𝐴]𝒢𝑆 = {𝑎 | 𝑎 ∈ 𝐴𝒢 }                                   [·] ∈ {⌊·⌋, ⌈·⌉}
                [¬𝐴]𝒢𝑆 = {𝑎 | 𝑎 ∈ Δ𝒢 ∖ 𝐴𝒢 }                               [·] ∈ {⌊·⌋, ⌈·⌉}
            [𝜙1 ∧ 𝜙2 ]𝒢𝑆 = [𝜙1 ]𝒢𝑆 ∩ [𝜙2 ]𝒢𝑆                              [·] ∈ {⌊·⌋, ⌈·⌉}
            [𝜙1 ∨ 𝜙2 ]𝒢𝑆 = [𝜙1 ]𝒢𝑆 ∪ [𝜙2 ]𝒢𝑆                              [·] ∈ {⌊·⌋, ⌈·⌉}
              [∃𝐸.𝜙]𝒢𝑆 = {𝑎 | ∃𝑎′ : (𝑎, 𝑎′ ) ∈ 𝐸 𝒢 ∧ 𝑎′ ∈ [𝜙]𝒢𝑆 }         [·] ∈ {⌊·⌋, ⌈·⌉}

Figure 3: Evaluating shape expressions: upper and lower bounds


and 𝑠4 to 𝑏 in a single shape assignment. This phenomenon could be undesired in certain situations:
when a stronger notion of validation is desired, we may expect that a graph validates 𝒯 ′ ∪ 𝒯 ′′
whenever it validates 𝒯 ′ and 𝒯 ′′ . The well-founded semantics introduced next has this feature.

   Note that by introducing an inconsistency in 𝒞, e.g., by adding the constraint 𝑠6 ← ¬𝑠6 , we
lose the existence of a shape assignment, and thus no target gets validated, including 𝒯 = ∅.
This is partially resolved by considering a 3-valued supported model semantics, which does
not require determining the truth value of all shape names [1]. However, this does not resolve
the problem of self-justifications, which can be dealt with by using the 2-valued and 3-valued
stable model semantics of Andreşel et al. [2]. See also [5] for an approach based on Magic Sets
to increase inconsistency tolerance of the 2-valued stable model semantics in SHACL.


4. Well-founded Semantics for SHACL
We are now ready to define a well-founded semantics for SHACL. We do this by adapting the
notion of unfounded sets and fixpoint computations from [4]. We first need our version of
3-valued interpretations in the context of SHACL.
   A shape atom is an expression of the form 𝑠(𝑎), where 𝑠 ∈ Nshape and 𝑎 ∈ Nnode . A negated
shape atom is an expression ¬𝑠(𝑎), where 𝑠(𝑎) is an atom. A (shape) literal is a possibly negated
shape atom. A (3-valued) interpretation (for SHACL constraints) is any set 𝑆 of shape literals
such that there is no 𝑠(𝑎) ∈ 𝑆 with ¬𝑠(𝑎) ∈ 𝑆. Intuitively, an atom 𝑠(𝑎) ∈ 𝑆 means that there
is a justification that 𝑠 holds at 𝑎, a literal ¬𝑠(𝑎) ∈ 𝑆 means that there is no reason for 𝑠 to hold
at 𝑎, while {𝑠(𝑎), ¬𝑠(𝑎)} ∩ 𝑆 = ∅ corresponds to the case where the the satisfaction of 𝑠 at 𝑎 is
undefined. For a set 𝐾 of shape atoms, let ¬.𝐾 = {¬𝑠(𝑎) | 𝑠(𝑎) ∈ 𝐾}.

Definition 1. For a given graph 𝒢 and an interpretation 𝑆, we define two functions ⌊·⌋𝒢𝑆 and ⌈·⌉𝒢𝑆
that map every shape expression to a set of nodes. These functions are defined in Figure 3.


                                                    9
Adrian Chmurovič et al. CEUR Workshop Proceedings                                                2–13


Assume a data graph 𝒢 and a constraint 𝑠 ← 𝜙. Suppose we “believe” a set 𝑆 of shape literals,
i.e., we assume that all literals in 𝑆 are true. Then ⌊𝜙⌋𝒢𝑆 and ⌈𝜙⌉𝒢𝑆 return us the nodes of 𝒢 where
𝜙 is certainly true and where 𝜙 is possibly true, respectively. Thus ⌊𝜙⌋𝒢𝑆 can be used to infer
positive shape literals: if 𝑎 ∈ ⌊𝜙⌋𝒢𝑆 , then we can infer 𝑠(𝑎). We can use ⌈𝜙⌉𝒢𝑆 to infer negative
information: if 𝑎 ̸∈ ⌈𝜙⌉𝒢𝑆 , then we can infer ¬𝑠(𝑎). These inferences are formalized next.

Definition 2. Assume a data graph 𝒢 and a set 𝒞 of constraints. We define an operator 𝑇𝒢,𝒞 (·)
that maps interpretations into interpretations as follows:

                            𝑇𝒢,𝒞 (𝑆) = {𝑠(𝑎) | 𝑠 ← 𝜙 ∈ 𝐶, 𝑎 ∈ ⌊𝜙⌋𝒢𝑆 }

We are now ready to define the notion of an unfounded set of shape atoms.

Definition 3 (Unfounded set). Assume an interpretation 𝑆, a data graph 𝒢, and a set 𝒞 of
constraints. A set 𝑈 of shape atoms is called an unfounded set w.r.t. 𝑆, 𝒢 and 𝒞, if 𝑎 ̸∈ ⌈𝒞(𝑠)⌉𝒢𝑆∪¬.𝑈
for all 𝑠(𝑎) ∈ 𝑈 .

Assume a data graph 𝒢, a set 𝑈 of shape atoms, and suppose we “believe” a set 𝑆 of shape
literals. Intuitively, the atoms in 𝑈 form an unfounded set (and can thus be simultaneously
set to false) if none of the shape atoms 𝑠(𝑎) ∈ 𝑈 can possibly be implied by the associated
constraint, assuming the negation of the atoms in 𝑈 holds (in addition to 𝑆 being true).
   The following property follows from the fact that 𝑈1 ∪ 𝑈2 is an unfounded set w.r.t. 𝑆, 𝒢 and
𝒞 whenever 𝑈1 , 𝑈2 are two unfounded sets w.r.t. 𝑆, 𝒢 and 𝒞.

Proposition 1. Assume an interpretation 𝑆, a data graph 𝒢, and a set 𝒞 of constraints. There
exists a unique set 𝑈 such that
   1. 𝑈 is an unfounded set w.r.t. 𝑆, 𝒢 and 𝒞, and
   2. there is no 𝑈 ′ ⊃ 𝑈 that is unfounded set w.r.t. 𝑆, 𝒢 and 𝒞.

The unique set 𝑈 in the proposition above is called the greatest unfounded set w.r.t. 𝑆, 𝒢 and 𝒞.
Assume a data graph 𝒢 and a set 𝒞 of constraints. We let 𝑈𝒢,𝒞 be the operator that maps every
interpretation 𝑆 to the greatest unfounded set w.r.t. 𝑆, 𝒢 and 𝒞. Thus, 𝑈𝒢,𝒞 (𝑆) is the maximal
set of shape atoms that we can safely set to false if we assume that the literals in 𝑆 are true.
We can now finally define a well-founded semantics for SHACL. We define an operator 𝑊𝒢,𝒞
that maps interpretations into interpretations as follows: It combines the positive consequences
based on the 𝑇𝒢,𝒞 operator and the negated atoms of the greatest unfounded set produced by
the 𝑈𝒢,𝒞 operator. More formally, we define:

                                𝑊𝒢,𝒞 (𝑆) = 𝑇𝒢,𝒞 (𝑆) ∪ ¬.𝑈𝒢,𝒞 (𝑆)

The above operator is monotone, i.e., 𝑊𝒢,𝒞 (𝑆1 ) ⊆ 𝑊𝒢,𝒞 (𝑆2 ) whenever 𝑆1 , 𝑆2 are two inter-
pretations with 𝑆1 ⊆ 𝑆2 . Thus 𝑊𝒢,𝒞 has the least fixpoint, i.e., there exists 𝑆 such that (i)
𝑊𝒢,𝒞 (𝑆) = 𝑆, and (ii) there is no 𝑆 ′ ⊂ 𝑆 with 𝑊𝒢,𝒞 (𝑆 ′ ) = 𝑆 ′ . We use WFS (𝒢, 𝒞) to denote
the least fixpoint of 𝑊𝒢,𝒞 , and call it the well-founded model of 𝒢 and 𝒞. WFS (𝒢, 𝒞) can be
obtained by constructing a sequence 𝑆0 , 𝑆1 , 𝑆2 , . . . such that 𝑆0 = ∅ and 𝑆𝑖+1 = 𝑊𝒢,𝒞 (𝑆𝑖 )
until eventually an index 𝑗 is reached were 𝑆𝑗 = 𝑆𝑗+1 . Then WFS (𝒢, 𝒞) = 𝑆𝑗 .


                                                 10
Adrian Chmurovič et al. CEUR Workshop Proceedings                                                          2–13


  We can now define validation under the well-founded semantics. We say a graph 𝒢 is valid
against a SHACL document (𝒞, 𝒯 ) (under the well-founded semantics), if for all (ℓ, 𝑠) ∈ 𝑇
we have:
    • if ℓ is a node 𝑎 ∈ Nnode , then 𝑠(𝑎) ∈ WFS (𝒢, 𝒞),
    • if ℓ is a class name 𝐴, then 𝑠(𝑎) ∈ WFS (𝒢, 𝒞) for all 𝑎 ∈ 𝐴𝒢 ,
    • if ℓ is a property name 𝑝, then 𝑠(𝑎) ∈ WFS (𝒢, 𝒞) for all 𝑎 with (𝑎, 𝑏) ∈ 𝑝𝒢 ,
    • if ℓ = 𝑝− for 𝑝 ∈ Nprop , then 𝑠(𝑏) ∈ WFS (𝒢, 𝒞) for all 𝑏 with (𝑎, 𝑏) ∈ 𝑝𝒢 .
Example 4. Consider the data graph 𝒢 and the constraint set 𝒞 from Example 1. It is not dif-
ficult to check that WFS (𝒢, 𝒞) contains ProfessorShape(Blake), StudentShape(Cameron),
EmployeeShape(Alex ), EmployeeShape(Blake). Observe also that ProfessorShape(Alex ) ̸∈
WFS (𝒢, 𝒞) and ¬ProfessorShape(Alex ) ̸∈ WFS (𝒢, 𝒞), i.e., membership of Alex in
ProfessorShape is undetermined, which is intuitive as discussed in Example 1. For the
remaining shape atoms over the signature of 𝒢 and 𝒞, we have that WFS (𝒢, 𝒞) con-
tains their negation. Thus 𝒢 is valid against (𝒞, {(Blake, ProfessorShape)}) but not valid
w.r.t. (𝒞, {(Drew , ProfessorShape)}).


5. From SHACL to Propositional Logic Programs
Assume a graph 𝒢 and a set 𝒞 of constraints. We show how to compute a propositional logic
program 𝑃𝒢,𝒞 from 𝒢 and 𝒞 such that the well-founded model of 𝑃𝒢,𝒞⋃︀corresponds to the
well-founded model of 𝒢 and 𝒞. We define our target program as 𝑃𝒢,𝒞 = 𝑠←𝜙∈𝒞 𝑃𝑠,𝜙 , where
each 𝑃𝑠,𝜙 is defined as follows.
   Assume a constraint 𝑠 ← 𝜙. First, perform the following replacement in 𝜙 exhaustively
until no further replacement is possible: find a position in 𝜙 where a negative shape literal ¬𝑠
occurs, and replace ¬𝑠 by a fresh variable 𝑥¬𝑠 . Second, perform the following replacement in 𝜙
exhaustively: find a position in 𝜙 where a positive shape literal 𝑠 occurs, and replace 𝑠 by a fresh
variable 𝑥𝑠 . After these steps, we have that the shape expression 𝜙′ that is produced from 𝜙 has
a tuple of variables ⃗𝑥 = (𝑥1¬𝑠1 , . . . , 𝑥𝑘¬𝑠𝑘 , 𝑥𝑘+1
                                                    𝑠𝑘+1 , . . . , 𝑥𝑠𝑛 ). Then the program 𝑃𝑠,𝜙 consists of the
                                                                    𝑛

propositional rule 𝑠(𝑎) ← ¬𝑠1 (𝑏1 ), . . . , ¬𝑠𝑘 (𝑏𝑘 ), 𝑠𝑘+1 (𝑏𝑘+1 ), . . . , 𝑠𝑛 (𝑏𝑛 ) for every node 𝑎 and
every node tuple ⃗𝑏 = (𝑏1 , . . . , 𝑏𝑛 ) such that 𝑎 ∈ (𝜙′ [𝑥       ⃗ /𝑏⃗ ])𝒢,∅ . Here 𝜙[𝑥  ⃗ ] denotes the shape
                                                                                         ⃗ /𝑏
expression that is obtained from 𝜙 by performing the substitution of variables ⃗𝑥 by nodes ⃗𝑏.
Proposition 2. If 𝒢 is a graph and 𝒞 is a constraint set, then WFS (𝒢, 𝒞) = WFS (𝑃𝒢,𝒞 ). The
size of 𝑃𝒢,𝒞 is exponential in the size of 𝒢 and 𝒞, but it is only polynomial in case the number of
shape names in every constraint is bounded by a constant.
Note that, for a given 𝑠 ← 𝜙, the relevant node tuples (𝑎, 𝑏1 , . . . , 𝑏𝑛 ) above can be computed
by posing a single SPARQL query over 𝒢, in a similar way as in [6]. This opens a way to
implement validation under the new semantics by exploiting existing SPARQL endpoints. We
also remark that, by introducing fresh shape names, we can rewrite a given SHACL document
in a way that its shape expressions contain at most 2 shape names, thus obtaining a polynomial
transformation from SHACL to propositional logic programs under the well-founded semantics.
However, our initial experiments show that aiming for a guaranteed polynomial time translation
does not necessarily lead to increased efficiency of validation.


                                                       11
Adrian Chmurovič et al. CEUR Workshop Proceedings                                           2–13


6. Discussion
In this paper we have defined a well-founded semantics for SHACL constraints with cyclic
dependencies. This semantics is not only intuitive and based on well-established principles
in logic programming and deductive databases, but it also has advantages over the previous
approaches (based on supported and stable models) in terms of semantic and computational
properties, i.e., it avoids unfounded justifications and is tractable. We note that our semantics
has a direct connection to the so-called cautious validation under the 3-valued stable model
semantics in [2], which is due to the classic result by Przymusinski [7].
   We have also established a connection between the SHACL semantics introduced here and
the well-founded semantics for logic programs. We are currently working on an implementation
of a SHACL validator that directly exploits this connection: Given a graph 𝒢 and a constraint
set 𝒞, it issues SPARQL queries over 𝒢 in order to compute a propositional logic program 𝑃𝒢,𝒞 ,
which can then be sent to a dedicated reasoner to compute the well-founded model of 𝑃𝒢,𝒞 .
Our initial experiments using the DLV engine [8] suggest that this is a viable approach; these
results will be presented in the extended version of this paper.
   There are many research questions for future work. For instance, the computational com-
plexity of satisfiability and implication of SHACL constraints under the well-founded semantics
is a natural next topic for investigations. Some initial results in this area for the semantics
introduced previously were presented in [9, 10]. Due to the syntactic form of shape expressions,
many of those previous results are related to known positive and negative decidability and
complexity results in the area of Description Logics. More broadly, SHACL is related to the
problem of reconciling the open-world assumption (as in OWL and Description Logics) with the
closed-world assumption (as in databases and logic programming). E.g., validation in SHACL
under the supported model semantics can be seen as checking the satisfiability of a Description
Logic ontology where all but some selected concept names are closed predicates. Supporting
ontological inferences in the context of SHACL validation is an interesting research direction,
which is also hinted at by the SHACL standard. More details on the challenges of combining
open- and closed-world assumptions can be found in [11, 12]. Another natural direction is
to study explanations and repairs of violations of SHACL constraints under the well-founded
semantics: these topics are natural as the SHACL standard calls for (but does not specify the
details of) the so-called validation reports to support users and applications. Some initial work
on explaining and repairing violations of SHACL constraints was reported in [13, 14].


Acknowledgments
This work was partially supported by the Wallenberg AI, Autonomous Systems and Software
Program (WASP) funded by the Knut and Alice Wallenberg Foundation. It was also partially
supported by the Austrian Science Fund (FWF) projects P30360 and P30873, and by the Vienna
Business Agency’s project CoRec.


                                               12
Adrian Chmurovič et al. CEUR Workshop Proceedings                                              2–13


References
 [1] J. Corman, J. L. Reutter, O. Savkovic, Semantics and validation of recursive SHACL,
     in: Proc. of ISWC 2018, volume 11136 of LNCS, Springer, 2018, pp. 318–336. URL: https:
     //doi.org/10.1007/978-3-030-00671-6_19.
 [2] M. Andreşel, J. Corman, M. Ortiz, J. L. Reutter, O. Savkovic, M. Šimkus, Stable model
     semantics for recursive SHACL, in: WWW ’20: The Web Conference 2020, ACM / IW3C2,
     2020, pp. 1570–1580. URL: https://doi.org/10.1145/3366423.3380229.
 [3] P. Pareti, G. Konstantinidis, A review of SHACL: from data validation to schema reasoning
     for RDF graphs, in: [15], 2021. URL: https://doi.org/10.1007/978-3-030-95481-9_6.
 [4] A. Van Gelder, K. A. Ross, J. S. Schlipf, The well-founded semantics for general logic
     programs, J. ACM 38 (1991) 619–649. URL: https://doi.org/10.1145/116825.116838.
 [5] S. Ahmetaj, B. Loehnert, M. Ortiz, M. Šimkus, Magic shapes for validation in SHACL,
     Proceedings of the VLDB Endowment. Volume 15 Issue 10 (to appear) (2022).
 [6] J. Corman, F. Florenzano, J. L. Reutter, O. Savkovic, Validating SHACL constraints over
     a SPARQL endpoint, in: Proc. of ISWC 2019, volume 11778 of LNCS, Springer, 2019, pp.
     145–163. URL: https://doi.org/10.1007/978-3-030-30793-6_9.
 [7] T. C. Przymusinski, The well-founded semantics coincides with the three-valued stable
     semantics, Fundam. Inform. 13 (1990) 445–463.
 [8] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, F. Scarcello, The dlv system for
     knowledge representation and reasoning., ACM Trans. Comput. Log. 7 (2006) 499–562.
     URL: http://dblp.uni-trier.de/db/journals/tocl/tocl7.html#LeonePFEGPS06.
 [9] P. Pareti, G. Konstantinidis, F. Mogavero,               Satisfiability and containment of
     recursive SHACL,         Journal of Web Semantics 74 (2022) 100721. URL: https://
     www.sciencedirect.com/science/article/pii/S1570826822000130. doi:https://doi.org/
     10.1016/j.websem.2022.100721.
[10] M. Leinberger, P. Seifer, T. Rienstra, R. Lämmel, S. Staab, Deciding SHACL shape contain-
     ment through description logics reasoning, in: Proc. of ISWC 2020, Springer, 2020.
[11] M. Knorr, On combining ontologies and rules, in: [15], 2021, pp. 22–58. URL: https:
     //doi.org/10.1007/978-3-030-95481-9_2.
[12] T. Schneider, M. Šimkus, Ontologies and data management: A brief survey, Künstliche
     Intell. 34 (2020) 329–353. URL: https://doi.org/10.1007/s13218-020-00686-3.
[13] S. Ahmetaj, R. David, M. Ortiz, A. Polleres, B. Shehu, M. Šimkus, Reasoning about
     Explanations for Non-validation in SHACL, in: Proc. of KR 2021, 2021, pp. 12–21. URL:
     https://doi.org/10.24963/kr.2021/2.
[14] R. David, S. Ahmetaj, A. Polleres, M. Šimkus, Repairing SHACL constraint violations using
     answer set programming, in: Proc. of ISWC 2022 (to appear), Springer, 2022.
[15] M. Šimkus, I. Varzinczak (Eds.), Reasoning Web. Declarative Artificial Intelligence - 17th
     International Summer School 2021, 2021, Tutorial Lectures, volume 13100 of LNCS, Springer,
     2022. URL: https://doi.org/10.1007/978-3-030-95481-9.


                                                 13

</pre>