=Paper=
{{Paper
|id=Vol-2180/paper-11
|storemode=property
|title=A Tractable Notion of Stratification for SHACL
|pdfUrl=https://ceur-ws.org/Vol-2180/paper-11.pdf
|volume=Vol-2180
|authors=Julien Corman,Juan L. Reutter,Ognjen Savkovic
|dblpUrl=https://dblp.org/rec/conf/semweb/CormanRS18a
}}
==A Tractable Notion of Stratification for SHACL==
<pdf width="1500px">https://ceur-ws.org/Vol-2180/paper-11.pdf</pdf>
<pre>
       A Tractable Notion of Stratification for SHACL

                Julien Corman1 , Juan L. Reutter2 , and Ognjen Savković1
                      1
                       Free University of Bozen-Bolzano, Bolzano, Italy
            2
                PUC Chile and Center for Semantic Web Research, Santiago, Chile


       Abstract. This article introduces a restriction on the usage of negation in SHACL
       “core constraint components” constraints, called strict stratification, which guaran-
       tees tractability of graph validation.


1   Introduction
One of the challenges of recent RDF-based applications is managing data quality [1],
and several systems already provide RDF validation procedures (e.g., https://www.
stardog.com/docs/,            https://www.topquadrant.com/technology/shacl/).
This created the need for a standardized declarative constraint language for RDF, and for
mechanisms to detect violations of such constraints. An important step in this direction is
SHACL, or Shapes Constraint Language (https://www.w3.org/TR/shacl/) which
has become a W3C recommendation in 2017.
    The SHACL specification however leaves explicitly undefined the validation of
recursive constraints. In a previous article [2], we showed that extending the specifica-
tion’s semantics to accommodate for recursion leads to intractability (in the size of the
graph) for the so-called “core constraint components” of SHACL. This result holds for
stratified constraints already, which may come as a surprise, considering that stratification
guarantees tractability in well-studied recursive languages such as Datalog.
    Our previous work identified a tractable fragment of SHACL’s core components.
In this paper, we propose an alternative approach to gain tractability, retaining all
SHACL operators, but strengthening the stratification condition traditionally used in logic
programming. More exactly, we introduce a syntactic condition on shape constraints
called “strict stratification”, which guarantees that graph validation is in PTIME in
combined (i.e. graph and constraints) complexity. We also describe a procedure to
perform such validation.
    The current paper is not self-contained, due to space limitations, but all definitions
can be found in our previous article [2] or its online extended version [3].

2   SHACL Specification and Formal Semantics
We briefly recall some notions introduced in [2, 3]. We defined a graph validation problem
which captures the target-based validation procedure depicted in the SHACL specification.
An instance of the validation problem is a tuple hG, S, s0 , v0 i, where G is a labeled
                  .               .
graph, S = {s0 = φs0 , . . . , sn = φsn } is a set of shape definitions (also called “shapes”
in what follows), s0 is a shape defined in S, and v0 is a node in G. The problem consists
in deciding whether v0 verifies the constraint φs0 for s0 , given S and G. We will call
hs0 , v0 i the target of this instance. Each constraint φsi follows the following syntax:

                 φ ::= > | s | I | φ1 ∧ φ2 | ¬φ | ≥n r.φ | EQ(r1 , r2 )

where > means boolean true, s is a shape name defined in S, I is an IRI, r is a SPARQL
property path, and n ∈ N+ . Further, ∧ stands for conjunction, ¬ for negation, “≥n r.φ”
means “must have at least n r-successors in G verifying φ”, and “EQ(r1 , r2 )” means
that the r1 and r2 -successors of a node must coincide. A translation from SHACL core
constraint components to this syntax and conversely can be found in [3].
     According to this syntax, a shape can reference another. But the SHACL specification
leaves open the recursive case, i.e. when a shape references itself, either directly, or via a
reference cycle. As a solution, we proposed a semantics based on the notion of assignment,
that maps (positive and negative) shape labels to sets of nodes, and we defined constraint
evaluation given an assignment. Formally, a shape assignment for G and S is a total
function from pairs hs, vi of shape name and node to {0(false), 0.5(unknown), 1(true)}
(the “unknown” value is used to deal with cases when recursion and negation can be
arbitrarily combined). The evaluation of formula φ at node v given assignment σ is
written JφKv,G,σ (for a formal definition we refer to [2] or [3]). The input hG, S, s0 , v0 i
is valid ff there exists a shape assignment σ for G and S such that (i) σ(s0 , v0 ) = 1, and
(ii) σ(s, v) = Jφs Kv,G,σ for each shape s defined in S and node v in G.

3    Strictly Stratified SHACL and Validation
This section defines strict stratification of a set of SHACL shapes, and proposes a PTIME
validation algorithm (combined complexity) for this case. We chose to convey the general
intuition behind stratification from a procedural perspective, as follows: if there is a
satisfying assignment for a given input, then it may be built stratum by stratum, starting
from the lowest stratum, and assigning shapes in a greedy fashion, without the need to
backtrack from a higher to a lower stratum. However, adopting the classical definition of
stratified negation (borrowed from Datalog) is not sufficient to avoid such backtracking.
This is why we define the stronger notion of strict stratification.
     Let S be a set of shapes. We say that shape s1 references shape s2 if s2 appears in
the definition of s1 . Further, we define the dependency graph of S as the directed graph
with all shape names defined in S as nodes, and an edge from s1 to s2 iff s1 references
s2 . If the reference is in the scope of negation, then the edge is labeled with a negation
symbol (as in Fig. 1), and we call it a negative edge. A path between two nodes is called
negative if it contains at least one negative edge, and positive otherwise. S is stratified iff
its dependency graph has no negative cycle. Finally, we define the contracted dependency
graph of S as the graph obtained from its dependency graph by contracting all positive
strongly connected components.

Definition 1 (strict stratification). A set S of shapes is strictly stratified if, for any pair
(s1 , s2 ) of nodes in its contracted dependency graph, either:
  – there is at most one path from s1 to s2 , or
  – all paths from s1 to s2 are positive.
In this definition, using the contracted dependency graph (rather than the dependency
graph) allows us to capture a more expressive fragment of SHACL. We also note that
from this definition, a strictly stratified set of shapes must be stratified.

Example 1. Consider the three shape definitions and their contracted dependency graph
in Figure 1. This set of shape is stratified, but not strictly stratified, since there are more
than one paths from s0 to s2 , and one of them is negative.

         .                                                 s1
      s0 = (≥1 P.s1 ) ∧ (≥1 P.s2 )
         .                                                           ¬
      s1 = ¬(≥1 P.s2 )
         .
      s2 = (≥1 P.s2 )
                                                           s0            s2

                  Fig. 1. A set of shapes, and its contracted dependency graph

                                                                P

                                                 P
                                  v1                        v2

                                             P
                              P                                 P

                                                 P
                                  v0                         v3
                                  Fig. 2. A graph under validation

Example 2. Consider again the shapes from Example 1, together with the graph in
Figure 2. Assume now that one tries to build a validating assignment σ for the target
hv0 , s0 i in a greedy fashion, taking advantage of stratification, thus starting from the
lowest stratum {s2 }. Then s2 can be assigned to v0 , v2 and v3 . Moving up to the next
stratum {s0 , s1 }, s0 cannot be assigned to v0 , because the definition of s1 is violated by
v1 , and therefore the conjunct (≥1 P.s1 ) is not verified by v0 . In order to find a validating
assignment, one would need to backtrack, reverting the decision to assign s2 to v2 . But
the decision to assign s2 to v3 on the other hand should not be reverted, otherwise the
conjunct (≥1 P.s2 ) would not be verified by v0 anymore. It can be easily seen that such
pattern can lead to a blowup of backtracking alternatives, exponential in the size of the
graph under validation.

The intuition for intractability is illustrated by Example 2 (in terms of backtracking),
and was made more formal in [3], with a reduction from boolean circuit satisfiability.
Intractability does not hold for strictly stratified shapes though, which intuitively guar-
antees that no backtracking is needed. We make this more precise in the following (a
complete formalization with proofs can be found in [3]).
     Let Σ G,S be the family of all (3-valued) assignments for G and S. The operator TG,S :
  G,S
Σ        → Σ G,S takes an assignment σ, and returns the assignment obtained by evaluating
each shape constraint definition at each node given σ, i.e. (TG,S (σ))(s, v) = Jφs Kv,G,σ
for each s and v. So from the definition of graph validation given in Section 2, hG, S,
s0 , v0 i is valid iff TG,S admits a fixed-point σ verifying σ(s, v) = 1. We also need the
Algorithm 1 VALIDATION PROCEDURE FOR STRICTLY STRATIFIED SHAPES
Input: G, S, s0 , v0
 1: Initiate σ with σ(s, v) = 0.5, for each shape s and node v
 2: repeat
 3:     σ0 ← σ
 4:     σ ← TG,S (σ)
 5: until σ = σ 0
 6: if σ(s0 , v0 ) = 0 then return Invalid
 7: else return Valid

partial order  over Σ G,S , defined by σ1  σ2 iff σ1 (s, v) = 0 implies σ2 (s, v) = 0,
and σ1 (s, v) = 1 implies σ2 (s, v) = 1, for any s and v.
     Next, it is shown in [3] that TG,S must admit a unique minimal fixed-point σminFix
w.r.t  over Σ G,S , and that σminFix can be reached by recursive applications of TG,S ,
starting with the empty assignment. Finally, TG,S is monotone w.r.t. , which guarantees
that σminFix can be computed in polynomial time. So if σminFix (s0 , v0 ) = 1, because
σminFix is a fixed-point of TG,S , the graph is valid. If σminFix (s0 , v0 ) = 0, because
σminFix is a minimal fixed-point of TG,S , any other fixed-point σ 0 must verify σ 0 (s0 ,
v0 ) = 0, therefore the graph is invalid. Finally, for the case σminFix (s0 , v0 ) = 0.5, we
showed in [3] that if S is strictly stratified (thus regardless of G), then Σ G,S must contain
a fixed-point σ 00 of TG,S s.t. σ 00 (s0 , v0 ) = 1. The corresponding validation procedure is
shown in Algorithm 1.
4    Discussion and Future Work
As a possible continuation, we observe that strict stratification is only a sufficient condition
for tractability. This means that the requirement may be relaxed, identifying a wider class
of instances which can be validated in polynomial time. Our preliminary investigations
indicate that additional properties of the dependency graph may be used, such as odd/even
number of negative references in a path, or morphisms between multiple paths from one
node to another.
    We are also currently developing implementation techniques for validation based on
logic programming. More exactly, since validation is based on the the existence of a
fixed-point evaluation/assignment, we are investigating an encoding into Datalog for
the strictly stratified case, and into to more expressive logic formalisms for non-strictly
stratified constraints.
Acknowledgments. This work was supported by the QUEST, QUADRO and AD-
VANCE4KG projects at the Free University of Bozen-Bolzano.

Bibliography
[1] M. Arenas, C. Gutierrez, and J. Pérez. Foundations of RDF databases. In Reasoning Web
    International Summer School, pages 158–204. Springer, 2009.
[2] J. Corman, J. L. Reutter, and O. Savković. Semantics and Validation of Recursive SHACL. In
    ISWC, 2018.
[3] J. Corman, J. L. Reutter, and O. Savković. Semantics and Validation of Recursive SHACL.
    Technical Report KRDB18-1, Free University of Bozen-Bolzano, 2018. Available at https:
    //www.inf.unibz.it/krdb/KRDB%20files/tech-reports/KRDB18-01.pdf.

</pre>