Magic Shapes for Validation in SHACL?

Shqiponja Ahmetaj2 , Bianca Löhnert1 , Magdalena Ortiz1 , and Mantas Šimkus1
                     1
                         TU Wien, Austria, 2 WU Wien, Austria
                                     (POSTER)

    The Shape Constraint Language (SHACL) was recently standardized by the
W3C as a formalism for checking the quality of RDF graphs; we refer to [7] for
an introduction. In SHACL, the main problem is to check whether a given RDF
graph G validates a SHACL document (C, T ), where C is a set of constraints, also
called shapes graph, each associated to a so-called shape name, and T (targets)
is a specification of nodes from the data graph which should validate certain
shapes from C. For illustration, consider a graph G = {enrolledIn(Ben, C1 )}
and a SHACL document (C, T ), where C = {Student ↔ ∃enrolledIn.Course},
and T is the shape atom Student(Ben). The constraint states that each Student
must be enrolled in some course; Student is a shape name, and enrolledIn and
Course are data predicates, i.e., property and class name, respectively. Clearly,
G does not validate (C, T ), but the extended graph G0 = G∪{Course(C1 )} does.
    The standard specifies a syntax for expressing SHACL constraints and de-
scribes when they are validated by RDF graphs. However, it leaves undefined the
semantics of recursive constraints, i.e., constraints that involve cyclic dependen-
cies. To address this, some logic-based proposals to formalize the semantics of
full SHACL have emerged recently. Andresel et al. [1] proposed a semantics based
on the stable models semantics for logic programs, stricter than the semantics
based on classical logic due to Corman el al. [4]. Both semantics coincide with
the official recommendation for non-recursive constraints, and unfortunately, the
validation problem is NP-complete under both.
    To make SHACL truly useful and facilitate its adoption, we need automated
tools that efficiently implement validation and scale well in the presence of large
RDF graphs and sets of constraints. There are already significant efforts in this
direction for fragments of SHACL [3, 6]. Shacl2Sparql [3] is a SHACL vali-
dation engine that checks conformance of RDF graphs with SHACL constraints
by evaluating SPARQL queries against the data, which optimzes the order in
which shapes are processed. Further optimization techniques are implemented in
Trav-Shacl [6]. However, these works focus on tractable fragments of SHACL.
They do not handle unrestricted interaction of recursion and negation in SHACL
constraints, which calls for verifying whether there exists some global assignment
of shapes to nodes in the graph that is consistent with all the constraints. This
cannot be easily done using top-down approaches implemented in existing val-
idators. To our knowledge, the only implementation that validates full SHACL
  Copyright 2021 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0).
?
  Supported by the Vienna Business Agency and the Austrian Science Fund (FWF)
  projects P30360 and P30873, and the Faculty of Informatics, TU Wien.
2        Shqiponja Ahmetaj, Bianca Löhnert, Magdalena Ortiz, and Mantas Šimkus

is the Shacl-Asp prototype1 from [1], which translates SHACL constraints into
ASP programs and evaluates them using the DLV system2 . We note that the
authors of [3] propose an algorithm for full SHACL that involves a SAT solver,
but to our knowledge, with no available implementation. The focus of this work
is SHACL validation in the presence of unrestricted negation and recursion.
    In this extended abstract we present preliminary work on adapting the Magic
Sets technique, known from logic programs with (unstratified) negation [5],
to SHACL constraints, as a way to improve performance and applicability of
SHACL validators. Magic Sets [2] is a well-known optimization method from
logic programming and deductive databases. In a nutshell, it uses the query goal
to adorn the input program with binding information, obtaining a new program
whose bottom-up evaluation, analogously to the top-down one, involves ground
atoms only from the relevant part of the data. When applied to SHACL shape
graphs, it allows us to do validation on a potentially much smaller fragment of
the input RDF graph, while also ingoring shape constraints in the input that do
not affect validation of the targets. This is particularly useful in the presence of
negation and recursion, which may result in inconsistency and non-validation,
even when they are irrelevant to the target of interest. We report some prelimi-
nary experiments showing that the technique significantly improves the perfor-
mance of the Shacl-Asp validator, and it may also allow the exploitation of
the more optimized engines Shacl2Sparql and Trav-Shacl for input shape
graphs that they could not originally handle.


1     Preliminaries
Let N, C, and P denote countably infinite, mutually disjoint sets of nodes, class
names, and property names, respectively. A data graph G is a finite set of atoms
of the form B(c) and p(c, d), where B ∈ C, p ∈ P, and c, d ∈ N. The set of
nodes appearing in G is denoted with V (G). We assume a countably infinite set
S of shape names, disjoint from N ∪ C ∪ P. A shape atom is an expression of the
form s(a), where s ∈ S and a ∈ N. A path expression E is a regular expression
built using the usual operators ∗, ·, ∪ from symbols in P+ = P ∪ {p− | p ∈ P}. If
p ∈ P, then p− is the inverse property of p. A (complex) shape is an expression φ
obeying the syntax: φ, φ0 ::= > | s | A | a | φ ∧ φ0 | ¬φ |≥n E.φ | E = E 0 , where
s ∈ S, A ∈ C, c ∈ N, n is a positive integer, and E, E 0 are path expressions.
A (shape) constraint is an expression s ← φ where s ∈ S and φ is a possibly
complex shape; we may refer to φ as the body of the constraint and s as head.
W.l.o.g. we view targets as shape atoms of the form s(a), where s ∈ S and a ∈ N.
A SHACL document is a pair (C, T ), where (i) C is a set of constraints and (ii)
T is a set of targets. The evaluation of a shape expression φ is given by assigning
nodes of the data graph to shape names. A (shape) assignment for G is a set
I = G ∪ L, where L is a set of shape atoms such that a ∈ V (G) for each s(a) ∈ I.
The evaluation of a shape w.r.t. I is given in terms of a function that maps a
1
    https://github.com/medinaandresel/shacl-asp
2
    http://www.dlvsystem.com/dlv/
                                      Magic Shapes for Validation in SHACL          3

shape expression to a set of nodes, and a path expression to a set of pairs of
nodes. We refer to [1] for more details. We focus on the stable model semantics.


2    Magic Shape Algorithm for SHACL
We now briefly describe the Magic algorithm for SHACL, which takes as input a
document (C, T ) and outputs a new optimized shapes graph against which data
graphs can be validated, see Figure 1. The method comprises four main steps:


           T         Magic Shape Algorithm                      T    data graph
                       Create Magic Seeds

          C                  Adorn             C_magic

                           Generate                               SHACL
                                                                 Validator
                            Modify
Fig. 1: Using the Magic Shape Algorithm for SHACL validation. The input shapes graph
C and targets T are transformed into a (typically smaller and simpler) shapes graph
C magic relevant for the targets, which can be validated with off-the-shelf validators


 1. Create Magic Seeds: for each target shape atom s(a) ∈ T , the algorithm
    adds a constraint of the form s magic ← a, called a magic seed.
 2. Adorn: for each shape name s occurring in the targets T , an adorned shape
    sb is pushed onto a stack. Procedure Adorn then pops sb and propagates the
    adornment to the body of the constraints with head s, pushing each freshly
    adorned shape onto the stack.
 3. Generate. The adorned set of constraints is then used to generate “magic”
    constraints through the procedure Generate. Roughly, for a constraint r of
    the form s ← φ, it creates a new constraint with s magic in the body and
    s0 magic in the head for each adorned shape name s0 in φ. In case that in
    the constraint two shapes are connected by a path expression E, the path
    expression has to be inverted.
 4. Modify enhances the body of each adorned constraint with a magic version
    of the shape occurring in the head of the constraint, i.e. r will be rewritten
    as s ← s magic ∧ φ and added to the set of modified constrains.
In the presence of arbitrary negation and recursion, we also need to identify the
so-called dangerous constraints that appear under the scope of negation. If they
are relevant for validating the target, their adornments are propagated also from
the body to the head [5].

Example 1. Consider a SHACL document (C, T ), where C consists of the con-
straints s1 ← s2 , s2 ← s3 ∧ s4 , s3 ←≥5 p.s5 , s6 ← ¬s6 , and T = {s1 (a)}. The
algorithm first adds s1 magic ← a. Since s1 appears in T , the shape names s1 to
s5 and the corresponding constraints will be adorned. The last (disconnected)
4       Shqiponja Ahmetaj, Bianca Löhnert, Magdalena Ortiz, and Mantas Šimkus

constraint will not participate in the resulting program. Generate(ra ) adds
s2 magic ← s1 magic , s3 magic ← s2 magic , s4 magic ← s2 magic and s5 magic ←≥1
p− .s3 magic . Finally, the modified constraints are: s1 ← s1 magic ∧ s2 , s2 ←
s2 magic ∧ s3 ∧ s4 , and s3 ← s3 magic ∧ ≥5 p.s5 .

3   Implementation and Experiments
We implemented a prototype of the Magic Shapes Algorithm. It receives as input
a shapes graph (in Turtle syntax) and produces as output a new shapes graph.
To evaluate the usefulness of the approach, we perform experiments with the
Shacl-Asp validator. For each test case, we run the validator once with the
original shapes graph and once with the magic variant.
Data Graph. The data for the experiments was obtained from DBPedia (ver-
sion 2016-10)3 . More precisely, we used the datasets “PersonData”, “Instance
Types”, “Labels”, “Mappingbased Literals” and “Mappingbased Objects”. The
datasets are in Turtle syntax and contain about 61 million triples (7.7 GB).
Shapes Graph. We created six shapes graphs4 S1 to S6 from the domain of
movies and actors, similar to those in [3]. Some of the shapes graphs are in-
consistent, i.e., there is no valid shape assignment. As target we use either the
class “dbo:Person” for “MusicianShape” (S1 , S2 , S4 ) or the class “dbo:Film” for
“MovieShape” (S3 , S5 , S6 ).
Setting. The experiments were performed on a Linux server with a 24 core Intel
Xeon CPU running at 2.20 GHz and 264 GB of RAM. We used the DLV based
Shacl-Asp validator. An Apache Jena TDB5 was used as an RDF triple store
to retrieve the relevant triples from the data graph, which are transformed into
ASP facts; this was done for the input shapes graph and for its magic version.
Results. Table 1 summarizes the results of our experiments. Although all inputs
except S1 are recursive, the output of the Magic Shapes algorithm is only recur-
sive for S3 , S5 and S6 . The shapes graphs S4 to S6 are inconsistent in general. S4
and S5 become consistent after running the algorithm since the constraints caus-
ing inconsistency are not relevant for validating the targets; this is not the case
for S6 . In some cases the output of the algorithm falls in a fragment that is sup-
ported by Shacl2Sparql or Trav-Shacl although the original shapes graphs
are not, e.g., S4 and S5 . We distinguish the non-recursive fragment Lnon-rec and
the fragment Ls , which restricts the interaction between recursion and negation.
The last column shows that the Magic Shapes algorithm significantly improves
the performance of the DLV validator.

4   Conclusion
In this extended abstract, we have presented initial results on adapting the Magic
Set technique for improving the performance of SHACL validators. This is par-
3
  http://downloads.dbpedia.org/wiki-archive/downloads-2016-10.html
4
  https://github.com/biziloehnert/magicSHACL/tree/master/experiments
5
  https://jena.apache.org/documentation/tdb/
                                       Magic Shapes for Validation in SHACL            5

   case    #shapes recursive inconsistent Lnon-rec             Ls      shacl-asp
          orig. magic orig. magic orig. magic orig. magic orig. magic orig. magic
    S1     15    6                             X     X     X      X 1798s 934s
    S2     15    6     X                             X     X      X 1927s 797s
    S3     15    6     X     X                             X      X 1872s 582s
    S4     15    6     X           X                 X            X 1965s 820s
    S5     15    6     X     X     X                              X 2036s 540s
    S6     15    6     X     X     X     X                            1932s 592s
                              Table 1: Experiment Results


ticularly useful when the input shapes graphs contain unrestricted interaction of
recursion and negation. We performed experiments with Shacl-Asp, the only
SHACL validator that can support such shape graphs, and showed that the sim-
pler shapes graphs produced by the Magic Shapes algorithm are evaluated sig-
nificantly more efficiently than the original ones. The algorithm may also discard
constraints with recursion or negation that are not relevant for validating the
targets, possibly eliminating some irrelevant inconsistency. The resulting shapes
graphs may fall into SHACL fragments that can be handled by more optimized
SHACL adopters such as Shacl2Sparql and Trav-Shacl, thus enabling the
use of these engines on graphs that they could not originally support.
    We are running some experiments to identify cases where the Magic Shapes
technique can also improve the performance of Shacl2Sparql or Trav-Shacl
on inputs that they can handle.

References
1. Andresel, M., Corman, J., Ortiz, M., Reutter, J.L., Savkovic, O., Šimkus, M.: Stable
   model semantics for recursive SHACL. In: Proc. of The Web Conference 2020. p.
   15701580. ACM (2020). https://doi.org/10.1145/3366423.3380229
2. Bancilhon, F., Maier, D., Sagiv, Y., Ullman, J.: Magic sets and other strange ways
   to implement logic programs (extended abstract). In: PODS ’86 (1985)
3. Corman, J., Florenzano, F., Reutter, J.L., Savkovic, O.: Validating shacl constraints
   over a sparql endpoint. In: ISWC. Springer (2019). https://doi.org/10.1007/978-3-
   030-30793-6 9
4. Corman, J., Reutter, J.L., Savkovic, O.: Semantics and validation of recursive
   SHACL. In: Proc. of ISWC’18. Springer (2018). https://doi.org/10.1007/978-3-030-
   00671-6 19
5. Faber, W., Greco, G., Leone, N.: Magic sets and their application to data integration.
   J. Comput. Syst. Sci. pp. 584–609 (2007). https://doi.org/10.1016/j.jcss.2006.10.012
6. Figuera, M., Rohde, P.D., Vidal, M.: Trav-shacl: Efficiently validating net-
   works of SHACL constraints. In: WWW. pp. 3337–3348. ACM / IW3C2 (2021).
   https://doi.org/10.1145/3442381.3449877
7. Gayo,    J.E.L.,   Prud’hommeaux,        E.,   Boneva,      I.,  Kontokostas,      D.:
   Validating RDF Data. Synthesis Lectures on the Semantic Web:
   Theory     and    Technology,    Morgan      &    Claypool      Publishers     (2017).
   https://doi.org/10.2200/S00786ED1V01Y201707WBE016