Magic Shapes for Validation in SHACL? Shqiponja Ahmetaj2 , Bianca Löhnert1 , Magdalena Ortiz1 , and Mantas Šimkus1 1 TU Wien, Austria, 2 WU Wien, Austria (POSTER) The Shape Constraint Language (SHACL) was recently standardized by the W3C as a formalism for checking the quality of RDF graphs; we refer to [7] for an introduction. In SHACL, the main problem is to check whether a given RDF graph G validates a SHACL document (C, T ), where C is a set of constraints, also called shapes graph, each associated to a so-called shape name, and T (targets) is a specification of nodes from the data graph which should validate certain shapes from C. For illustration, consider a graph G = {enrolledIn(Ben, C1 )} and a SHACL document (C, T ), where C = {Student ↔ ∃enrolledIn.Course}, and T is the shape atom Student(Ben). The constraint states that each Student must be enrolled in some course; Student is a shape name, and enrolledIn and Course are data predicates, i.e., property and class name, respectively. Clearly, G does not validate (C, T ), but the extended graph G0 = G∪{Course(C1 )} does. The standard specifies a syntax for expressing SHACL constraints and de- scribes when they are validated by RDF graphs. However, it leaves undefined the semantics of recursive constraints, i.e., constraints that involve cyclic dependen- cies. To address this, some logic-based proposals to formalize the semantics of full SHACL have emerged recently. Andresel et al. [1] proposed a semantics based on the stable models semantics for logic programs, stricter than the semantics based on classical logic due to Corman el al. [4]. Both semantics coincide with the official recommendation for non-recursive constraints, and unfortunately, the validation problem is NP-complete under both. To make SHACL truly useful and facilitate its adoption, we need automated tools that efficiently implement validation and scale well in the presence of large RDF graphs and sets of constraints. There are already significant efforts in this direction for fragments of SHACL [3, 6]. Shacl2Sparql [3] is a SHACL vali- dation engine that checks conformance of RDF graphs with SHACL constraints by evaluating SPARQL queries against the data, which optimzes the order in which shapes are processed. Further optimization techniques are implemented in Trav-Shacl [6]. However, these works focus on tractable fragments of SHACL. They do not handle unrestricted interaction of recursion and negation in SHACL constraints, which calls for verifying whether there exists some global assignment of shapes to nodes in the graph that is consistent with all the constraints. This cannot be easily done using top-down approaches implemented in existing val- idators. To our knowledge, the only implementation that validates full SHACL Copyright 2021 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). ? Supported by the Vienna Business Agency and the Austrian Science Fund (FWF) projects P30360 and P30873, and the Faculty of Informatics, TU Wien. 2 Shqiponja Ahmetaj, Bianca Löhnert, Magdalena Ortiz, and Mantas Šimkus is the Shacl-Asp prototype1 from [1], which translates SHACL constraints into ASP programs and evaluates them using the DLV system2 . We note that the authors of [3] propose an algorithm for full SHACL that involves a SAT solver, but to our knowledge, with no available implementation. The focus of this work is SHACL validation in the presence of unrestricted negation and recursion. In this extended abstract we present preliminary work on adapting the Magic Sets technique, known from logic programs with (unstratified) negation [5], to SHACL constraints, as a way to improve performance and applicability of SHACL validators. Magic Sets [2] is a well-known optimization method from logic programming and deductive databases. In a nutshell, it uses the query goal to adorn the input program with binding information, obtaining a new program whose bottom-up evaluation, analogously to the top-down one, involves ground atoms only from the relevant part of the data. When applied to SHACL shape graphs, it allows us to do validation on a potentially much smaller fragment of the input RDF graph, while also ingoring shape constraints in the input that do not affect validation of the targets. This is particularly useful in the presence of negation and recursion, which may result in inconsistency and non-validation, even when they are irrelevant to the target of interest. We report some prelimi- nary experiments showing that the technique significantly improves the perfor- mance of the Shacl-Asp validator, and it may also allow the exploitation of the more optimized engines Shacl2Sparql and Trav-Shacl for input shape graphs that they could not originally handle. 1 Preliminaries Let N, C, and P denote countably infinite, mutually disjoint sets of nodes, class names, and property names, respectively. A data graph G is a finite set of atoms of the form B(c) and p(c, d), where B ∈ C, p ∈ P, and c, d ∈ N. The set of nodes appearing in G is denoted with V (G). We assume a countably infinite set S of shape names, disjoint from N ∪ C ∪ P. A shape atom is an expression of the form s(a), where s ∈ S and a ∈ N. A path expression E is a regular expression built using the usual operators ∗, ·, ∪ from symbols in P+ = P ∪ {p− | p ∈ P}. If p ∈ P, then p− is the inverse property of p. A (complex) shape is an expression φ obeying the syntax: φ, φ0 ::= > | s | A | a | φ ∧ φ0 | ¬φ |≥n E.φ | E = E 0 , where s ∈ S, A ∈ C, c ∈ N, n is a positive integer, and E, E 0 are path expressions. A (shape) constraint is an expression s ← φ where s ∈ S and φ is a possibly complex shape; we may refer to φ as the body of the constraint and s as head. W.l.o.g. we view targets as shape atoms of the form s(a), where s ∈ S and a ∈ N. A SHACL document is a pair (C, T ), where (i) C is a set of constraints and (ii) T is a set of targets. The evaluation of a shape expression φ is given by assigning nodes of the data graph to shape names. A (shape) assignment for G is a set I = G ∪ L, where L is a set of shape atoms such that a ∈ V (G) for each s(a) ∈ I. The evaluation of a shape w.r.t. I is given in terms of a function that maps a 1 https://github.com/medinaandresel/shacl-asp 2 http://www.dlvsystem.com/dlv/ Magic Shapes for Validation in SHACL 3 shape expression to a set of nodes, and a path expression to a set of pairs of nodes. We refer to [1] for more details. We focus on the stable model semantics. 2 Magic Shape Algorithm for SHACL We now briefly describe the Magic algorithm for SHACL, which takes as input a document (C, T ) and outputs a new optimized shapes graph against which data graphs can be validated, see Figure 1. The method comprises four main steps: T Magic Shape Algorithm T data graph Create Magic Seeds C Adorn C_magic Generate SHACL Validator Modify Fig. 1: Using the Magic Shape Algorithm for SHACL validation. The input shapes graph C and targets T are transformed into a (typically smaller and simpler) shapes graph C magic relevant for the targets, which can be validated with off-the-shelf validators 1. Create Magic Seeds: for each target shape atom s(a) ∈ T , the algorithm adds a constraint of the form s magic ← a, called a magic seed. 2. Adorn: for each shape name s occurring in the targets T , an adorned shape sb is pushed onto a stack. Procedure Adorn then pops sb and propagates the adornment to the body of the constraints with head s, pushing each freshly adorned shape onto the stack. 3. Generate. The adorned set of constraints is then used to generate “magic” constraints through the procedure Generate. Roughly, for a constraint r of the form s ← φ, it creates a new constraint with s magic in the body and s0 magic in the head for each adorned shape name s0 in φ. In case that in the constraint two shapes are connected by a path expression E, the path expression has to be inverted. 4. Modify enhances the body of each adorned constraint with a magic version of the shape occurring in the head of the constraint, i.e. r will be rewritten as s ← s magic ∧ φ and added to the set of modified constrains. In the presence of arbitrary negation and recursion, we also need to identify the so-called dangerous constraints that appear under the scope of negation. If they are relevant for validating the target, their adornments are propagated also from the body to the head [5]. Example 1. Consider a SHACL document (C, T ), where C consists of the con- straints s1 ← s2 , s2 ← s3 ∧ s4 , s3 ←≥5 p.s5 , s6 ← ¬s6 , and T = {s1 (a)}. The algorithm first adds s1 magic ← a. Since s1 appears in T , the shape names s1 to s5 and the corresponding constraints will be adorned. The last (disconnected) 4 Shqiponja Ahmetaj, Bianca Löhnert, Magdalena Ortiz, and Mantas Šimkus constraint will not participate in the resulting program. Generate(ra ) adds s2 magic ← s1 magic , s3 magic ← s2 magic , s4 magic ← s2 magic and s5 magic ←≥1 p− .s3 magic . Finally, the modified constraints are: s1 ← s1 magic ∧ s2 , s2 ← s2 magic ∧ s3 ∧ s4 , and s3 ← s3 magic ∧ ≥5 p.s5 . 3 Implementation and Experiments We implemented a prototype of the Magic Shapes Algorithm. It receives as input a shapes graph (in Turtle syntax) and produces as output a new shapes graph. To evaluate the usefulness of the approach, we perform experiments with the Shacl-Asp validator. For each test case, we run the validator once with the original shapes graph and once with the magic variant. Data Graph. The data for the experiments was obtained from DBPedia (ver- sion 2016-10)3 . More precisely, we used the datasets “PersonData”, “Instance Types”, “Labels”, “Mappingbased Literals” and “Mappingbased Objects”. The datasets are in Turtle syntax and contain about 61 million triples (7.7 GB). Shapes Graph. We created six shapes graphs4 S1 to S6 from the domain of movies and actors, similar to those in [3]. Some of the shapes graphs are in- consistent, i.e., there is no valid shape assignment. As target we use either the class “dbo:Person” for “MusicianShape” (S1 , S2 , S4 ) or the class “dbo:Film” for “MovieShape” (S3 , S5 , S6 ). Setting. The experiments were performed on a Linux server with a 24 core Intel Xeon CPU running at 2.20 GHz and 264 GB of RAM. We used the DLV based Shacl-Asp validator. An Apache Jena TDB5 was used as an RDF triple store to retrieve the relevant triples from the data graph, which are transformed into ASP facts; this was done for the input shapes graph and for its magic version. Results. Table 1 summarizes the results of our experiments. Although all inputs except S1 are recursive, the output of the Magic Shapes algorithm is only recur- sive for S3 , S5 and S6 . The shapes graphs S4 to S6 are inconsistent in general. S4 and S5 become consistent after running the algorithm since the constraints caus- ing inconsistency are not relevant for validating the targets; this is not the case for S6 . In some cases the output of the algorithm falls in a fragment that is sup- ported by Shacl2Sparql or Trav-Shacl although the original shapes graphs are not, e.g., S4 and S5 . We distinguish the non-recursive fragment Lnon-rec and the fragment Ls , which restricts the interaction between recursion and negation. The last column shows that the Magic Shapes algorithm significantly improves the performance of the DLV validator. 4 Conclusion In this extended abstract, we have presented initial results on adapting the Magic Set technique for improving the performance of SHACL validators. This is par- 3 http://downloads.dbpedia.org/wiki-archive/downloads-2016-10.html 4 https://github.com/biziloehnert/magicSHACL/tree/master/experiments 5 https://jena.apache.org/documentation/tdb/ Magic Shapes for Validation in SHACL 5 case #shapes recursive inconsistent Lnon-rec Ls shacl-asp orig. magic orig. magic orig. magic orig. magic orig. magic orig. magic S1 15 6 X X X X 1798s 934s S2 15 6 X X X X 1927s 797s S3 15 6 X X X X 1872s 582s S4 15 6 X X X X 1965s 820s S5 15 6 X X X X 2036s 540s S6 15 6 X X X X 1932s 592s Table 1: Experiment Results ticularly useful when the input shapes graphs contain unrestricted interaction of recursion and negation. We performed experiments with Shacl-Asp, the only SHACL validator that can support such shape graphs, and showed that the sim- pler shapes graphs produced by the Magic Shapes algorithm are evaluated sig- nificantly more efficiently than the original ones. The algorithm may also discard constraints with recursion or negation that are not relevant for validating the targets, possibly eliminating some irrelevant inconsistency. The resulting shapes graphs may fall into SHACL fragments that can be handled by more optimized SHACL adopters such as Shacl2Sparql and Trav-Shacl, thus enabling the use of these engines on graphs that they could not originally support. We are running some experiments to identify cases where the Magic Shapes technique can also improve the performance of Shacl2Sparql or Trav-Shacl on inputs that they can handle. References 1. Andresel, M., Corman, J., Ortiz, M., Reutter, J.L., Savkovic, O., Šimkus, M.: Stable model semantics for recursive SHACL. In: Proc. of The Web Conference 2020. p. 15701580. ACM (2020). https://doi.org/10.1145/3366423.3380229 2. Bancilhon, F., Maier, D., Sagiv, Y., Ullman, J.: Magic sets and other strange ways to implement logic programs (extended abstract). In: PODS ’86 (1985) 3. Corman, J., Florenzano, F., Reutter, J.L., Savkovic, O.: Validating shacl constraints over a sparql endpoint. In: ISWC. Springer (2019). https://doi.org/10.1007/978-3- 030-30793-6 9 4. Corman, J., Reutter, J.L., Savkovic, O.: Semantics and validation of recursive SHACL. In: Proc. of ISWC’18. Springer (2018). https://doi.org/10.1007/978-3-030- 00671-6 19 5. Faber, W., Greco, G., Leone, N.: Magic sets and their application to data integration. J. Comput. Syst. Sci. pp. 584–609 (2007). https://doi.org/10.1016/j.jcss.2006.10.012 6. Figuera, M., Rohde, P.D., Vidal, M.: Trav-shacl: Efficiently validating net- works of SHACL constraints. In: WWW. pp. 3337–3348. ACM / IW3C2 (2021). https://doi.org/10.1145/3442381.3449877 7. Gayo, J.E.L., Prud’hommeaux, E., Boneva, I., Kontokostas, D.: Validating RDF Data. Synthesis Lectures on the Semantic Web: Theory and Technology, Morgan & Claypool Publishers (2017). https://doi.org/10.2200/S00786ED1V01Y201707WBE016