1. Introduction

Consistent Query Answering over SHACL Constraints (Extended Abstract)

Shqiponja Ahmetaj

Timo Camillo Merkl

Reinhard Pichler

0 0 TU Wien , Vienna , Austria

The Shapes Constraint Language (SHACL) is the World Wide Web recommended language for expressing and validating constraints on RDF graphs [1]. SHACL uses the notion of shapes graph to describe a set of shape constraints paired with targets, that specify which nodes of the RDF graph should satisfy which shapes. The main computational problem in SHACL is to check whether an RDF graph validates a shapes graph. Indeed, large RDF triple stores, which are subject to frequent change, may be missing some facts to validate a target or may have conflicting or contradictory facts. Therefore, detecting such inconsistencies and measures to deal with data graphs that fail to validate a shapes graph is very relevant in practice. A possible solution is to fix the data graph before reasoning. In the style of database repairs, [2, 3] propose to fix the data graph through (minimal) additions or removals of facts such that the resulting data graph validates the shapes graph. However, this may not always be desirable in practice as there may be a large number of possible repairs and selecting one may keep wrong facts, or remove true facts. Another alternative is to tolerate the non-validation and find ways to leverage the consistent part of the data and obtain meaningful and correct answers to queries despite the non-validation. This view is known as consistent query answering (CQA), and it has gained a lot of attention since the seminal paper [4]. The idea is to accept as answers to a query those that are true over all (minimal) repairs of the input database. This is also known as the AR semantics [4, 5, 6, 7]. Several other inconsistency-tolerant semantics have also been studied such as brave [8] and IAR [6] semantics. The former accepts answers that are true in some repair, and the latter accepts the most reliable answers, that is those that are true in the intersection of all repairs. There is now a large body of research works on CQA in both the database and KR setting; we refer to [5, 9, 10] for nice surveys. In this work, we focus on CQA in the presence of (recursive) SHACL shapes. We thus consider a fundamental fragment of the standard query language SPARQL. Specifically, we focus on basic graph patterns (Bgps) and the well-behaved extension with the OPTIONAL operator [11], referred to as well-designed queries (wdQs) in this paper. Our main goal is a detailed complexity analysis of the CQA problem. In total, we study several variants of the CQA problem by considering 4 query languages (Bgps and wdQs, with or without projection), under 3 semantics (brave, AR, IAR), with or without (cardinality or subset inclusion) minimality-restrictions.

1. Introduction

Moreover, we distinguish data complexity (where the SHACL constraints and the query are considered as fixed and only the data is allowed to vary) and combined complexity.

We initially consider the scenario that the repairs must validate all targets. If this is not possible, we have to settle for repairs that validate a subset of the targets. To address this, in the spirit of inconsistency tolerance, [ 3 ] proposes a relaxed notion of repairs, which aims at validating a maximal subset of the targets. We also study the complexity of CQA over maximal repairs. In all cases considered, we provide a complete complexity classification in the form of matching upper and lower bounds. All details of our results are provided in the full paper [12, 13].

2. SHACL Validation and Well-Designed SPARQL

RDF Graphs. Let , , denote sets of nodes (constants), class names, and property names, respectively. An RDF (data) graph is a finite set of (ground) atoms of the form (c) and (c, d), where ∈ , ∈ , and c, d ∈ .

SHACL Validation. Let be a set of shape names. A shape atom is an expression of the form s(a), where s ∈ and a ∈ . A path expression is a regular expression built using the usual operators * , · , ∪, property names ∈ and inverse properties − , where ∈ . A (complex) shape is an expression obeying the syntax:

, ′ ::= ⊤ | s | | c | ∧ ′ | ¬ | ≥ . | =. | = ′, where s ∈ , ∈ , ∈ , c ∈ , is a positive integer, and , ′ are path expressions. A (shape) constraint is an expression s ↔ where s ∈ and is a complex shape. A shapes graph is a pair (, ), where is a set of constraints and is a set of shape atoms called targets. A set of constraints is recursive, if there is a shape name in that directly or indirectly refers to itself. A (shape) assignment for a data graph is a set = ∪ , where is a set of shape atoms. The evaluation of a complex shape w.r.t. an assignment is given in terms of a function J· K that maps shape expressions to a set of nodes, and path expressions to a set of pairs of nodes with the connectives interpreted in the usual way; we refer to [13] for more details.

Following the supported model semantics proposed in [14], given a shapes graph (, ), an assignment for is a (supported) model of if J K = s for all s ↔ ∈ . The data graph validates (, ) if there exists a supported model of such that ⊆ . Well-Designed SPARQL. Let be a set of variables. A basic graph pattern (Bgp) is a conjunctive query (without existentially quantified variables) over atoms of the form () or (1, 2) with ∈ , ∈ , and , 1, 2 ∈ ∪ . A well-designed SPARQL query (wdQs) is built from Bgps and the OPTIONAL (or OPT) operator such that for every subquery ′ = (1 OPT 2) of , the variables of 2 that appear outside of ′ also appear in 1. A mapping is any partial function whose domain dom( ) is from . The notion of a mapping that is an answer to a Bgp over a graph is defined in the standard way. Mappings 1 and 2 are compatible (written 1 ∼ 2) if 1() = 2() for all ∈ dom( 1) ∩ dom( 2). A mapping is an answer to a wdQ = 1 OPT 2 if (1) is of the form 1 ∪ 2, where 1 is an answer to 1, 2 is an answer to 2, and 1 ∼ 2, or (2) is an answer to 1 for which there does not exists an answer to 2 that is compatible with . We denote the query classes where we additionally allow top-level projections by -Bgp and -wdQ.

3. Inconsistency-tolerant Semantics and Complexity Results

In this section, we formally introduce the problems considered in the paper and present our main results. As was done in [ 2 ], we can explain non-validation of a SHACL shapes graph in the style of database repairs. Hence, a repair is provided as a set of facts to be added and a set of facts to be deleted, so that the resulting data graph validates the shapes graph. Concretely, let be a data graph, (, ) a SHACL shapes graph, and let be another data graph disjoint from , called hypotheses. Then, a repair is a pair = (, ), such that ⊆ , ⊆ , and the repaired graph := ( ∖ ) ∪ validates (, ). Note the is necessary, as allowing arbitrary atoms to be added makes the problem of checking the existence of a repair undecidable.

Instead of considering all possible repairs, we consider preference relations given by a preorder ⪯ (a reflexive and transitive relation) on the set of repairs. Following [ 2 ], we study subset-minimal (⊆ ), and cardinality-minimal (≤ ) repairs. We denote with = when there is no preference order, and we use ⪯ as a placeholder for ⊆ , ≤ , and =. We say a repaired graph is ⪯ -minimal if is ⪯ -minimal.

We define the three inconsistency-tolerant semantics brave (∃), AR (∀), and IAR semantics (∩). Consider a query , a mapping , a data graph , a shapes graph (, ), and hypotheses . Then, is an answer of over Ψ = (, , , ) and preference order ⪯ ∈ { =, ≤ , ⊆} under: • brave semantics, if is an answer to over some ⪯ -minimal repaired graph , • AR semantics, if is an answer to over all ⪯ -minimal repaired graphs , • IAR semantics, if is an answer to over ∩ := ⋂︀{ | is a ⪯ -minimal repair}. We illustrate SHACL and the various inconsistency-tolerant semantics with an example. Example. Consider data graph , hypotheses , and the shapes graph (, ): = {Stud (Ann), Stud (Ben), id (Ann, 3)id (Ben, 1), id (Ben, 2), enrolledIn(Ben, a)}, = {enrolledIn(Ann, b), enrolledIn(Ben, c)}, = {ActiveStud ↔ Stud ∧ =1id ∧ ≥ 1enrolledIn}, = {ActiveStud(Ann), ActiveStud(Ben)}.

Intuitively, the constraint states that “active students” are students that have exactly one ID and are enrolled in at least one course. The targets ask to check whether Ann and Ben are active students. The data graph does not validate the shapes graph. Intuitively, the reason is that Ben has two IDs and Ann is not enrolled in any course. Validation can be obtained by repairing with the subset- and cardinality-minimal repairs 1 = (, 1) and 2 = (, 2), where = {enrolledIn(Ann, b)}, and each includes the fact id (Ben, ). There are 4 more non-minimal repairs, i.e., repairs where additionally enrolledIn(Ben, c) is added and, optionally, the atom enrolledIn(Ben, a) is removed. Next, consider the Bgp 1 = Stud () ∧ (, ) and wdQ 2 = Stud () OPT id(, ). Clearly, the mapping 1 = { → Ann, → 3} is an answer of 1 under brave, AR, and IAR semantics. The mappings 2 = { → Ben, → 1} and 3 = { → Ben, → 2} are answers of 1 over 1 and 2 , respectively, and hence under brave semantics, but not under AR nor IAR semantics. For 2, 1 and 4 = { → Ben} are the solutions under IAR semantics. Clearly, 1, 2, and 3 are still the answers under brave semantics and 1 under AR semantics. Note that the above statements hold for each preference order ⪯ ∈ {⊆ , ≤ , =}.

For query language ℒ ∈ {Bgp, -Bgp, wdQ, -wdQ}, preference order ⪯ ∈ { =, ≤ , ⊆} , and inconsistency-tolerant semantics ∈ {∃, ∀, ∩}, we define the CQA(ℒ, ⪯ , ) problem as follows:

Input: A query ∈ ℒ, Ψ = (, , , ), and a mapping .

Question: Is an answer of over Ψ and preference order ⪯ under -semantics? The complete picture of our complexity results in all settings is shown in Table 1.

We also consider the setting where there is no repair of the data graph that validates all the targets. E.g., consider constraints s1 ↔ and s2 ↔ ¬ and targets s1(a) and s2(a); in this case, there exists no repair for any input data graph, since adding (a) violates the second constraint and not adding it violates the first one. Following [ 3 ], we relax the notion of repairs and aim at satisfying the maximum number of targets. Such tuples = (, ) that maximize the number of targets validated by are called maximal repairs and we define the problem mCQA(ℒ, ⪯ , ) analogously to CQA(ℒ, ⪯ , ) where maximal repairs play the role of repairs. It turns out that the complexities remain almost as in Table 1 – only the classes NP, coNP, DP have to be replaced by the boolean hierarchy BH in data complexity and by Θ2P in combined complexity.

4. Conclusion and Future Work

In this work, we have carried out a thorough complexity analysis of the CQA problem for data graphs with SHACL constraints and we have pinpointed the complexity of this problem in a multitude of settings – considering various query languages, inconsistency-tolerant semantics of CQA, and preference relations on repairs. Additionally, we have studied the CQA problem for maximal repairs in cases where no repair exists. Several new proof techniques had to be developed to obtain these results. For instance, this has allowed us – in contrast to [ 2 ] – to prove all our hardness results without making use of recursion in shapes constraints.

The targets we considered here are shape atoms of the form s(c). The SHACL standard also allows for richer targets over class and property names. Specifically, it allows to state that a data graph must validate a shape name at each node of a certain class name, or domain (or range) of a property name. All the membership results in this paper can be immediately updated to support these richer targets. Some features of SHACL (such as disjointness and closed constraints) are not considered here, but we strongly believe they do not change the complexity results.

An immediate direction for future work is to investigate the notion of optimal repairs of prioritized data graphs over SHACL constraints and specifically, the notions of global, Pareto and completion optimality of repairs [15]. In particular, it would be of interest to study the computational properties of the main reasoning tasks such as repair checking and repair existence, and to analyze CQA for each of the three notions of optimal repairs and the various settings studied in this paper.

In the absence of general tractability results, a further important next step is to devise practical algorithms for CQA over SHACL constraints, either by relying on heuristics or by identifying meaningful and relevant fragments of SHACL that admit better complexity results.

Acknowledgements

This work was supported by the Vienna Science and Technology Fund (WWTF) [10.47379/ICT2201, 10.47379/VRG18013]. In addition, Ahmetaj was supported by the FWF and netidee SCIENCE project T1349-N. Philadelphia, Pennsylvania, USA, ACM Press, 1999, pp. 68–79. URL: https://doi.org/10. 1145/303976.303983. doi:10.1145/303976.303983. [5] L. E. Bertossi, Database Repairing and Consistent Query Answering, Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2011. URL: https://doi.org/10.2200/ S00379ED1V01Y201108DTM020. doi:10.2200/S00379ED1V01Y201108DTM020. [6] D. Lembo, M. Lenzerini, R. Rosati, M. Ruzzi, D. F. Savo, Inconsistency-tolerant semantics for description logics, in: P. Hitzler, T. Lukasiewicz (Eds.), Web Reasoning and Rule Systems Fourth International Conference, RR 2010, Bressanone/Brixen, Italy, September 22-24, 2010. Proceedings, volume 6333 of Lecture Notes in Computer Science, Springer, 2010, pp. 103–117. URL: https://doi.org/10.1007/978-3-642-15918-3_9. doi:10.1007/978-3-642-15918-3\ _9. [7] S. Arming, R. Pichler, E. Sallinger, Complexity of repair checking and consistent query answering, in: W. Martens, T. Zeume (Eds.), 19th International Conference on Database Theory, ICDT 2016, Bordeaux, France, March 15-18, 2016, volume 48 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016, pp. 21:1–21:18. URL: https://doi.org/10. 4230/LIPIcs.ICDT.2016.21. doi:10.4230/LIPICS.ICDT.2016.21. [8] M. Bienvenu, R. Rosati, Tractable approximations of consistent query answering for robust ontology-based data access, in: F. Rossi (Ed.), IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013, IJCAI/AAAI, 2013, pp. 775–781. URL: http://www.aaai.org/ocs/index.php/IJCAI/IJCAI13/ paper/view/6904. [9] J. Wijsen, Foundations of query answering on inconsistent databases, SIGMOD Rec. 48 (2019) 6–16. URL: https://doi.org/10.1145/3377391.3377393. doi:10.1145/3377391. 3377393. [10] M. Bienvenu, C. Bourgaux, Inconsistency-tolerant querying of description logic knowledge bases, in: J. Z. Pan, D. Calvanese, T. Eiter, I. Horrocks, M. Kifer, F. Lin, Y. Zhao (Eds.), Reasoning Web: Logical Foundation of Knowledge Graph Construction and Query Answering - 12th International Summer School 2016, Aberdeen, UK, September 5-9, 2016, Tutorial Lectures, volume 9885 of Lecture Notes in Computer Science, Springer, 2016, pp. 156–202. URL: https://doi.org/10.1007/978-3-319-49493-7_5. doi:10.1007/978-3-319-49493-7\_5. [11] J. Pérez, M. Arenas, C. Gutierrez, Semantics and complexity of SPARQL, ACM Trans.

Database Syst. 34 (2009) 16:1–16:45. URL: https://doi.org/10.1145/1567274.1567278. doi:10. 1145/1567274.1567278. [12] S. Ahmetaj, T. C. Merkl, R. Pichler, Consistent query answering over shacl constraints, Accepted to KR 2024, the 21st International Conference on Principles of Knowledge Representation and Reasoning (2024). [13] S. Ahmetaj, T. C. Merkl, R. Pichler, Consistent query answering over shacl constraints,

CoRR abs/2406.16653 (2024). URL: https://arxiv.org/abs/2406.16653. arXiv:2406.16653. [14] J. Corman, J. L. Reutter, O. Savkovic, Semantics and validation of recursive SHACL, in: D. Vrandecic, K. Bontcheva, M. C. Suárez-Figueroa, V. Presutti, I. Celino, M. Sabou, L. Kafee, E. Simperl (Eds.), The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I, volume 11136 of Lecture Notes in Computer Science, Springer, 2018, pp. 318–336. URL: https://doi. org/10.1007/978-3-030-00671-6_19. doi:10.1007/978-3-030-00671-6\_19. [15] S. Staworko, J. Chomicki, J. Marcinkowski, Prioritized repairing and consistent query answering in relational databases, Ann. Math. Artif. Intell. 64 (2012) 209–246. URL: https: //doi.org/10.1007/s10472-012-9288-8. doi:10.1007/S10472-012-9288-8.

[1]

Knublauch ,

Kontokostas , Shapes constraint language (SHACL) . W3C Recommendation , W3C ., 2017 . https://www.w3.org/TR/shacl/.

[2]

Ahmetaj ,

David ,

Ortiz ,

Polleres ,

Shehu ,

Simkus , Reasoning about explanations for non-validation in SHACL , in: M. Bienvenu , G. Lakemeyer , E. Erdem (Eds.), Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning , KR 2021, Online event , November 3 - 12 , 2021 , 2021 , pp. 12 - 21 . URL: https://doi.org/10.24963/kr.2021/2. doi: 10 .24963/kr.2021/2.

[3]

Ahmetaj ,

David ,

Polleres ,

Simkus , Repairing SHACL constraint violations using answer set programming , in: U. Sattler,

Hogan ,

C. M.

Keet ,

Presutti ,

J. P. A.

Almeida ,

Takeda ,

Monnin , G. Pirrò, C. d'Amato (Eds.), The Semantic Web - ISWC 2022 - 21st International Semantic Web Conference, Virtual Event, October 23-27 , 2022 , Proceedings, volume 13489 of Lecture Notes in Computer Science, Springer, 2022 , pp. 375 - 391 . URL: https: //doi.org/10.1007/978-3- 031 -19433-7_ 22 . doi: 10 .1007/978-3- 031 -19433-7\_ 22 .

[4]

Arenas ,

L. E.

Bertossi ,

Chomicki , Consistent query answers in inconsistent databases , in: V. Vianu , C. H. Papadimitriou (Eds.), Proceedings of the Eighteenth ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems , May 31 - June 2, 1999 ,