<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Consistent Query Answering over SHACL Constraints (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shqiponja Ahmetaj</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Timo Camillo Merkl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reinhard Pichler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TU Wien</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Shapes Constraint Language (SHACL) is the World Wide Web recommended language for expressing and validating constraints on RDF graphs [1]. SHACL uses the notion of shapes graph to describe a set of shape constraints paired with targets, that specify which nodes of the RDF graph should satisfy which shapes. The main computational problem in SHACL is to check whether an RDF graph validates a shapes graph. Indeed, large RDF triple stores, which are subject to frequent change, may be missing some facts to validate a target or may have conflicting or contradictory facts. Therefore, detecting such inconsistencies and measures to deal with data graphs that fail to validate a shapes graph is very relevant in practice. A possible solution is to fix the data graph before reasoning. In the style of database repairs, [2, 3] propose to fix the data graph through (minimal) additions or removals of facts such that the resulting data graph validates the shapes graph. However, this may not always be desirable in practice as there may be a large number of possible repairs and selecting one may keep wrong facts, or remove true facts. Another alternative is to tolerate the non-validation and find ways to leverage the consistent part of the data and obtain meaningful and correct answers to queries despite the non-validation. This view is known as consistent query answering (CQA), and it has gained a lot of attention since the seminal paper [4]. The idea is to accept as answers to a query those that are true over all (minimal) repairs of the input database. This is also known as the AR semantics [4, 5, 6, 7]. Several other inconsistency-tolerant semantics have also been studied such as brave [8] and IAR [6] semantics. The former accepts answers that are true in some repair, and the latter accepts the most reliable answers, that is those that are true in the intersection of all repairs. There is now a large body of research works on CQA in both the database and KR setting; we refer to [5, 9, 10] for nice surveys. In this work, we focus on CQA in the presence of (recursive) SHACL shapes. We thus consider a fundamental fragment of the standard query language SPARQL. Specifically, we focus on basic graph patterns (Bgps) and the well-behaved extension with the OPTIONAL operator [11], referred to as well-designed queries (wdQs) in this paper. Our main goal is a detailed complexity analysis of the CQA problem. In total, we study several variants of the CQA problem by considering 4 query languages (Bgps and wdQs, with or without projection), under 3 semantics (brave, AR, IAR), with or without (cardinality or subset inclusion) minimality-restrictions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Moreover, we distinguish data complexity (where the SHACL constraints and the query are
considered as fixed and only the data is allowed to vary) and combined complexity.</p>
      <p>
        We initially consider the scenario that the repairs must validate all targets. If this is not
possible, we have to settle for repairs that validate a subset of the targets. To address this,
in the spirit of inconsistency tolerance, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposes a relaxed notion of repairs, which aims
at validating a maximal subset of the targets. We also study the complexity of CQA over
maximal repairs. In all cases considered, we provide a complete complexity classification in the
form of matching upper and lower bounds. All details of our results are provided in the full
paper [12, 13].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. SHACL Validation and Well-Designed SPARQL</title>
      <p>RDF Graphs. Let  ,  ,  denote sets of nodes (constants), class names, and property names,
respectively. An RDF (data) graph  is a finite set of (ground) atoms of the form (c) and
(c, d), where  ∈  ,  ∈  , and c, d ∈  .</p>
      <p>SHACL Validation. Let  be a set of shape names. A shape atom is an expression of the form
s(a), where s ∈  and a ∈  . A path expression  is a regular expression built using the
usual operators * , · , ∪, property names  ∈  and inverse properties − , where  ∈  . A
(complex) shape is an expression  obeying the syntax:</p>
      <p>,  ′ ::= ⊤ | s |  | c |  ∧  ′ | ¬ | ≥ . | =. |  = ′,
where s ∈ ,  ∈  ,  ∈  , c ∈  ,  is a positive integer, and , ′ are path expressions.
A (shape) constraint is an expression s ↔  where s ∈  and  is a complex shape. A shapes
graph is a pair (,  ), where  is a set of constraints and  is a set of shape atoms called targets.
A set of constraints  is recursive, if there is a shape name in  that directly or indirectly refers
to itself. A (shape) assignment for a data graph  is a set  =  ∪ , where  is a set of shape
atoms. The evaluation of a complex shape w.r.t. an assignment  is given in terms of a function
J· K that maps shape expressions  to a set of nodes, and path expressions  to a set of pairs of
nodes with the connectives interpreted in the usual way; we refer to [13] for more details.</p>
      <p>Following the supported model semantics proposed in [14], given a shapes graph (,  ), an
assignment  for  is a (supported) model of  if J K = s for all s ↔  ∈ . The data graph
 validates (,  ) if there exists a supported model  of  such that  ⊆ .
Well-Designed SPARQL. Let  be a set of variables. A basic graph pattern (Bgp) is a conjunctive
query (without existentially quantified variables) over atoms of the form () or (1, 2) with
 ∈  ,  ∈  , and , 1, 2 ∈  ∪  . A well-designed SPARQL query  (wdQs) is built
from Bgps and the OPTIONAL (or OPT) operator such that for every subquery ′ = (1 OPT 2)
of , the variables of 2 that appear outside of ′ also appear in 1. A mapping  is any partial
function whose domain dom( ) is from  . The notion of a mapping  that is an answer to
a Bgp  over a graph  is defined in the standard way. Mappings  1 and  2 are compatible
(written  1 ∼  2) if  1() =  2() for all  ∈ dom( 1) ∩ dom( 2). A mapping  is an answer
to a wdQ  = 1 OPT 2 if (1)  is of the form  1 ∪  2, where  1 is an answer to 1,  2 is
an answer to 2, and  1 ∼  2, or (2)  is an answer to 1 for which there does not exists an
answer to 2 that is compatible with  . We denote the query classes where we additionally
allow top-level projections by  -Bgp and  -wdQ.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Inconsistency-tolerant Semantics and Complexity Results</title>
      <p>
        In this section, we formally introduce the problems considered in the paper and present our
main results. As was done in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we can explain non-validation of a SHACL shapes graph
in the style of database repairs. Hence, a repair is provided as a set  of facts to be added
and a set  of facts to be deleted, so that the resulting data graph validates the shapes graph.
Concretely, let  be a data graph, (,  ) a SHACL shapes graph, and let  be another data
graph disjoint from , called hypotheses. Then, a repair is a pair  = (, ), such that  ⊆ ,
 ⊆ , and the repaired graph  := ( ∖ ) ∪  validates (,  ). Note the  is necessary,
as allowing arbitrary atoms to be added makes the problem of checking the existence of a repair
undecidable.
      </p>
      <p>
        Instead of considering all possible repairs, we consider preference relations given by a
preorder ⪯ (a reflexive and transitive relation) on the set of repairs. Following [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we study
subset-minimal (⊆ ), and cardinality-minimal (≤ ) repairs. We denote with = when there is no
preference order, and we use ⪯ as a placeholder for ⊆ , ≤ , and =. We say a repaired graph 
is ⪯ -minimal if  is ⪯ -minimal.
      </p>
      <p>We define the three inconsistency-tolerant semantics brave (∃), AR (∀), and IAR semantics (∩).
Consider a query , a mapping  , a data graph , a shapes graph (,  ), and hypotheses .
Then,  is an answer of  over Ψ = (, ,  , ) and preference order ⪯ ∈ { =, ≤ , ⊆} under:
• brave semantics, if  is an answer to  over some ⪯ -minimal repaired graph ,
• AR semantics, if  is an answer to  over all ⪯ -minimal repaired graphs ,
• IAR semantics, if  is an answer to  over ∩ := ⋂︀{ |  is a ⪯ -minimal repair}.
We illustrate SHACL and the various inconsistency-tolerant semantics with an example.
Example. Consider data graph , hypotheses , and the shapes graph (,  ):
 = {Stud (Ann), Stud (Ben), id (Ann, 3)id (Ben, 1), id (Ben, 2), enrolledIn(Ben, a)},
 = {enrolledIn(Ann, b), enrolledIn(Ben, c)},
 = {ActiveStud ↔ Stud ∧ =1id ∧ ≥ 1enrolledIn},
 = {ActiveStud(Ann), ActiveStud(Ben)}.</p>
      <p>Intuitively, the constraint states that “active students” are students that have exactly one ID and are
enrolled in at least one course. The targets ask to check whether Ann and Ben are active students. The
data graph  does not validate the shapes graph. Intuitively, the reason is that Ben has two IDs and
Ann is not enrolled in any course. Validation can be obtained by repairing  with the subset- and
cardinality-minimal repairs 1 = (, 1) and 2 = (, 2), where  = {enrolledIn(Ann, b)},
and each  includes the fact id (Ben, ). There are 4 more non-minimal repairs, i.e., repairs where
additionally enrolledIn(Ben, c) is added and, optionally, the atom enrolledIn(Ben, a) is removed.
Next, consider the Bgp 1 = Stud () ∧ (, ) and wdQ 2 = Stud () OPT id(, ). Clearly,
the mapping  1 = { → Ann,  → 3} is an answer of 1 under brave, AR, and IAR semantics.
The mappings  2 = { → Ben,  → 1} and  3 = { → Ben,  → 2} are answers of 1 over
1 and 2 , respectively, and hence under brave semantics, but not under AR nor IAR semantics.
For 2,  1 and  4 = { → Ben} are the solutions under IAR semantics. Clearly,  1,  2, and
 3 are still the answers under brave semantics and  1 under AR semantics. Note that the above
statements hold for each preference order ⪯ ∈ {⊆ , ≤ , =}.</p>
      <p>For query language ℒ ∈ {Bgp,  -Bgp, wdQ,  -wdQ}, preference order ⪯ ∈ { =, ≤ , ⊆} , and
inconsistency-tolerant semantics  ∈ {∃, ∀, ∩}, we define the CQA(ℒ, ⪯ , ) problem as follows:</p>
      <p>Input: A query  ∈ ℒ, Ψ = (, ,  , ), and a mapping  .</p>
      <p>Question: Is  an answer of  over Ψ and preference order ⪯ under -semantics?
The complete picture of our complexity results in all settings is shown in Table 1.</p>
      <p>
        We also consider the setting where there is no repair of the data graph that validates all the
targets. E.g., consider constraints s1 ↔  and s2 ↔ ¬ and targets s1(a) and s2(a); in this
case, there exists no repair for any input data graph, since adding (a) violates the second
constraint and not adding it violates the first one. Following [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we relax the notion of repairs
and aim at satisfying the maximum number of targets. Such tuples  = (, ) that maximize
the number of targets validated by  are called maximal repairs and we define the problem
mCQA(ℒ, ⪯ , ) analogously to CQA(ℒ, ⪯ , ) where maximal repairs play the role of repairs. It
turns out that the complexities remain almost as in Table 1 – only the classes NP, coNP, DP
have to be replaced by the boolean hierarchy BH in data complexity and by Θ2P in combined
complexity.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>
        In this work, we have carried out a thorough complexity analysis of the CQA problem for data
graphs with SHACL constraints and we have pinpointed the complexity of this problem in a
multitude of settings – considering various query languages, inconsistency-tolerant semantics
of CQA, and preference relations on repairs. Additionally, we have studied the CQA problem
for maximal repairs in cases where no repair exists. Several new proof techniques had to be
developed to obtain these results. For instance, this has allowed us – in contrast to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] – to
prove all our hardness results without making use of recursion in shapes constraints.
      </p>
      <p>The targets we considered here are shape atoms of the form s(c). The SHACL standard also
allows for richer targets over class and property names. Specifically, it allows to state that a data
graph must validate a shape name at each node of a certain class name, or domain (or range) of a
property name. All the membership results in this paper can be immediately updated to support
these richer targets. Some features of SHACL (such as disjointness and closed constraints) are
not considered here, but we strongly believe they do not change the complexity results.</p>
      <p>An immediate direction for future work is to investigate the notion of optimal repairs of
prioritized data graphs over SHACL constraints and specifically, the notions of global, Pareto
and completion optimality of repairs [15]. In particular, it would be of interest to study the
computational properties of the main reasoning tasks such as repair checking and repair
existence, and to analyze CQA for each of the three notions of optimal repairs and the various
settings studied in this paper.</p>
      <p>In the absence of general tractability results, a further important next step is to devise practical
algorithms for CQA over SHACL constraints, either by relying on heuristics or by identifying
meaningful and relevant fragments of SHACL that admit better complexity results.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work was supported by the Vienna Science and Technology Fund (WWTF)
[10.47379/ICT2201, 10.47379/VRG18013]. In addition, Ahmetaj was supported by the FWF
and netidee SCIENCE project T1349-N.
Philadelphia, Pennsylvania, USA, ACM Press, 1999, pp. 68–79. URL: https://doi.org/10.
1145/303976.303983. doi:10.1145/303976.303983.
[5] L. E. Bertossi, Database Repairing and Consistent Query Answering, Synthesis Lectures
on Data Management, Morgan &amp; Claypool Publishers, 2011. URL: https://doi.org/10.2200/
S00379ED1V01Y201108DTM020. doi:10.2200/S00379ED1V01Y201108DTM020.
[6] D. Lembo, M. Lenzerini, R. Rosati, M. Ruzzi, D. F. Savo, Inconsistency-tolerant semantics for
description logics, in: P. Hitzler, T. Lukasiewicz (Eds.), Web Reasoning and Rule Systems
Fourth International Conference, RR 2010, Bressanone/Brixen, Italy, September 22-24, 2010.
Proceedings, volume 6333 of Lecture Notes in Computer Science, Springer, 2010, pp. 103–117.
URL: https://doi.org/10.1007/978-3-642-15918-3_9. doi:10.1007/978-3-642-15918-3\
_9.
[7] S. Arming, R. Pichler, E. Sallinger, Complexity of repair checking and consistent query
answering, in: W. Martens, T. Zeume (Eds.), 19th International Conference on Database
Theory, ICDT 2016, Bordeaux, France, March 15-18, 2016, volume 48 of LIPIcs, Schloss
Dagstuhl - Leibniz-Zentrum für Informatik, 2016, pp. 21:1–21:18. URL: https://doi.org/10.
4230/LIPIcs.ICDT.2016.21. doi:10.4230/LIPICS.ICDT.2016.21.
[8] M. Bienvenu, R. Rosati, Tractable approximations of consistent query answering for
robust ontology-based data access, in: F. Rossi (Ed.), IJCAI 2013, Proceedings of the 23rd
International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013,
IJCAI/AAAI, 2013, pp. 775–781. URL: http://www.aaai.org/ocs/index.php/IJCAI/IJCAI13/
paper/view/6904.
[9] J. Wijsen, Foundations of query answering on inconsistent databases, SIGMOD Rec.
48 (2019) 6–16. URL: https://doi.org/10.1145/3377391.3377393. doi:10.1145/3377391.
3377393.
[10] M. Bienvenu, C. Bourgaux, Inconsistency-tolerant querying of description logic knowledge
bases, in: J. Z. Pan, D. Calvanese, T. Eiter, I. Horrocks, M. Kifer, F. Lin, Y. Zhao (Eds.),
Reasoning Web: Logical Foundation of Knowledge Graph Construction and Query Answering
- 12th International Summer School 2016, Aberdeen, UK, September 5-9, 2016, Tutorial
Lectures, volume 9885 of Lecture Notes in Computer Science, Springer, 2016, pp. 156–202. URL:
https://doi.org/10.1007/978-3-319-49493-7_5. doi:10.1007/978-3-319-49493-7\_5.
[11] J. Pérez, M. Arenas, C. Gutierrez, Semantics and complexity of SPARQL, ACM Trans.</p>
      <p>Database Syst. 34 (2009) 16:1–16:45. URL: https://doi.org/10.1145/1567274.1567278. doi:10.
1145/1567274.1567278.
[12] S. Ahmetaj, T. C. Merkl, R. Pichler, Consistent query answering over shacl constraints,
Accepted to KR 2024, the 21st International Conference on Principles of Knowledge
Representation and Reasoning (2024).
[13] S. Ahmetaj, T. C. Merkl, R. Pichler, Consistent query answering over shacl constraints,</p>
      <p>CoRR abs/2406.16653 (2024). URL: https://arxiv.org/abs/2406.16653. arXiv:2406.16653.
[14] J. Corman, J. L. Reutter, O. Savkovic, Semantics and validation of recursive SHACL,
in: D. Vrandecic, K. Bontcheva, M. C. Suárez-Figueroa, V. Presutti, I. Celino, M. Sabou,
L. Kafee, E. Simperl (Eds.), The Semantic Web - ISWC 2018 - 17th International Semantic
Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I, volume
11136 of Lecture Notes in Computer Science, Springer, 2018, pp. 318–336. URL: https://doi.
org/10.1007/978-3-030-00671-6_19. doi:10.1007/978-3-030-00671-6\_19.
[15] S. Staworko, J. Chomicki, J. Marcinkowski, Prioritized repairing and consistent query
answering in relational databases, Ann. Math. Artif. Intell. 64 (2012) 209–246. URL: https:
//doi.org/10.1007/s10472-012-9288-8. doi:10.1007/S10472-012-9288-8.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Knublauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <article-title>Shapes constraint language (SHACL)</article-title>
          .
          <source>W3C Recommendation</source>
          ,
          <fpage>W3C</fpage>
          .,
          <year>2017</year>
          . https://www.w3.org/TR/shacl/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmetaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>David</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shehu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          ,
          <article-title>Reasoning about explanations for non-validation in SHACL</article-title>
          , in: M.
          <string-name>
            <surname>Bienvenu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Lakemeyer</surname>
          </string-name>
          , E. Erdem (Eds.),
          <source>Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning</source>
          , KR 2021,
          <article-title>Online event</article-title>
          ,
          <source>November</source>
          <volume>3</volume>
          -
          <issue>12</issue>
          ,
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>12</fpage>
          -
          <lpage>21</lpage>
          . URL: https://doi.org/10.24963/kr.2021/2. doi:
          <volume>10</volume>
          .24963/kr.2021/2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmetaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>David</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          ,
          <article-title>Repairing SHACL constraint violations using answer set programming</article-title>
          , in: U. Sattler,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Keet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P. A.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Takeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Monnin</surname>
          </string-name>
          , G. Pirrò, C. d'Amato (Eds.),
          <source>The Semantic Web - ISWC 2022 - 21st International Semantic Web Conference, Virtual Event, October 23-27</source>
          ,
          <year>2022</year>
          , Proceedings, volume
          <volume>13489</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2022</year>
          , pp.
          <fpage>375</fpage>
          -
          <lpage>391</lpage>
          . URL: https: //doi.org/10.1007/978-3-
          <fpage>031</fpage>
          -19433-7_
          <fpage>22</fpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -19433-7\_
          <fpage>22</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chomicki</surname>
          </string-name>
          ,
          <article-title>Consistent query answers in inconsistent databases</article-title>
          , in: V.
          <string-name>
            <surname>Vianu</surname>
            ,
            <given-names>C. H.</given-names>
          </string-name>
          <string-name>
            <surname>Papadimitriou</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Eighteenth ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems</source>
          , May 31 - June 2,
          <year>1999</year>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>