<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reconciling SHACL and Ontologies: Semantics and Validation via Rewriting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shqiponja Ahmetaj</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Magdalena Ortiz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anouk Oudshoorn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mantas Šimkus</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technical University of Vienna</institution>
          ,
          <addr-line>Wiedner Hauptstraße 11, 1040 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Umeå University</institution>
          ,
          <addr-line>Universitetstorget 4, 901 87 Umeå</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This poster summarizes our recent work [1] on SHACL validation in the presence of OWL 2 QL ontologies. To overcome the challenge posed by the non-monotonic behavior of SHACL constraints, we propose a new intuitive validation semantics and a rewriting algorithm that embeds the efects of the ontological axioms into the SHACL constraints. We analyze the complexity of validation in this setting. SHACL and OWL are two prominent W3C standards for managing RDF data, specifically designed to target two diferent issues. OWL gives semantics to RDF data and addresses its possible incompleteness by means of ontological axioms that complete the data with implicit information. OWL and its profiles are based on Description Logics (DLs) and make the openworld assumption (OWA); data not present can still be true. RDF was soon adopted in many applications and making decisions based on correct data became particularly crucial. Therefore, the W3C proposed the Shapes Constraint Language (or SHACL), [2] a machine-readable language for describing and validating RDF graphs. Unlike OWL, SHACL operates under the closed-world assumption (CWA) and assumes the completeness of data. SHACL specifies the notion of a shapes graph; a set of (possibly recursive) constraints and targets for these constraints. However, the semantics of recursive constraints is not given in the standard. This led to recent works that propose semantics based on first-order logic and logic programming [3, 4, 5]. Combining SHACL and OWL, and specifically performing data validation while considering ontological knowledge, is a relevant but challenging problem. It is envisioned by the SHACL standard, but no guidance is given in the specificaiton as to how this should be realized. To illustrate the potential benefits of leveraging an ontology when performing validation, consider a simplistic database of pet owners containing the following facts:</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;SHACL</kwd>
        <kwd>OWL 2 QL</kwd>
        <kwd>validation</kwd>
        <kwd>rewriting</kwd>
        <kwd>complexity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>hasWingedPet (linda , blu), Bird (blu), PetOwner (john), hasPet (john, ace)
and the simple SHACL constraint
petOwnerShape ←</p>
      <sec id="sec-1-1">
        <title>PetOwner ∨ ∃hasPet</title>
        <p>
          with the target petOwnerShape(linda), which asks to verify whether linda is a pet owner. As
linda has a winged pet, we would like to see her as a pet owner. Currently, we cannot conclude
this as we are missing the background knowledge that owning a winged pet implies owning a
pet. In [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we define a semantics that can reason with such background knowledge. Such a
semantics requires integrating the OWA of OWL and the CWA of SHACL.
        </p>
        <p>
          The main contributions of our work can be summarized as follows:
∘ We present a novel notion of SHACL validation in the presence of a DL-Liteℛ ontology, the
logic underlying OWL 2 QL. Specifically, we consider stratified SHACL constraints, which do
not allow the full combination of recursion and negation. Our notion of stratification is derived
from the well-known class of stratified logic programs [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
∘ As SHACL constraints can contain negation, we have to be very careful when defining a
semantics for validation. We use knowledge stemming from the ontology to complete the data
graph with additional facts in an austere way: only a minimal amount of new successor nodes
is added at each step of the procedure, with only merging successors if the axioms give us a
reason to. Validation of constraints over a data graph in the presence of an ontology is defined
as validation of the constraints in the possibly infinite austere canonical model that we introduce.
We discuss this in more detail in Section 2.
∘ Validation in the presence of ontologies requires reasoning over a structure that may be
much larger than the data graph, even potentially infinite, and the complexity of the decision
problem is not obvious. We discuss both combined complexity and data complexity. We prove
that validation is decidable and is PTime-complete in data complexity. This coincides with the
complexity of stratified constraints over plain data graphs [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], showing that the addition of
a DL-Liteℛ ontology does not increase worst-case data complexity. In combined complexity,
in contrast, we see an increase to ExpTime-completeness. This is somewhat surprising since
standard reasoning in DL-Liteℛ ontologies and validation of stratified SHACL constraints over
plain data graphs on its own are both tractable in combined complexity.
∘ Our upper bounds on complexity follow from a constraint rewriting technique that we introduce
in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. This procedure takes as input an ontology  and a set  of stratified constraints, and
outputs an updated set  of stratified constraints such that  and ( , ) behave the same
on any input data graph. Thus to perform validation in the presence of ontologies, we do not
need to complete the data graph and we can reuse standard SHACL validators over the original
graph. This technique joins the ranks of other rewriting-based methods for reasoning with
infinite structures (see, e.g., [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]). Previous work [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] introduced a rewriting technique for positive
SHACL, but we address SHACL with negation, which is non-monotonic and significantly more
challenging.1
1The impossibility of a rewriting for SHACL with negation given in Theorem 1 of [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] does not apply to our semantics,
nor to their own minimal-model semantics, as acknowledged by the authors in personal communication.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Semantics of SHACL Validation with Ontologies</title>
      <p>
        In this section, we describe in more detail the semantics we propose in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For a given ontology
 , data graph , and a shapes graph (, ), we need to define when (,  ) validates (, ).
The usual open-world semantics for ontologies would check for validation over all models of 
and  . However, this only works for positive constraints, as the following example shows.
      </p>
      <p>Let  = {ℎ  (, ), ()}, and  = {}. Let (, {(linda)}) be a
shapes graph, where  only contains the constraint  ← ∃ hasWingedPet ∧ ¬∃hasPet .Dog .
We have an empty ontology, so we can use the usual setting for SHACL validation to conclude
that  indeed validates (, {(linda)}), as linda has a winged pet and not a pet that is a dog.
However, if we consider all possible models of  and  , we must conclude non-validation since
there is a model of  and  that includes ℎ (linda, b), (), for some . The problem
we encounter here is the non-monotonicity of SHACL, that is, adding facts to the data may cause
a previously valid setting to become invalid.</p>
      <p>
        We want an intuitive semantics that coincides with the usual validation in case the ontology
is empty. As done in related settings (see e.g., [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]) we rely on the chase procedure [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
known from Knowledge Representation and Database Theory. Roughly speaking, a chase
procedure takes as input a data graph and an ontology and iteratively applies the axioms of
the ontology to the data by adding atoms over possibly fresh individuals until all the axioms in
the ontology are satisfied. The result of the chase is a so-called canonical or universal model,
and can be used as a representative of all the models. For DL-Liteℛ ontologies, such chase
procedures may not terminate and result in infinite models. There are several chase variants
producing diferent canonical models [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. While for positive constraints these diferences
do not matter, constraints with negation can distinguish between them, resulting in diferent
validation answers, as illustrated in the example below.
      </p>
      <p>
        The semantics we propose in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is based on a special chase procedure that constructs
an austere canonical model. The main ingredient is an auxiliary notion of a good successor
configuration , which, for each object and its type, determines a set of successors that allows us
to satisfy the axioms with as few fresh objects as possible while preserving the universality of
the model. Our notion of austere canonical model is closely related to the core chase [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. It
will typically create fewer fresh successors than the oblivious chase, which, roughly speaking,
applies the axioms of the ontology without first checking whether the axiom is already satisfied.
It may also create fewer successors than the restricted chase, which may be sensitive to the order
of rule applications.
      </p>
      <p>To illustrate the austere canonical model construction, consider the data graph  we
introduced above, but now with  containing four axioms:</p>
      <sec id="sec-2-1">
        <title>PetOwner ⊑ ∃hasPet</title>
      </sec>
      <sec id="sec-2-2">
        <title>PetOwner ⊑ ∃hasWingedPet hasWingedPet ⊑ hasPet ∃hasWingetPet − ⊑ Bird</title>
        <p>
          The good successor configuration will not generate a fresh successor for linda, since she has
blue as a winged pet, but also as a pet due to the second axiom. The austere canonical model
(figure on the right) will only add a hasPet -role from linda to blue. In contrast, the canonical
model obtained from the oblivious chase (figure on the left) will introduce two fresh objects ,
, to satisfy the two existential axioms. Note that this larger model is nevertheless minimal in
the sense of [
          <xref ref-type="bibr" rid="ref10 ref8">10, 8</xref>
          ]. In the figure, we use hasP (hasWP ) instead of hasPet (hasWingedPet ).
        </p>
        <p>linda; PetOwner
hasWP,hasP
hasP
hasWP,hasP</p>
        <p>linda; PetOwner
hasWP,hasP
blu; Bird

; Bird
blu; Bird</p>
        <p>
          Now, consider the shapes graph (, ) with  = { ← ∃ hasPet .¬Bird } and  = {(linda)}.
The shapes graph asks to validate whether linda has a pet that is not a bird. Clearly, the austere
canonical model provides the expected answer, as it does not validate (, ). In contrast, the
canonical model on the left-hand-side of the figure—and thus the semantics of [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] adopted for
SHACL in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]—results in the unintended validation of (, ).
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Validation via Rewriting</title>
      <p>
        We define a class of stratified constraints in SHACL that allows for both recursion and negation,
but restricts their interaction. Analogously as in logic programming [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we define the shape
dependency graph for a set of constraints , and define  to be stratified if this graph does not
have cycles involving negation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        For such stratified constraints, we propose a rewriting technique for validation in the presence
of DL-Liteℛontologies. Given an ontology  and a set of stratified constraints , we compile 
and  into a new set  of stratified constraints so that for every data graph  that is consistent
with  , and every target , we have that (,  ) validates (, ) if  validates ( , ). This is
achieved by means of an inference procedure that uses a collection of inference rules to capture
the possible “propagation” of shape names in the anonymous part of the canonical model. We
define in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] a procedure for positive SHACL, and then show how this can be extended into an
algorithm for SHACL with stratified negation.
      </p>
      <p>To illustrate the core ideas of the constraint rewriting algorithm, consider  containing the
following two constraints:
birdShape ←</p>
      <p>Bird
birdOwnerShape ← ∃
hasPet .birdShape
and the same four axioms in  as above. The obtained set of constraints  contains all
constraints of which the right-hand side sufices to conclude that the structure of the original
constraint will appear in an austere canonical model based on the given  . For instance,
birdShape ← ∃
hasWingedPet −
birdOwnerShape ← ∃
hasWingedPet
are both contained in  . Note that this rewriting is data-independent: for
instance, if ′ = {hasWingedPet (linda, blu), hasPet (john, ace)}, we can directly
conclude that (′,  ) validates (, {birdOwnerShape(linda), birdShape(blu)}, but not
(, {birdOwnerShape(john), birdShape(ace)}, as only knowing that someone has a pet does
not sufice to get a pet owner shape under this  .</p>
      <p>
        The rewriting algorithm allows us to obtain an ExpTime upper bound on the combined
complexity of validation. We show in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that this bound is tight.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Outlook</title>
      <p>
        There are several directions for future work. Our rewriting algorithm considers a restricted
SHACL fragment, and we plan to extend it to support more syntactic features like complex path
expressions and counting (number restrictions on paths). We believe the mentioned features
can be incorporated in principle, but it requires a substantial extension. Another direction is to
support ontology languages that go beyond OWL 2 QL. Our approach seems to generalize nicely
to ontologies in Horn-ℋℐ, but non-Horn ontology languages appear more challenging.
An implementation of our approach also remains for future work. The rewriting algorithm
was meant to demonstrate the principle feasibility of the approach. Our rewriting is best-case
exponential; in particular, there is a rule (rule 3 in Definition 5.3 of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) that forces us to add
exponentially many new constraints. A way to avoid this problem will be needed in order
to achieve an eficient implementation of the rewriting. Extending the SHACL fragment to
consider unstratified negation is also an interesting direction for future work.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The project leading to this application has received funding from the European Union’s
Horizon 2020 research and innovation programme under grant agreement No 101034440.</p>
      <p>This work was partially supported by the Wallenberg AI, Autonomous Systems and Software
Program (WASP) funded by the Knut and Alice Wallenberg Foundation. In addition, Šimkus
was partially supported by the Austrian Science Fund (FWF) project P30873, and Ahmetaj was
supported by the FWF and netidee SCIENCE project T1349-N.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Reconciling</surname>
            <given-names>SHACL</given-names>
          </string-name>
          <source>and ontologies: Semantics and validation via rewriting</source>
          ,
          <year>2023</year>
          . ECAI, to appear.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <issue>W3C</issue>
          ,
          <article-title>Shape constraint language (SHACL)</article-title>
          ,
          <source>Technical Report</source>
          . (
          <year>2017</year>
          ). https://www.w3.org/TR/shacl.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Corman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Reutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Savkovic</surname>
          </string-name>
          ,
          <article-title>Semantics and validation of recursive SHACL</article-title>
          ,
          <source>in: Proc. ISWC</source>
          <year>2018</year>
          , volume
          <volume>11136</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>318</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pareti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Konstantinidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mogavero</surname>
          </string-name>
          ,
          <article-title>Satisfiability and containment of recursive SHACL</article-title>
          ,
          <source>J. Web Semantics</source>
          <volume>74</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Andresel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Corman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Reutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Savkovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šimkus</surname>
          </string-name>
          ,
          <article-title>Stable model semantics for recursive SHACL</article-title>
          ,
          <source>in: The Web Conference</source>
          <year>2020</year>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Apt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Blair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <article-title>Towards a Theory of Declarative Knowledge</article-title>
          , Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>1988</year>
          , p.
          <fpage>89</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tran</surname>
          </string-name>
          , G. Xiao,
          <article-title>Query rewriting for Horn-SHIQ plus rules</article-title>
          ,
          <source>in: Proc. AAAI</source>
          <year>2012</year>
          , AAAI Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O.</given-names>
            <surname>Savkovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lamparter</surname>
          </string-name>
          ,
          <article-title>Validation of SHACL constraints over KGs with OWL 2 QL ontologies via rewriting</article-title>
          ,
          <source>in: Proc. ESWC</source>
          <year>2019</year>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgwardt</surname>
          </string-name>
          , W. Forkel,
          <article-title>Closed-world semantics for conjunctive queries with negation over ELH-bottom ontologies</article-title>
          , in: S. Kraus (Ed.),
          <source>Proc. IJCAI</source>
          <year>2019</year>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          , I. Horrocks, U. Sattler,
          <article-title>Bridging the gap between OWL and relational databases</article-title>
          ,
          <source>in: Proc. WWW</source>
          <year>2007</year>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Deutsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Remmel</surname>
          </string-name>
          ,
          <article-title>The chase revisited</article-title>
          ,
          <source>in: Proc. PODS</source>
          <year>2008</year>
          , ACM,
          <year>2008</year>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>