<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Upper Bound for Provenance in ELH</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rafael Pen~aloza</string-name>
          <email>rafael.penaloza@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Milano-Bicocca</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We investigate the entailment problem in ELHr ontologies annotated with provenance information. In more detail, we show that subsumption entailment is in NP if provenance is represented with polynomials from the Trio semiring and in PTime if the semiring is not commutative. The proof is based on the construction of a weighted tree automaton which recognises a language that matches with the corresponding provenance polynomial.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The study of provenance has recently gained interest in description logics as a
manner to keep track of the sources that are responsible for a consequence to
follow from an ontology [
        <xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>
        ]. The basic idea behind provenance is to assign
a unique label to each axiom in an ontology, and obtain a summary of the
causes for deriving a consequence through two operators from a semiring: a
product, which combines together the axioms used in one derivation, and a sum
which accumulates the products from the di erent possible derivations. These
two operations must satisfy some properties, forming a semiring.
      </p>
      <p>
        Although the motivation and the basic underlying structure is reminiscent of
axiom pinpointing [
        <xref ref-type="bibr" rid="ref3">3, 19</xref>
        ], there are subtle but important di erences which
warrant further analysis. A primary di erence is that provenance does not require
minimality of the information provided (as opposed to the notion of justi
cation), but still requires a coherence between the provenance elements forming
a so-called provenance monomial ; the product of variables identifying the
axioms needed to derive a desired consequence. In addition, work in provenance
is usually pursued in an abstract form, studying the properties based on a
general semiring, which can later be instantiated to speci c algebraic structures
depending on the application. Axiom pinpointing can indeed be obtained by
instantiating to a very speci c semiring.
      </p>
      <p>
        Very recently, the problem of answering provenance queries in the description
logic ELHr was studied [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. That work focused on a semiring where the
product operation is commutative and idempotent, and expressed the provenance
information through an expanded polynomial; that is, a sum of monomials. One
of the main results was a consequence-based algorithm for the monomial of an
? Copyright © 2021 for this paper by its authors. Use permitted under Creative
      </p>
      <p>Commons License Attribution 4.0 International (CC BY 4.0).
entailment problem; that is, deciding whether the provenance polynomial for a
consequence contains a given monomial m. It was shown that this problem is in
PSpace, but the best matching lower bound was the polynomial hardness for
reasoning in EL.</p>
      <p>
        In this paper we improve that upper bound by showing that the monomial
for a subsumption problem is in NP. To achieve this goal, we view the
completion algorithm from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as a weighted tree automaton, which accepts all the
completion-like proofs of a derivation. Through the behaviour of this automaton,
the monomial problem is reduced to a membership problem in regular languages.
To preserve polynomiality in the behaviour computation, we adapt the notion
of structure sharing to automata construction with the help of acyclic recursive
automata (also known as hierarchical state machines [20]). These automata are
exponentially more succinct than NFA, but not more expressive, and retain most
of the complexity properties of NFA.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        A semiring is an algebraic structure S = (S; ; ; 0; 1) where and are
associative binary operators over S with neutral elements 0 and 1, respectively,
and such that is commutative, and distributes over [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In the context of
this paper, we consider two speci c well known semirings: the language semiring,
and the trio semiring.
      </p>
      <p>The language semiring L = (L( ); [; ; ;; f"g) is the semiring of all languages
(that is, sets of nite words) over the alphabet with the usual concatenation
of languages , and the union of sets [. The empty word is denoted by ". To
reduce notation, we often represent singleton languages merely by the word they
contain, when it is clear from the context.</p>
      <p>
        The trio semiring K = (N[NV]; +; ; 0; 1) is the semiring of polynomials with
coe cients in N and variables in a countably in nite set NV, with the operation +
de ned as usual and idempotent and commutative [
        <xref ref-type="bibr" rid="ref11 ref12 ref7">7, 11, 12</xref>
        ]. We also consider
in Section 6 the case in which is non-commutative. Polynomials in the trio
semiring in this work are in expanded form, meaning that they are sums of
monomials. Every polynomial can be represented in this form.
      </p>
      <p>Objects of the language and the trio semirings are similar, but have subtle
di erences: every language L can be seen as a (potentially in nite) polynomial,
where each monomial is a word in L. Conversely, the class of all monomials
in a polynomial P in expanded form can be seen as a language modulo the
commutativity of the product .</p>
      <p>
        We consider a syntactic restriction of the ontology language ELHr [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Concept and role names are taken from the disjoint countable sets NC and NR,
respectively, also disjoint from NV. ELHr general concept inclusions (GCIs) C v D
are built through the grammar rules C ::= A j 9R:C j C u C j &gt;, D ::= A j 9R,
where R 2 NR, A 2 NC. Role inclusions (RIs) and range restrictions (RRs) are
of the form R v S and ran(R) v A, respectively, with R; S 2 NR and A 2 NC. An
ELHr axiom is a GCI, RI, or RR. An ELHr TBox is a nite set of ELHr axioms.
The reason for syntactically restricting ELHr is that conjunctions or quali ed
restrictions of a role on the right-hand side of GCIs lead to counter-intuitive
behavior when adding provenance annotations; see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for a detailed discussion
on this issue.
      </p>
      <p>An annotated ELHr TBox T is a set of ELHr axioms, each annotated with an
element from v 2 NV [ f1g representing provenance information. Axioms
annotated with provenance information can be derived from an annotated ontology.
They are annotated with monomials (potentially with more than one variable)
representing the derivation of the axiom w.r.t. O. From now on, NM represents
the set of all monomials.</p>
      <p>An annotated interpretation is a triple I = ( I ; Im; I ) where I ; Im are
non-empty disjoint sets (the domain and domain of monomials of I,
respectively), and I maps
{ every A 2 NC to AI I Im;
{ every R 2 NR to RI I I Im; and
{ every m; n 2 NM to mI ; nI 2 Im s.t. mI = nI i m and n are equal modulo
associativity, commutativity and -idempotency (e.g., (n m)I = (m n)I ).
As mentioned, we consider in Section 6 the case in which
We extend I to complex ELHr expressions as usual:
is non-commutative.
(&gt;)I =</p>
      <p>I</p>
      <p>f1I g;
(9R)I = f(d; mI ) j 9e 2</p>
      <p>I s.t. (d; e; mI ) 2 RI g;
(C u D)I = f(d; (m</p>
      <p>n)I ) j (d; mI ) 2 CI ; (d; nI ) 2 DI g;
(ran(R))I = f(e; mI ) j 9d 2</p>
      <p>I s.t. (d; e; mI ) 2 RI g;
(9R:C)I = f(d; (m
n)I ) j 9e 2</p>
      <p>I s.t.</p>
      <p>(d; e; mI ) 2 RI ; (e; nI ) 2 CI g:
The annotated interpretation I satis es : (R v S; m) if for all n 2 NM; (d; e; nI ) 2
RI implies (d; e; (m n)I ) 2 SI; and (C v D; m) if for all n 2 NM; (d; nI ) 2 CI
implies (d; (m n)I ) 2 DI. I is a model of T , denoted I j= T , if it satis es all
annotated axioms in T . T entails ( ; m), denoted O j= ( ; m), if I j= ( ; m) for
every model I of O.</p>
      <p>We are interested in the provenance for a subsumption problem: given a TBox
T and two concept names A; B, nd all monomials m such that T j= (A v B; m).
We solve it by constructing an ARA that accepts representatives of all these
monomials. We use this construction to answer, given a monomial m, whether
T j= (A v B; m) holds, and show that this problem is in NP.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Automata</title>
      <p>
        We consider two generalisations of non-deterministic nite automata (NFA) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ];
namely, weighted tree automata and acyclic recursive automata.
3.1
      </p>
      <sec id="sec-3-1">
        <title>Weighted Tree Automata</title>
        <p>
          Tree automata [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] generalise NFAs by accepting trees rather than words; the
branching of the trees is identi ed by the arity of the automaton. Weighted tree
automata further generalise this notion by not only accepting or rejecting an
input tree, but assigning a value from a given semiring (its weight ) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. For the
scope of this paper, we consider only unlabelled trees.
        </p>
        <p>Let k 1. A weighted tree automaton over S of arity k is a tuple of the
form A = (Q; S; wt; I; f ) where Q is a nite set of states, S = (S; ; ; 0; 1) is a
semiring, wt : Qk+1 ! S is the transition weight function, I Q is the set of
initial states, and f : Q ! S is the exit weight function.</p>
        <p>As usual, we represent trees of arity k as nite non-empty sets T f1; : : : ; kg
such that if wi 2 T , then w; wj 2 T for each w 2 f1; : : : ; kg ; 1 i; j k. Given
a tree T of arity k, a run : T ! Q of A over T assigns a state to each node
in T . The weight of this run is wt( ) = Nw12T wt( (w); (w1); : : : ; (wk))
Nw12=T f ( (w)); that is, the product of all the transition and exit weights given
the states assigned by . For non-commutative semirings, this product is made
from the root to the leafs, and in the order of the children (i.e., top-down,
left-toright). Given a state q 2 Q, we de ne wt(q) = L (")=q wt( ); that is, the sum of
the weights of all runs that label the root of a tree with q. The behaviour of the
automaton A is the sum of the weights of all its initial states kAk := Lq2I wt(q).
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Acyclic Recursive Automata</title>
        <p>Acyclic recursive automata generalise NFA by allowing an automaton to call
another one, but the calls between automata must respect a hierarchical ordering.
They were originally introduced as hierarchical state machines [20] with a slightly
di erent structure.</p>
        <p>De nition 1 (ARA). An acyclic recursive automaton over the alphabet is
a nite set A = fAi j i 2 Ig of NFAs Ai = (Qi; i; i; Ii; Fi), where (I; ) is a
partially ordered set of indices, such that: (i) for all i 6= j 2 I, Qi \ Qj = ;; (ii)
i = [ fmj j j &lt; ig; and (iii) fmi j i 2 Ig \ = ;.</p>
        <p>We call the symbols mi, which are added to the alphabets of the di erent
automata in A, call triggers. When an automaton Ai reads the symbol mj , it \calls"
the automaton Aj , which continues reading the word until it chooses to return
the control to the \calling" automaton Ai (signalled by the symbol mj ). This
return is only possible if Aj is in one of its accepting states. In practice, the
automaton Aj is in charge of accepting a portion of the input word.</p>
        <p>Each automaton Ai may call any other automaton Aj where j &lt; i. Hence,
there may be a sequence of nested calls, but the depth of this nesting is always
bounded by the number n of automata in A. Moreover, the automaton Ai can
never call itself either directly or indirectly. To de ne the language accepted by
the ARA A, we adapt the notion of a run to take into account also the nested
calls between the automata.</p>
        <p>De nition 2 (valid run). A run of the ARA A = fAj j j 2 Ig is a nite
sequence = q0; s1; q1; : : : ; sk; qk such that qi 2 Sj2I Qj for all 0 i k and
si 2 [ fmj; mj j j &lt; ig for all 1 i k. The notion of a valid run on an
automaton Ai is inductively de ned as follows. The run = q0; s1; q1; : : : ; sk; qk
is valid on Ai i
{ fs1; : : : ; skg and (qj; sj+1; qj+i) 2 i for all 0 j &lt; k or
{ j is the smallest index such that sj 2= , sj is of the form m`, there exists
j0 &gt; j with sj0 = m`, and
q0; s1; q2; : : : ; sj 1; qj 1 and qj0 ; sj0+1; : : : ; qk are valid in Ai
qj; sj+1; qj+1; : : : ; sj0 1; qj0 1 is valid in A` and
(qj 1; m`; qj0 ) 2 Ai
So far, we have not yet expressed the use of initial and nal states in accepting a
word. In a nutshell, whenever we call an automaton, its execution should accept
a segment of the input word, by traversing from an initial to a nal state.
De nition 3 (successful run). Given a valid run = q0; s1; q1; : : : ; sk; qk, the
index i; 1 i k is called a top-level call index i si = mj for some j, and for
every ` &lt; i such that s` = mj0 there is an `0; ` &lt; `0 &lt; i such that s`0 = mj0 . If i
is a top-level call index with si = mj, then the smallest index `; i &lt; ` k such
that s` = mj is its match. This is denoted as ` = match(i).</p>
        <p>A valid run = q0; s1; q1; : : : ; sk; qk is successful in Ai i q0 2 Ii, qk 2 Fi
and for every top-level call index i with si = mj and ` = match(i), the sequence
qi; si+1; : : : ; q` 1 is a successful run in Aj.</p>
        <p>The run is successful in the ARA A = fAi j i 2 Ig i it is successful in
Aj, for some maximal element j of I. The word accepted by this run is the
concatenation of all symbols of appearing in . The language of A is the set
L(A) of all words accepted by a successful run in A. By extension, the language
accepted by Ai is the set L(Ai) of all words accepted by a successful run in Ai,
for each i 2 I.</p>
        <p>
          ARAs are not more expressive than NFAs; they also accept regular languages.
The main di erence is that an ARA can be exponentially more succinct than an
NFA for representing a given language. For example, the language fa2n g that
contains only one word with 2n symbols a can only be recognised by NFAs with
at least 2n states, but is accepted by an ARA having n automata with 3 states
each (hence 3n states in total); see the appendix of the extended version [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
The size of the ARA A is the total number of states in the NFAs in A.
        </p>
        <p>The relevant properties of ARAs for this paper are the following. Deciding
whether the ARA A accepts a word w requires only polynomial time. The
concatenation of n ARAs is obtained by adding a new NFA with n + 1 states that
calls each ARA once. The union of n ARAs is obtained by adding a new NFA
with 2 states, which non-deterministically calls one of the ARAs. Abusing the
notation, given two ARAs A, B we denote as A B and A[B the ARAs obtained
through these constructions, respectively.</p>
        <p>T = f(R1 v R3; R1 v R2; R2 v R3; ; ; );
(ran(R) v A; R v S; ran(S) v A; ; ; );
(A v 9S; A v 9R; R v S; ; ; ); (A v C; A v B; B v C; ; ; );
(A v 9R; A v B; B v 9R; ; ; );
(A v C; A v B1; A v B2; B1 u B2 v C; ; );
(ran(R) v C; ran(R) v B1; ran(R) v B2; B1 v C1; B2 v C2; C1 u C2 v C);
(A v C; A u B v C; &gt; v B; ; ; );
(A v D; A v 9S; ran(S) v B; B v C; S v R; 9R:C v D);
(A v C; A v 9R; &gt; v B; 9R:B v C; ; )</p>
        <p>j A; B; C; D 2 NC(T ) [ f&gt;g; R; S 2 NR(T )g
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The Weighted Automaton</title>
      <p>Our goal is to build an ARA which accepts representatives for all the
monomials in the provenance of a subsumption relation. To do so, we rst present
a weighted tree automaton whose behaviour (which is a language) can be seen
as a polynomial (in expanded form) constructed by the provenance monomials
for the desired consequence; modulo commutativity. The method for computing
this behaviour will give rise to the ARA.</p>
      <p>The construction of the automaton is based on considering the \proofs" of
a derivation based on the consequence-based algorithm, built in a top-down
manner (from the desired consequence, deconstructed back to the axioms used).
Formally, we have a di erent automaton for each consequence that we might
want to verify. However, all the automata are equivalent, except for the initial
state; which refers to the desired consequence. The automaton, which reads trees
of arity 5, is also very simple because all transitions that refer to a consequence
step have weight f"g (the neutral of the language semiring product), and the
only \real" weight is found at the nal states (the exit weight) which is given by
the provenance label of the axiom in the TBox.</p>
      <p>Let A0; B0 be two distinguished concept names appearing in an annotated
TBox T ; the weighted automaton AA0vB0 = (Q [ f g; L; wt; I; f ) is given by
{ Q is the set of all axioms in restricted normal form on the alphabet of T ;
{ wt( ) = f"g if 2 T (see Table 1) and wt( ) = ; otherwise;
{ I = fA0 v B0g;
{ f (q) = fvg if (q; v) 2 T ; f (q) = f"g if q 2 fX v X; X v &gt;; g; and f (q) = ;
otherwise.</p>
      <p>
        The special symbol is used to keep the arity of the automaton to 5. In a
nutshell, the transitions of this automaton can be seen as the completion rules
from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], but applied backwards, from the consequence to the premises that
generate it. The weight of any run which labels the root with A0 v B0 is either
A v C
      </p>
      <p>{w}
B u C v D
{u}</p>
      <p>A v D
C v D
&gt; v B
{v}
{ε} {ε} {ε}
{ε} {ε} {ε}
; if the labelled tree does not represent a derivation of the consequence, or a
single word concatenating the annotations of the axioms from T used in the
derivation. Modulo idempotency, this word represents a provenance monomial
for A0 v B0. The behaviour of the automaton is then the language containing a
representation of all such provenance monomials.</p>
      <p>Example 4. Consider the annotated TBox T containing the following ve axioms
T := f(BuC v D; u); (&gt; v B; v); (A v C; w); (A v 9R; x); (9R:B v B; y)g. One
possible run of the automaton AAvD is depicted in Figure 1. The two internal
nodes (marked with ) have a transition weight of f"g. The weight of this run is
fwuvg. It can be seen that every non-leaf node is the consequence obtained from
its successors (ignoring the dummy nodes ). There are at least two other runs
with weight di erent from ;; one has weight fvwug and the other fxvywug. The
behaviour kAAvDk contains fwuv; vwu; xvywug. Note that the rst two words
correspond to the same provenance monomial in K, due to commutativity. In L,
they are two di erent objects.</p>
      <p>
        As it can be seen from the example, the behaviour of AA0vB0 does not directly
yield the set of all provenance monomials for A0 v B0. However, these monomials
can be extracted from kAA0vB0 k by taking into account the commutativity and
idempotency of K. Abusing the notation, given a word !, we will denote as
[!] a representative monomial ! w.r.t. commutativity and idempotency. Hence
for instance [xuxv] = uvx. The next theorem is a direct consequence of the
correctness of the completion algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Theorem 5. There is a run
T j= (A0 v B0; [m]).</p>
      <p>Thus, the behaviour of this automaton, which accumulates the weights of all
possible runs, represents all provenance monomials for the consequence A0 v B0.
The question is: how to nd this behaviour? Answering this question is the scope
of the following section; but before that, we emphasise that the automata for
di erent consequences are all identical except for the initial state, which is used
to label the root node (that is, the goal that we aim to reach through a proof).
Hence, for the TBox in Example 4, we get fv; xvyg kAAvBk.</p>
      <p>of AA0vB0 with weight wt( ) = fmg 6= f"g i</p>
    </sec>
    <sec id="sec-5">
      <title>The Behaviour</title>
      <p>
        Following the general idea from [
        <xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>
        ], we compute the behaviour of the
automaton via a bottom-up approach, by iteratively accumulating the provenance
of intermediate consequences used in the derivation of A0 v B0. However, the
technique must be adapted to handle the semiring L, which is not a lattice.
      </p>
      <p>Speci cally, we build the functions wti : Q ! L(NV ), i 2 N as follows:
{ wt0 = f ;1
{ for i
0, wti+1(q) = wti(q) [ S(q;q1;:::;q5)2T wti(q1)
wti(q5)
It can be shown by induction on i that wti(q) has a representative for all the
monomials arising from trees with root labelled with q and depth at most i. In
particular for i = 0, wt0(q) is the label of the axiom q if it appears in T , f"g
if q is a tautology or , and ; otherwise. Importantly, wti(q) wti+1(q) for all
q 2 Q and all i 2 N. We can thus see the construction of wti as a monotone
operator which, in particular, has a smallest xpoint: the limit of the functions
wti. This xpoint is, in fact, the behaviour of AA0vB0 .</p>
      <p>Theorem 6. The behaviour of AA0vB0 is limn!1 wtn(A0 v B0).
Importantly, the functions wti actually assign a language to each state of the
automaton. To nd out the behaviour of a di erent consequence, say A1 v B1,
one does not need to recompute the automaton and the functions wti, but needs
to nd limn!1 wtn(A1 v B1). In other words, nding these functions provides
enough information for computing the provenance monomials of all possible
consequences (in normal form) from the TBox.</p>
      <p>In general, the construction of wti will not yield the xpoint after nitely
many applications. Indeed, w.r.t. TBox f(A v B; u); (B v A; v)g, we get that
limn!1 wtn(A v B) = (uv) u, but each wti(A v B) contains nitely many
words. However, recall that we are not interested in the language kAA0vB0 k
per se, but rather in the monomials that the words in this language represent.
Since the Trio semiring (which we use to characterise the provenance) uses a
commutative and idempotent product operation, we are only interested in the
symbols that appear in the words, and not in the actual words themselves. That
is, we are only interested in the languages up to representative monomials.
De nition 7 (K-equivalence). Two languages L; L0 are K-equivalent
(denoted as L K L0) i f[!] j ! 2 Lg = f[!] j ! 2 L0g.</p>
      <p>For example, (uv) u and fuv; ug are K-equivalent. While the languages wti(q)
and the words therein may grow inde nitely, their representative monomials are
limited by the provenance variables appearing in T ; which are at most jT j. Hence,
there exists an n 2 N such that wtm(q) K wtn(q) holds for all q 2 Q and all
m n. Following Theorem 5, for this n wtn(A0 v B0) contains representatives
for all the provenance monomials for A0 v B0.
1 Recall that f is the exit weight function of the automaton.</p>
      <p>As argued before, wti(q) contains the weights of all runs of height at most i
with root q. It can be seen that for every run of height greater than jQj jT j
there is a smaller run 0 such that wt( ) = wt( 0). This means that the least
xpoint for wti is found after at most jQj jT j iterations, which is polynomial in
jT j. Speci cally, the number of iterations needed to reach a xpoint is bounded
by O(jT j4).</p>
      <p>
        Recall that each wti(q) is a language. By construction, it is a regular
language; indeed, it is formed by concatenation and union of nite languages. If we
tried to represent these languages extensionally, enumerating all the words they
contain, we would potentially need exponential space: potentially, the language
may contain exponentially many words. Exploiting the fact that these languages
are regular, we can represent them through NFAs. In fact, wt0 is composed of
very simple automata with at most two states, and the construction of wti+1
from wti requires only concatenation and union of automata, which are basic
automata operations [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However, iteratively constructing these NFA as in the
de nition of wti can also lead to an exponential blowup; for an example see
the extended version of this paper [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. To keep the construction tractable, we
exploit the succinctness power of ARAs.
      </p>
      <p>Note once again that each wt0(q) contains either a word of length 1, the
empty word, or is the empty language. All these languages are recognisable by
q
NFA with at most two states. We call these automata A0. For each successive
q q0
wti+1(q) we construct an automaton Ai+1 that calls the automata Ai , which
accept the languages wti(q0); q0 2 Q. Thus we are constructing an ARA with
the ordering Aiq Ajq0 for all q; q0 2 Q and all 0 i &lt; j. Importantly, each
automaton Aiq+1 requires at most ve states (to concatenate the languages of
the successive states) for each transition (q; q1; : : : ; q5) 2 T (recall Table 1). Since
the number of such transitions is bounded by jQj5, it follows that the size of each
q0
ARA Aiq := fAj j q0 2 Q; j &lt; ig[fAiqg is in O(jQj5 i). Let now Aq := Aqn, where
n is the number of iterations needed to reach a xpoint w.r.t. K-equivalence. As
seen, its size is in O(jQj6 jT j); that is, it is bounded by a polynomial on jT j.
Moreover, this ARA Aq su ces to nd all the provenance monomials for the
consequence q, as expressed next.</p>
      <p>Theorem 8. T j= (A0 v B0; m) i there is a word ! 2 L(AA0vB0 ) such that
[m] = [!].</p>
      <p>Example 9. Consider again the TBox from Example 4. The languages wti are
q
extensionally represented in Table 2. The construction of the automata Ai+1
mAvB</p>
      <p>i
mAvC</p>
      <p>i
mCvD
i
mBuCvD
i
mBuCvD</p>
      <p>i
m&gt;vB
i
m&gt;vB
i
m∃R.BvB</p>
      <p>i
mCvD
i
mAvB
i
m&gt;vB
i
mAvD
i
mAvC
i
for i 0 and q 2 fC v D; A v B; A v Dg is depicted in Figure 2, where each
transition miq is a call to the automaton Aiq. Hence, for instance A1CvD may
nondeterministically call A0CvD (which yields the empty language), or concatenate
a word accepted by A0BuCvD with a word from A0&gt;vB. Thus, L(A1CvD) = fuvg.
Similarly, we can see that L(A1AvD) = ;. Note that for a xed q 2 Q the structure
of the automata Aiq+1 is the same for all i 0. The di erence is that they call
the automata from the previous iteration.</p>
      <p>Recall that deciding whether a word ! is a accepted by an ARA A is polynomial
on the size of A. In particular, for Aq this task is polynomial on jT j. However,
Theorem 8 requires to rst nd the word ! that needs to be tested. One idea is
to build an automaton A that accepts the language Lm := f! j [!] = [m]g and
check whether L(Aq)\Lm 6= ;. However, it is not at all clear whether Lm is even
a regular language; speci cally, to the best of our knowledge it has never been
veri ed whether the commutative closure of a regular language it also regular.</p>
      <p>To solve this issue, we rst guess (in polynomial time on the size of m)
an ordering of the symbols in m|say 1; : : : ; k|and then verify whether Aq
accepts a word from
that is, a word where the symbols rst appear in the speci ed order. Note that the
language in Equation (1) is regular, and can be recognised by an NFA with k + 1
states. Recall also that given an ARA A and an NFA A, it is possible to construct
an ARA A0 of size bounded by jAjjAj such that L(A0) = L(A) \ L(A). Thus,
verifying whether the chosen order yields a word accepted by Aq is polynomial
on jT j and jmj. The non-deterministic ordering guess yields the following.
Theorem 10. Deciding T j= (A0 v B0; m) is in NP.</p>
    </sec>
    <sec id="sec-6">
      <title>The Non-commutative Case</title>
      <p>We now consider the case where the semiring is not commutative. The idea of
non-commutativity is to preserve the information of the order in which axioms
were used to derive a consequence. We consider here a left-absorbing product :
for multiple occurrences of the same provenance symbol, we take into account
the rst (or left-most) one. Thus, e.g., [uvu] = [uv] 6= [vu]. We call this case
r
non-commutative ELH .
r
Example 11. Let T := f(A v B; m); (B v C; n)g. In non-commutative ELH ,
T j= (A v C; mn) but T 6j= (A v C; nm).</p>
      <p>
        Non-commutativity also means that, e.g., the concept expression A u B is not
interpreted in the same way as B u A, which may seem counterintuitive since in
classical DL semantics these concepts are equivalent. One possible use case for
this semantics is for representing de nitional sentences in natural language
processing [
        <xref ref-type="bibr" rid="ref15">15, 18</xref>
        ], where the order of the words usually also changes the meaning.
For example, the logic would distinguish White u Wine from Wine u White.
      </p>
      <p>Interestingly, we know from Equation (1) that we can verify whether Aq
accepts a representative (under left-absorption) of a monomial m. The bene t
in this case is that it is not necessary to rst guess the right ordering, as it is
required by the ordering given in m. This yields the following result.
Theorem 12. T j= (A0 v B0; m) w.r.t. a left-absorbing, non-commutative
semiring can be decided in polynomial time.
7</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>
        In this paper we have studied the complexity of deciding whether the provenance
of a subsumption relation contains a given monomial m. In previous work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
it was shown through a completion algorithm, that this problem is in PSpace
when the semiring product is idempotent and commutative, but only a
polynor
mial lower bound (derived from reasoning in ELH ) was given. By viewing the
completion algorithm backwards, as a decomposition approach based on tree
automata, and exploiting a less known class of automata (ARAs) to simulate
structure sharing, we were able to lower this upper bound to NP.
Unfortunately, the polynomial lower bound remains the best available at the moment.
If we substitute commutativity by a notion of left-absorption, we obtain a tight
polynomial-time complexity for this problem. Interestingly, the technique
developed can be applied to instance queries and assertion entailments, simply
by extending the automaton construction to the ABox-handling rules from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Hence, the same complexity bounds hold in both cases. One avenue for future
work is to close the remaining complexity gaps in these problems.
      </p>
      <p>
        Note that the complexity results depend strongly on the properties of the
provenance semiring. Indeed, the xpoint computation of the functions wti only
terminates due to the idempotence and commutativity (or left-absorption) of the
product. However, the construction of the automaton AA0vB0 remains correct
for a larger class of semirings (Theorem 5). We will study whether the technique
can be applied in practice for these other semirings. One particular point of
interest is to consider closure semirings [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or other approaches for handling
the repeating structure of the ARAs.
18. Petrucci, G., Ghidini, C., Rospocher, M.: Ontology learning in the deep. In: EKAW.
      </p>
      <p>pp. 480{495 (2016)
19. Schlobach, S., Cornet, R.: Non-standard reasoning services for the debugging of
description logic terminologies. In: IJCAI (2003)
20. Yannakakis, M.: Hierarchical state machines. In: van Leeuwen, J., Watanabe, O.,
Hagiya, M., Mosses, P.D., Ito, T. (eds.) Theoretical Computer Science: Exploring
New Frontiers of Theoretical Informatics. pp. 315{330. Springer Berlin Heidelberg,
Berlin, Heidelberg (2000)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P</given-names>
          </string-name>
          . (eds.):
          <article-title>The Description Logic Handbook: Theory, Implementation, and Applications</article-title>
          . Cambridge University Press, second edn. (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , Pen~aloza, R.:
          <article-title>Automata-based axiom pinpointing</article-title>
          .
          <source>J. Autom. Reasoning</source>
          <volume>45</volume>
          (
          <issue>2</issue>
          ),
          <volume>91</volume>
          {
          <fpage>129</fpage>
          (
          <year>2010</year>
          ). https://doi.org/10.1007/s10817-010-9181-2, https://doi.org/10.1007/s10817-010-9181-2
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , Pen~aloza, R.:
          <article-title>Axiom pinpointing in general tableaux</article-title>
          .
          <source>J. Log. Comput</source>
          .
          <volume>20</volume>
          (
          <issue>1</issue>
          ),
          <volume>5</volume>
          {
          <fpage>34</fpage>
          (
          <year>2010</year>
          ). https://doi.org/10.1093/logcom/exn058, https://doi.org/10.1093/logcom/exn058
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bourgaux</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozaki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Querying attributed DL-Lite ontologies using provenance semirings</article-title>
          .
          <source>In: AAAI</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bourgaux</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozaki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Pen~aloza, R.,
          <string-name>
            <surname>Predoiu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Provenance for the description logic elhr</article-title>
          . In: Bessiere,
          <string-name>
            <surname>C</surname>
          </string-name>
          . (ed.)
          <source>IJCAI</source>
          . pp.
          <year>1862</year>
          {
          <year>1869</year>
          . ijcai.
          <source>org</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanti</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozaki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Pen~aloza, R.,
          <string-name>
            <surname>Xiao</surname>
          </string-name>
          , G.:
          <article-title>Enriching ontologybased data access with provenance</article-title>
          .
          <source>In: IJCAI</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cheney</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiticariu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          :
          <article-title>Provenance in databases: Why, how, and where</article-title>
          .
          <source>Foundations and Trends in Databases</source>
          <volume>1</volume>
          (
          <issue>4</issue>
          ),
          <volume>379</volume>
          {
          <fpage>474</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Comon</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dauchet</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gilleron</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Loding,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Jacquemard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Lugiez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Tison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Tommasi</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Tree automata techniques and applications</article-title>
          . Available on: http://www.grappa.
          <source>univ-lille3.fr/tata</source>
          (
          <year>2007</year>
          ), release October,
          <volume>12th</volume>
          <fpage>2007</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Droste</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuich</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahonis</surname>
          </string-name>
          , G.:
          <article-title>Multi-valued mso logics overwords and trees</article-title>
          .
          <source>Fundam. Inf</source>
          .
          <volume>84</volume>
          (
          <issue>3</issue>
          ,4),
          <volume>305</volume>
          {327 (Dec
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Golan</surname>
            ,
            <given-names>J.S.:</given-names>
          </string-name>
          <article-title>The theory of semirings with applications in mathematics and theoretical computer science</article-title>
          ,
          <source>Pitman monographs and surveys in pure and applied mathematics</source>
          , vol.
          <volume>54</volume>
          . Longman Scienti c &amp;
          <string-name>
            <surname>Technical</surname>
          </string-name>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Green</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karvounarakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tannen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Provenance semirings</article-title>
          .
          <source>In: PODS</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Green</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tannen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The semiring framework for database provenance</article-title>
          .
          <source>In: PODS</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hopcroft</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Introduction to Automata Theory, Languages, and Computation</article-title>
          . Addison-Wesley Publishing Company (
          <year>1979</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>D.J.:</given-names>
          </string-name>
          <article-title>Algebraic structures for transitive closure</article-title>
          .
          <source>Theoretical Computer Science</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <volume>59</volume>
          {
          <fpage>76</fpage>
          (
          <year>1977</year>
          ). https://doi.org/https://doi.org/10.1016/
          <fpage>0304</fpage>
          -
          <lpage>3975</lpage>
          (
          <issue>77</issue>
          )
          <fpage>90056</fpage>
          -
          <lpage>1</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Distel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Learning formal de nitions for Snomed CT from text</article-title>
          .
          <source>In: AIME</source>
          . pp.
          <volume>73</volume>
          {
          <issue>77</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>M.Droste</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Kuich</surname>
          </string-name>
          , H.Vogler (eds.):
          <source>Handbook of Weighted Automata. Monographs in Theoretical Computer Science</source>
          , Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. Pen~aloza, R.:
          <article-title>An upper bound for provenance in ELHr</article-title>
          .
          <source>CoRR abs/2108</source>
          .12774 (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>