Introduction

An Upper Bound for Provenance in ELH

Rafael Pen~aloza

rafael.penaloza@unimib.it 0 0 University of Milano-Bicocca , Italy

We investigate the entailment problem in ELHr ontologies annotated with provenance information. In more detail, we show that subsumption entailment is in NP if provenance is represented with polynomials from the Trio semiring and in PTime if the semiring is not commutative. The proof is based on the construction of a weighted tree automaton which recognises a language that matches with the corresponding provenance polynomial.

Introduction

The study of provenance has recently gained interest in description logics as a manner to keep track of the sources that are responsible for a consequence to follow from an ontology [ 4, 6 ]. The basic idea behind provenance is to assign a unique label to each axiom in an ontology, and obtain a summary of the causes for deriving a consequence through two operators from a semiring: a product, which combines together the axioms used in one derivation, and a sum which accumulates the products from the di erent possible derivations. These two operations must satisfy some properties, forming a semiring.

Although the motivation and the basic underlying structure is reminiscent of axiom pinpointing [ 3, 19 ], there are subtle but important di erences which warrant further analysis. A primary di erence is that provenance does not require minimality of the information provided (as opposed to the notion of justi cation), but still requires a coherence between the provenance elements forming a so-called provenance monomial ; the product of variables identifying the axioms needed to derive a desired consequence. In addition, work in provenance is usually pursued in an abstract form, studying the properties based on a general semiring, which can later be instantiated to speci c algebraic structures depending on the application. Axiom pinpointing can indeed be obtained by instantiating to a very speci c semiring.

Very recently, the problem of answering provenance queries in the description logic ELHr was studied [ 5 ]. That work focused on a semiring where the product operation is commutative and idempotent, and expressed the provenance information through an expanded polynomial; that is, a sum of monomials. One of the main results was a consequence-based algorithm for the monomial of an ? Copyright © 2021 for this paper by its authors. Use permitted under Creative

Commons License Attribution 4.0 International (CC BY 4.0). entailment problem; that is, deciding whether the provenance polynomial for a consequence contains a given monomial m. It was shown that this problem is in PSpace, but the best matching lower bound was the polynomial hardness for reasoning in EL.

In this paper we improve that upper bound by showing that the monomial for a subsumption problem is in NP. To achieve this goal, we view the completion algorithm from [ 5 ] as a weighted tree automaton, which accepts all the completion-like proofs of a derivation. Through the behaviour of this automaton, the monomial problem is reduced to a membership problem in regular languages. To preserve polynomiality in the behaviour computation, we adapt the notion of structure sharing to automata construction with the help of acyclic recursive automata (also known as hierarchical state machines [20]). These automata are exponentially more succinct than NFA, but not more expressive, and retain most of the complexity properties of NFA. 2

Preliminaries

A semiring is an algebraic structure S = (S; ; ; 0; 1) where and are associative binary operators over S with neutral elements 0 and 1, respectively, and such that is commutative, and distributes over [ 10 ]. In the context of this paper, we consider two speci c well known semirings: the language semiring, and the trio semiring.

The language semiring L = (L( ); [; ; ;; f"g) is the semiring of all languages (that is, sets of nite words) over the alphabet with the usual concatenation of languages , and the union of sets [. The empty word is denoted by ". To reduce notation, we often represent singleton languages merely by the word they contain, when it is clear from the context.

The trio semiring K = (N[NV]; +; ; 0; 1) is the semiring of polynomials with coe cients in N and variables in a countably in nite set NV, with the operation + de ned as usual and idempotent and commutative [ 7, 11, 12 ]. We also consider in Section 6 the case in which is non-commutative. Polynomials in the trio semiring in this work are in expanded form, meaning that they are sums of monomials. Every polynomial can be represented in this form.

Objects of the language and the trio semirings are similar, but have subtle di erences: every language L can be seen as a (potentially in nite) polynomial, where each monomial is a word in L. Conversely, the class of all monomials in a polynomial P in expanded form can be seen as a language modulo the commutativity of the product .

We consider a syntactic restriction of the ontology language ELHr [ 1 ]. Concept and role names are taken from the disjoint countable sets NC and NR, respectively, also disjoint from NV. ELHr general concept inclusions (GCIs) C v D are built through the grammar rules C ::= A j 9R:C j C u C j >, D ::= A j 9R, where R 2 NR, A 2 NC. Role inclusions (RIs) and range restrictions (RRs) are of the form R v S and ran(R) v A, respectively, with R; S 2 NR and A 2 NC. An ELHr axiom is a GCI, RI, or RR. An ELHr TBox is a nite set of ELHr axioms. The reason for syntactically restricting ELHr is that conjunctions or quali ed restrictions of a role on the right-hand side of GCIs lead to counter-intuitive behavior when adding provenance annotations; see [ 5 ] for a detailed discussion on this issue.

An annotated ELHr TBox T is a set of ELHr axioms, each annotated with an element from v 2 NV [ f1g representing provenance information. Axioms annotated with provenance information can be derived from an annotated ontology. They are annotated with monomials (potentially with more than one variable) representing the derivation of the axiom w.r.t. O. From now on, NM represents the set of all monomials.

An annotated interpretation is a triple I = ( I ; Im; I ) where I ; Im are non-empty disjoint sets (the domain and domain of monomials of I, respectively), and I maps { every A 2 NC to AI I Im; { every R 2 NR to RI I I Im; and { every m; n 2 NM to mI ; nI 2 Im s.t. mI = nI i m and n are equal modulo associativity, commutativity and -idempotency (e.g., (n m)I = (m n)I ). As mentioned, we consider in Section 6 the case in which We extend I to complex ELHr expressions as usual: is non-commutative. (>)I =

f1I g; (9R)I = f(d; mI ) j 9e 2

I s.t. (d; e; mI ) 2 RI g; (C u D)I = f(d; (m

n)I ) j (d; mI ) 2 CI ; (d; nI ) 2 DI g; (ran(R))I = f(e; mI ) j 9d 2

I s.t. (d; e; mI ) 2 RI g; (9R:C)I = f(d; (m n)I ) j 9e 2

I s.t.

(d; e; mI ) 2 RI ; (e; nI ) 2 CI g: The annotated interpretation I satis es : (R v S; m) if for all n 2 NM; (d; e; nI ) 2 RI implies (d; e; (m n)I ) 2 SI; and (C v D; m) if for all n 2 NM; (d; nI ) 2 CI implies (d; (m n)I ) 2 DI. I is a model of T , denoted I j= T , if it satis es all annotated axioms in T . T entails ( ; m), denoted O j= ( ; m), if I j= ( ; m) for every model I of O.

We are interested in the provenance for a subsumption problem: given a TBox T and two concept names A; B, nd all monomials m such that T j= (A v B; m). We solve it by constructing an ARA that accepts representatives of all these monomials. We use this construction to answer, given a monomial m, whether T j= (A v B; m) holds, and show that this problem is in NP. 3

Automata

We consider two generalisations of non-deterministic nite automata (NFA) [ 13 ]; namely, weighted tree automata and acyclic recursive automata. 3.1

Weighted Tree Automata

Tree automata [ 8 ] generalise NFAs by accepting trees rather than words; the branching of the trees is identi ed by the arity of the automaton. Weighted tree automata further generalise this notion by not only accepting or rejecting an input tree, but assigning a value from a given semiring (its weight ) [ 16 ]. For the scope of this paper, we consider only unlabelled trees.

Let k 1. A weighted tree automaton over S of arity k is a tuple of the form A = (Q; S; wt; I; f ) where Q is a nite set of states, S = (S; ; ; 0; 1) is a semiring, wt : Qk+1 ! S is the transition weight function, I Q is the set of initial states, and f : Q ! S is the exit weight function.

As usual, we represent trees of arity k as nite non-empty sets T f1; : : : ; kg such that if wi 2 T , then w; wj 2 T for each w 2 f1; : : : ; kg ; 1 i; j k. Given a tree T of arity k, a run : T ! Q of A over T assigns a state to each node in T . The weight of this run is wt( ) = Nw12T wt( (w); (w1); : : : ; (wk)) Nw12=T f ( (w)); that is, the product of all the transition and exit weights given the states assigned by . For non-commutative semirings, this product is made from the root to the leafs, and in the order of the children (i.e., top-down, left-toright). Given a state q 2 Q, we de ne wt(q) = L (")=q wt( ); that is, the sum of the weights of all runs that label the root of a tree with q. The behaviour of the automaton A is the sum of the weights of all its initial states kAk := Lq2I wt(q). 3.2

Acyclic Recursive Automata

Acyclic recursive automata generalise NFA by allowing an automaton to call another one, but the calls between automata must respect a hierarchical ordering. They were originally introduced as hierarchical state machines [20] with a slightly di erent structure.

De nition 1 (ARA). An acyclic recursive automaton over the alphabet is a nite set A = fAi j i 2 Ig of NFAs Ai = (Qi; i; i; Ii; Fi), where (I; ) is a partially ordered set of indices, such that: (i) for all i 6= j 2 I, Qi \ Qj = ;; (ii) i = [ fmj j j < ig; and (iii) fmi j i 2 Ig \ = ;.

We call the symbols mi, which are added to the alphabets of the di erent automata in A, call triggers. When an automaton Ai reads the symbol mj , it \calls" the automaton Aj , which continues reading the word until it chooses to return the control to the \calling" automaton Ai (signalled by the symbol mj ). This return is only possible if Aj is in one of its accepting states. In practice, the automaton Aj is in charge of accepting a portion of the input word.

Each automaton Ai may call any other automaton Aj where j < i. Hence, there may be a sequence of nested calls, but the depth of this nesting is always bounded by the number n of automata in A. Moreover, the automaton Ai can never call itself either directly or indirectly. To de ne the language accepted by the ARA A, we adapt the notion of a run to take into account also the nested calls between the automata.

De nition 2 (valid run). A run of the ARA A = fAj j j 2 Ig is a nite sequence = q0; s1; q1; : : : ; sk; qk such that qi 2 Sj2I Qj for all 0 i k and si 2 [ fmj; mj j j < ig for all 1 i k. The notion of a valid run on an automaton Ai is inductively de ned as follows. The run = q0; s1; q1; : : : ; sk; qk is valid on Ai i { fs1; : : : ; skg and (qj; sj+1; qj+i) 2 i for all 0 j < k or { j is the smallest index such that sj 2= , sj is of the form m`, there exists j0 > j with sj0 = m`, and q0; s1; q2; : : : ; sj 1; qj 1 and qj0 ; sj0+1; : : : ; qk are valid in Ai qj; sj+1; qj+1; : : : ; sj0 1; qj0 1 is valid in A` and (qj 1; m`; qj0 ) 2 Ai So far, we have not yet expressed the use of initial and nal states in accepting a word. In a nutshell, whenever we call an automaton, its execution should accept a segment of the input word, by traversing from an initial to a nal state. De nition 3 (successful run). Given a valid run = q0; s1; q1; : : : ; sk; qk, the index i; 1 i k is called a top-level call index i si = mj for some j, and for every ` < i such that s` = mj0 there is an `0; ` < `0 < i such that s`0 = mj0 . If i is a top-level call index with si = mj, then the smallest index `; i < ` k such that s` = mj is its match. This is denoted as ` = match(i).

A valid run = q0; s1; q1; : : : ; sk; qk is successful in Ai i q0 2 Ii, qk 2 Fi and for every top-level call index i with si = mj and ` = match(i), the sequence qi; si+1; : : : ; q` 1 is a successful run in Aj.

The run is successful in the ARA A = fAi j i 2 Ig i it is successful in Aj, for some maximal element j of I. The word accepted by this run is the concatenation of all symbols of appearing in . The language of A is the set L(A) of all words accepted by a successful run in A. By extension, the language accepted by Ai is the set L(Ai) of all words accepted by a successful run in Ai, for each i 2 I.

ARAs are not more expressive than NFAs; they also accept regular languages. The main di erence is that an ARA can be exponentially more succinct than an NFA for representing a given language. For example, the language fa2n g that contains only one word with 2n symbols a can only be recognised by NFAs with at least 2n states, but is accepted by an ARA having n automata with 3 states each (hence 3n states in total); see the appendix of the extended version [ 17 ]. The size of the ARA A is the total number of states in the NFAs in A.

The relevant properties of ARAs for this paper are the following. Deciding whether the ARA A accepts a word w requires only polynomial time. The concatenation of n ARAs is obtained by adding a new NFA with n + 1 states that calls each ARA once. The union of n ARAs is obtained by adding a new NFA with 2 states, which non-deterministically calls one of the ARAs. Abusing the notation, given two ARAs A, B we denote as A B and A[B the ARAs obtained through these constructions, respectively.

T = f(R1 v R3; R1 v R2; R2 v R3; ; ; ); (ran(R) v A; R v S; ran(S) v A; ; ; ); (A v 9S; A v 9R; R v S; ; ; ); (A v C; A v B; B v C; ; ; ); (A v 9R; A v B; B v 9R; ; ; ); (A v C; A v B1; A v B2; B1 u B2 v C; ; ); (ran(R) v C; ran(R) v B1; ran(R) v B2; B1 v C1; B2 v C2; C1 u C2 v C); (A v C; A u B v C; > v B; ; ; ); (A v D; A v 9S; ran(S) v B; B v C; S v R; 9R:C v D); (A v C; A v 9R; > v B; 9R:B v C; ; )

j A; B; C; D 2 NC(T ) [ f>g; R; S 2 NR(T )g 4

The Weighted Automaton

Our goal is to build an ARA which accepts representatives for all the monomials in the provenance of a subsumption relation. To do so, we rst present a weighted tree automaton whose behaviour (which is a language) can be seen as a polynomial (in expanded form) constructed by the provenance monomials for the desired consequence; modulo commutativity. The method for computing this behaviour will give rise to the ARA.

The construction of the automaton is based on considering the \proofs" of a derivation based on the consequence-based algorithm, built in a top-down manner (from the desired consequence, deconstructed back to the axioms used). Formally, we have a di erent automaton for each consequence that we might want to verify. However, all the automata are equivalent, except for the initial state; which refers to the desired consequence. The automaton, which reads trees of arity 5, is also very simple because all transitions that refer to a consequence step have weight f"g (the neutral of the language semiring product), and the only \real" weight is found at the nal states (the exit weight) which is given by the provenance label of the axiom in the TBox.

Let A0; B0 be two distinguished concept names appearing in an annotated TBox T ; the weighted automaton AA0vB0 = (Q [ f g; L; wt; I; f ) is given by { Q is the set of all axioms in restricted normal form on the alphabet of T ; { wt( ) = f"g if 2 T (see Table 1) and wt( ) = ; otherwise; { I = fA0 v B0g; { f (q) = fvg if (q; v) 2 T ; f (q) = f"g if q 2 fX v X; X v >; g; and f (q) = ; otherwise.

The special symbol is used to keep the arity of the automaton to 5. In a nutshell, the transitions of this automaton can be seen as the completion rules from [ 5 ], but applied backwards, from the consequence to the premises that generate it. The weight of any run which labels the root with A0 v B0 is either A v C

{w} B u C v D {u}

A v D C v D > v B {v} {ε} {ε} {ε} {ε} {ε} {ε} ; if the labelled tree does not represent a derivation of the consequence, or a single word concatenating the annotations of the axioms from T used in the derivation. Modulo idempotency, this word represents a provenance monomial for A0 v B0. The behaviour of the automaton is then the language containing a representation of all such provenance monomials.

Example 4. Consider the annotated TBox T containing the following ve axioms T := f(BuC v D; u); (> v B; v); (A v C; w); (A v 9R; x); (9R:B v B; y)g. One possible run of the automaton AAvD is depicted in Figure 1. The two internal nodes (marked with ) have a transition weight of f"g. The weight of this run is fwuvg. It can be seen that every non-leaf node is the consequence obtained from its successors (ignoring the dummy nodes ). There are at least two other runs with weight di erent from ;; one has weight fvwug and the other fxvywug. The behaviour kAAvDk contains fwuv; vwu; xvywug. Note that the rst two words correspond to the same provenance monomial in K, due to commutativity. In L, they are two di erent objects.

As it can be seen from the example, the behaviour of AA0vB0 does not directly yield the set of all provenance monomials for A0 v B0. However, these monomials can be extracted from kAA0vB0 k by taking into account the commutativity and idempotency of K. Abusing the notation, given a word !, we will denote as [!] a representative monomial ! w.r.t. commutativity and idempotency. Hence for instance [xuxv] = uvx. The next theorem is a direct consequence of the correctness of the completion algorithm [ 5 ].

Theorem 5. There is a run T j= (A0 v B0; [m]).

Thus, the behaviour of this automaton, which accumulates the weights of all possible runs, represents all provenance monomials for the consequence A0 v B0. The question is: how to nd this behaviour? Answering this question is the scope of the following section; but before that, we emphasise that the automata for di erent consequences are all identical except for the initial state, which is used to label the root node (that is, the goal that we aim to reach through a proof). Hence, for the TBox in Example 4, we get fv; xvyg kAAvBk.

of AA0vB0 with weight wt( ) = fmg 6= f"g i

The Behaviour

Following the general idea from [ 2, 9 ], we compute the behaviour of the automaton via a bottom-up approach, by iteratively accumulating the provenance of intermediate consequences used in the derivation of A0 v B0. However, the technique must be adapted to handle the semiring L, which is not a lattice.

Speci cally, we build the functions wti : Q ! L(NV ), i 2 N as follows: { wt0 = f ;1 { for i 0, wti+1(q) = wti(q) [ S(q;q1;:::;q5)2T wti(q1) wti(q5) It can be shown by induction on i that wti(q) has a representative for all the monomials arising from trees with root labelled with q and depth at most i. In particular for i = 0, wt0(q) is the label of the axiom q if it appears in T , f"g if q is a tautology or , and ; otherwise. Importantly, wti(q) wti+1(q) for all q 2 Q and all i 2 N. We can thus see the construction of wti as a monotone operator which, in particular, has a smallest xpoint: the limit of the functions wti. This xpoint is, in fact, the behaviour of AA0vB0 .

Theorem 6. The behaviour of AA0vB0 is limn!1 wtn(A0 v B0). Importantly, the functions wti actually assign a language to each state of the automaton. To nd out the behaviour of a di erent consequence, say A1 v B1, one does not need to recompute the automaton and the functions wti, but needs to nd limn!1 wtn(A1 v B1). In other words, nding these functions provides enough information for computing the provenance monomials of all possible consequences (in normal form) from the TBox.

In general, the construction of wti will not yield the xpoint after nitely many applications. Indeed, w.r.t. TBox f(A v B; u); (B v A; v)g, we get that limn!1 wtn(A v B) = (uv) u, but each wti(A v B) contains nitely many words. However, recall that we are not interested in the language kAA0vB0 k per se, but rather in the monomials that the words in this language represent. Since the Trio semiring (which we use to characterise the provenance) uses a commutative and idempotent product operation, we are only interested in the symbols that appear in the words, and not in the actual words themselves. That is, we are only interested in the languages up to representative monomials. De nition 7 (K-equivalence). Two languages L; L0 are K-equivalent (denoted as L K L0) i f[!] j ! 2 Lg = f[!] j ! 2 L0g.

For example, (uv) u and fuv; ug are K-equivalent. While the languages wti(q) and the words therein may grow inde nitely, their representative monomials are limited by the provenance variables appearing in T ; which are at most jT j. Hence, there exists an n 2 N such that wtm(q) K wtn(q) holds for all q 2 Q and all m n. Following Theorem 5, for this n wtn(A0 v B0) contains representatives for all the provenance monomials for A0 v B0. 1 Recall that f is the exit weight function of the automaton.

As argued before, wti(q) contains the weights of all runs of height at most i with root q. It can be seen that for every run of height greater than jQj jT j there is a smaller run 0 such that wt( ) = wt( 0). This means that the least xpoint for wti is found after at most jQj jT j iterations, which is polynomial in jT j. Speci cally, the number of iterations needed to reach a xpoint is bounded by O(jT j4).

Recall that each wti(q) is a language. By construction, it is a regular language; indeed, it is formed by concatenation and union of nite languages. If we tried to represent these languages extensionally, enumerating all the words they contain, we would potentially need exponential space: potentially, the language may contain exponentially many words. Exploiting the fact that these languages are regular, we can represent them through NFAs. In fact, wt0 is composed of very simple automata with at most two states, and the construction of wti+1 from wti requires only concatenation and union of automata, which are basic automata operations [ 13 ]. However, iteratively constructing these NFA as in the de nition of wti can also lead to an exponential blowup; for an example see the extended version of this paper [ 17 ]. To keep the construction tractable, we exploit the succinctness power of ARAs.

Note once again that each wt0(q) contains either a word of length 1, the empty word, or is the empty language. All these languages are recognisable by q NFA with at most two states. We call these automata A0. For each successive q q0 wti+1(q) we construct an automaton Ai+1 that calls the automata Ai , which accept the languages wti(q0); q0 2 Q. Thus we are constructing an ARA with the ordering Aiq Ajq0 for all q; q0 2 Q and all 0 i < j. Importantly, each automaton Aiq+1 requires at most ve states (to concatenate the languages of the successive states) for each transition (q; q1; : : : ; q5) 2 T (recall Table 1). Since the number of such transitions is bounded by jQj5, it follows that the size of each q0 ARA Aiq := fAj j q0 2 Q; j < ig[fAiqg is in O(jQj5 i). Let now Aq := Aqn, where n is the number of iterations needed to reach a xpoint w.r.t. K-equivalence. As seen, its size is in O(jQj6 jT j); that is, it is bounded by a polynomial on jT j. Moreover, this ARA Aq su ces to nd all the provenance monomials for the consequence q, as expressed next.

Theorem 8. T j= (A0 v B0; m) i there is a word ! 2 L(AA0vB0 ) such that [m] = [!].

Example 9. Consider again the TBox from Example 4. The languages wti are q extensionally represented in Table 2. The construction of the automata Ai+1 mAvB

i mAvC

i mCvD i mBuCvD i mBuCvD

i m>vB i m>vB i m∃R.BvB

i mCvD i mAvB i m>vB i mAvD i mAvC i for i 0 and q 2 fC v D; A v B; A v Dg is depicted in Figure 2, where each transition miq is a call to the automaton Aiq. Hence, for instance A1CvD may nondeterministically call A0CvD (which yields the empty language), or concatenate a word accepted by A0BuCvD with a word from A0>vB. Thus, L(A1CvD) = fuvg. Similarly, we can see that L(A1AvD) = ;. Note that for a xed q 2 Q the structure of the automata Aiq+1 is the same for all i 0. The di erence is that they call the automata from the previous iteration.

Recall that deciding whether a word ! is a accepted by an ARA A is polynomial on the size of A. In particular, for Aq this task is polynomial on jT j. However, Theorem 8 requires to rst nd the word ! that needs to be tested. One idea is to build an automaton A that accepts the language Lm := f! j [!] = [m]g and check whether L(Aq)\Lm 6= ;. However, it is not at all clear whether Lm is even a regular language; speci cally, to the best of our knowledge it has never been veri ed whether the commutative closure of a regular language it also regular.

To solve this issue, we rst guess (in polynomial time on the size of m) an ordering of the symbols in m|say 1; : : : ; k|and then verify whether Aq accepts a word from that is, a word where the symbols rst appear in the speci ed order. Note that the language in Equation (1) is regular, and can be recognised by an NFA with k + 1 states. Recall also that given an ARA A and an NFA A, it is possible to construct an ARA A0 of size bounded by jAjjAj such that L(A0) = L(A) \ L(A). Thus, verifying whether the chosen order yields a word accepted by Aq is polynomial on jT j and jmj. The non-deterministic ordering guess yields the following. Theorem 10. Deciding T j= (A0 v B0; m) is in NP.

The Non-commutative Case

We now consider the case where the semiring is not commutative. The idea of non-commutativity is to preserve the information of the order in which axioms were used to derive a consequence. We consider here a left-absorbing product : for multiple occurrences of the same provenance symbol, we take into account the rst (or left-most) one. Thus, e.g., [uvu] = [uv] 6= [vu]. We call this case r non-commutative ELH . r Example 11. Let T := f(A v B; m); (B v C; n)g. In non-commutative ELH , T j= (A v C; mn) but T 6j= (A v C; nm).

Non-commutativity also means that, e.g., the concept expression A u B is not interpreted in the same way as B u A, which may seem counterintuitive since in classical DL semantics these concepts are equivalent. One possible use case for this semantics is for representing de nitional sentences in natural language processing [ 15, 18 ], where the order of the words usually also changes the meaning. For example, the logic would distinguish White u Wine from Wine u White.

Interestingly, we know from Equation (1) that we can verify whether Aq accepts a representative (under left-absorption) of a monomial m. The bene t in this case is that it is not necessary to rst guess the right ordering, as it is required by the ordering given in m. This yields the following result. Theorem 12. T j= (A0 v B0; m) w.r.t. a left-absorbing, non-commutative semiring can be decided in polynomial time. 7

Conclusions

In this paper we have studied the complexity of deciding whether the provenance of a subsumption relation contains a given monomial m. In previous work [ 5 ], it was shown through a completion algorithm, that this problem is in PSpace when the semiring product is idempotent and commutative, but only a polynor mial lower bound (derived from reasoning in ELH ) was given. By viewing the completion algorithm backwards, as a decomposition approach based on tree automata, and exploiting a less known class of automata (ARAs) to simulate structure sharing, we were able to lower this upper bound to NP. Unfortunately, the polynomial lower bound remains the best available at the moment. If we substitute commutativity by a notion of left-absorption, we obtain a tight polynomial-time complexity for this problem. Interestingly, the technique developed can be applied to instance queries and assertion entailments, simply by extending the automaton construction to the ABox-handling rules from [ 5 ]. Hence, the same complexity bounds hold in both cases. One avenue for future work is to close the remaining complexity gaps in these problems.

Note that the complexity results depend strongly on the properties of the provenance semiring. Indeed, the xpoint computation of the functions wti only terminates due to the idempotence and commutativity (or left-absorption) of the product. However, the construction of the automaton AA0vB0 remains correct for a larger class of semirings (Theorem 5). We will study whether the technique can be applied in practice for these other semirings. One particular point of interest is to consider closure semirings [ 14 ] or other approaches for handling the repeating structure of the ARAs. 18. Petrucci, G., Ghidini, C., Rospocher, M.: Ontology learning in the deep. In: EKAW.

pp. 480{495 (2016) 19. Schlobach, S., Cornet, R.: Non-standard reasoning services for the debugging of description logic terminologies. In: IJCAI (2003) 20. Yannakakis, M.: Hierarchical state machines. In: van Leeuwen, J., Watanabe, O., Hagiya, M., Mosses, P.D., Ito, T. (eds.) Theoretical Computer Science: Exploring New Frontiers of Theoretical Informatics. pp. 315{330. Springer Berlin Heidelberg, Berlin, Heidelberg (2000)

1. Baader , F. , Calvanese , D. , McGuinness , D. , Nardi , D. , Patel-Schneider , P . (eds.): The Description Logic Handbook: Theory, Implementation, and Applications . Cambridge University Press, second edn. ( 2007 )

2. Baader , F. , Pen~aloza, R.: Automata-based axiom pinpointing . J. Autom. Reasoning 45 ( 2 ), 91 { 129 ( 2010 ). https://doi.org/10.1007/s10817-010-9181-2, https://doi.org/10.1007/s10817-010-9181-2

3. Baader , F. , Pen~aloza, R.: Axiom pinpointing in general tableaux . J. Log. Comput . 20 ( 1 ), 5 { 34 ( 2010 ). https://doi.org/10.1093/logcom/exn058, https://doi.org/10.1093/logcom/exn058

4. Bourgaux , C. , Ozaki , A. : Querying attributed DL-Lite ontologies using provenance semirings . In: AAAI ( 2019 )

5. Bourgaux , C. , Ozaki , A. , Pen~aloza, R., Predoiu , L. : Provenance for the description logic elhr . In: Bessiere, C . (ed.) IJCAI . pp. 1862 { 1869 . ijcai. org ( 2020 )

6. Calvanese , D. , Lanti , D. , Ozaki , A. , Pen~aloza, R., Xiao , G.: Enriching ontologybased data access with provenance . In: IJCAI ( 2019 )

7. Cheney , J. , Chiticariu , L. , Tan , W.C. : Provenance in databases: Why, how, and where . Foundations and Trends in Databases 1 ( 4 ), 379 { 474 ( 2009 )

8. Comon , H. , Dauchet , M. , Gilleron , R. , Loding, C. , Jacquemard , F. , Lugiez , D. , Tison , S. , Tommasi , M. : Tree automata techniques and applications . Available on: http://www.grappa. univ-lille3.fr/tata ( 2007 ), release October, 12th 2007

9. Droste , M. , Kuich , W. , Rahonis , G.: Multi-valued mso logics overwords and trees . Fundam. Inf . 84 ( 3 ,4), 305 {327 (Dec 2008 )

10. Golan , J.S.: The theory of semirings with applications in mathematics and theoretical computer science , Pitman monographs and surveys in pure and applied mathematics , vol. 54 . Longman Scienti c & Technical ( 1992 )

11. Green , T.J. , Karvounarakis , G. , Tannen , V. : Provenance semirings . In: PODS ( 2007 )

12. Green , T.J. , Tannen , V. : The semiring framework for database provenance . In: PODS ( 2017 )

13. Hopcroft , J.E. , Ullman , J.D. : Introduction to Automata Theory, Languages, and Computation . Addison-Wesley Publishing Company ( 1979 )

14. Lehmann , D.J.: Algebraic structures for transitive closure . Theoretical Computer Science 4 ( 1 ), 59 { 76 ( 1977 ). https://doi.org/https://doi.org/10.1016/ 0304 - 3975 ( 77 ) 90056 - 1

15. Ma , Y. , Distel , F. : Learning formal de nitions for Snomed CT from text . In: AIME . pp. 73 { 77 ( 2013 )

16. M.Droste , W. Kuich , H.Vogler (eds.): Handbook of Weighted Automata. Monographs in Theoretical Computer Science , Springer ( 2009 )

17. Pen~aloza, R.: An upper bound for provenance in ELHr . CoRR abs/2108 .12774 ( 2021 )