1. Introduction

Path Description Dependencies in Feature-Based DLs

Eva Feng

Alexander Borgida

Enrico Franconi

Peter F. Patel-Schneider

David Toman

Grant Weddell

0 0 Cheriton School of Computer Science, University of Waterloo , Waterloo , Canada 1 Department of Computer Science, Rutgers University , New Brunswick , US 2 KRDB Research Centre for Knowledge and Data, Free University of Bozen-Bolzano , Italy

All members of the FunDL family of description logics replace roles with partial functions (features) and have a concept constructor, called path functional dependency (PFD), for expressing extensions of functional dependencies, useful in object-relational data sources. In this paper, we consider generalizing PFDs to path description dependencies (PDD), by allowing inverse features in them, which result in paths being set-valued. We show that logical consequence for partial-ℒℱ ℐ, one of the most expressive dialects of the FunDL family, remains EXPTIME complete for coherent terminologies, when extended with PDDs. As a first application, we consider referring expression types, which are concept descriptions used to identify individuals in query answers. In the new dialect, such a type denotes a set of concept descriptions. As such, one must prove in advance that any interpretation of any member of the set will never have more than one element. We show that this “singularity condition” can be diagnosed as a collection of logical consequence questions wrt a TBox.

1. Introduction

The FunDL dialects of description logics (DLs) were designed primarily to support reasoning that is required in structured data integration and view-based query rewriting over object-relational data sources such as relational databases [1, 2] and more recently JSON stores [3]. Consequently, all such dialects replace roles with partial functions called features for a better alignment with the ubiquitous notion of an attribute or column in such data sources, and all have a path functional dependency (PFD) concept constructor for capturing equality-generating dependencies such as keys, uniqueness constraints and functional dependencies that are commonly part of schemata for such data sources.

A PFD generalizes the notion of a functional dependency by allowing path functions in place of attributes to express navigation over a sequence of features. For example, suppose for an accountant’s personal database of clients, some are friends. An axiom expressing that the combination of the first name and the dial number of the phone identifies at most one friend among all clients in the database can be expressed in a FunDL dialect as follows.

CLIENT-FRIEND ⊑ (CLIENT : fname, phone.dialnum → id )

( 1 ) Here, the axiom with the PFD concept constructor is used to express a key or uniqueness condition underlying a form of identification vetting . It can be paraphrased in English as: client friends are clients that, among all clients, can be identified by a combination of their first name and the dial number of their phone.

In this paper, we generalize the PFD concept constructor with a path description dependency (PDD) constructor that allows traversing features in both directions. For example, consider a generalization of the above example where the accountant’s clients have possibly multiple phones and that those who are friends can still be identified by their first name and at least one of their phones when such exists. So, instead of a phone feature there would be a PHONE role relating clients with their possibly multiple phones. This can be done in description logics by encoding the primitive role PHONE, or indeed any primitive role, as the composition of an inverse feature with another feature, where the two features globally characterize the domain and range of the role (e.g., phonedom and phoneran). In our example, the role PHONE would be systematically represented as the composition (phonedom− .phoneran). The above example can then be captured with a PDD by the following axiom.

(CLIENT-FRIEND ⊓ ∃phonedom− .phoneran.⊤) ⊑ (CLIENT : fname, phonedom− .phoneran.dialnum → id ) ( 2 ) This axiom also expresses a key or uniqueness condition: any pair of distinct clients for which at least one is also a friend with at least one phone will always disagree on either their first name or on all the dial numbers of their respective sets of phones.

Furthermore, we may be interested in recording additional information for each pair of instances in the PHONE role, for example, the starting date of the telecom contract for a client and a phone. We can do that by reifying the PHONE role as the HAVING-PHONE concept, so that pairs of instances in the PHONE role have a one-to-one association with instances of the HAVING-PHONE concept.

HAVING-PHONE ⊑ (HAVING-PHONE : phonedom, phoneran → id ) This generalizes to arbitrary -ary relations, which can be similarly treated in FunDL dialects [4].

Now consider where there is a need to view the set of phones for a person as a single instance which aggregates the phones, that is, as a so-called plural entity [5] about which additional information needs to be recorded, for example, a count of the number of such phones for a given plural entity. A role PHONES would relate an instance of the plural entity to the phones it aggregates. Exploiting the same encoding proposed above, the role PHONES would be systematically represented as the composition (phonesdom− .phonesran). A feature phonesgroup then relates a client with its own phones, now seen as a single plural entity. A refinement of ( 2 ) accommodates this level of indirection.

(CLIENT-FRIEND ⊓ ∃phonesgroup.phonesdom− .phoneran.⊤) ⊑ (CLIENT : fname, phonesgroup.phonesdom− .phonesran.dialnum → id ) ( 3 ) Note that the occurrences of inverse feature navigations, phonesdom− , are now preceded by feature navigations of phonesgroup, and that this is not the case with subsumption ( 2 ) where inverse feature navigations happen at the start of path descriptions. Consequently, ( 2 ) can be rewritten as an axiom involving a non-key PFD in which inverse feature navigations are avoided, as in the following.

HAVING-PHONE ⊑ (HAVING-PHONE : phonedom.fname, phoneran.dialnum → phonedom) (4)

However, this is not possible with subsumption ( 3 ) which requires the greater expressiveness of PDDs.

The remainder of the paper is organized as follows. The necessary background and definitions are given in Section 2 where we introduce the new FunDL dialect called set-ℒℱ ℐ which replaces PFDs in the partial-ℒℱ ℐ dialect of FunDL with PDDs. Our main result is presented in Section 3 in which we show that logical consequence for set-ℒℱ ℐ is EXPTIME-complete when interpretations are assumed to satisfy a coherence condition. We also show that an alternative set equality semantics leads immediately to undecidability.

As a first application to illustrate the utility of the new dialect, we show in Section 4 how a singularity condition for a so-called referring expression type ()1 with respect to a TBox can be diagnosed as a collection of logical implication problems. An denotes a set of concept descriptions in set-ℒℱ ℐ that should qualify as referring concepts, that is, as a means of referring to elements of the domain of an interpretation. The diagnosis of singularity for an ensures that any interpretation of any referring concept in the set will never denote more than one element. Summary comments and an outline of open problems and future work are then given in Section 5.

2. Background and Definitions

We now formally define the artifacts introduced in our introductory comments: a new member of the FunDL family of DLs called set-ℒℱ ℐ and referring expression types. The latter is a pattern language for specifying a set of referring concepts in set-ℒℱ ℐ which are intended to describe the existence of individuals with complex properties. The new dialect derives from an existing dialect called partial-ℒℱ ℐ by allowing the general use of set valued path descriptions in place of path functions. Doing so in the case of the PFD concept constructor (common to all existing FunDL dialects) obtains our new PDD concept constructor.2 Definition 1 (Referring Concepts and TBoxes in set-ℒℱ ℐ). Let F and PC be sets of feature names and primitive concept names, respectively. A path description is defined by the grammar

Pd ::= id | . Pd | − 1. Pd, for ∈ F. A concept description is defined by the grammar in the first column of Figure 1. A referring concept is a concept description parsed by the last four productions, and a TBox concept 1Pronounced “are-ee”.

2The acronyms are respectively short for description logic with set-valued path descriptions, dependencies and inverses and description logic with partial functions, dependencies and inverses. (disjunction)

(negation) (value restriction) (existential quantification) (primitive concept; A ∈ PC) (conjunction) (nominal) ::= 1 ⊔ 2 | ¬ 1ℐ ∪ 2ℐ △ℐ ∖ ℐ

Semantics: Defn of (· )ℐ | : Pd1, ..., Pd → Pd { | ∀ ∈ ℐ : Pdℐ ({}) ̸= ∅ ∧ Pdℐ ({}) ̸= ∅ ∧ (PDD) (⋀︀=1 Pdℐ ({}) ∩ Pdℐ ({}) ̸= ∅) → Pdℐ ({}) ∩ Pdℐ ({})} ̸= ∅ | ∀Pd. | ∃Pd. | A | 1 ⊓ 2 | {} { | Pdℐ ({}) ⊆ ℐ } { | Pdℐ ({}) ∩ ℐ ̸= ∅} Aℐ ⊆ △ ℐ 1ℐ ∩ 2ℐ {ℐ } is a concept description parsed by all but the final production, that is, are concepts free of nominals. This includes the third production, a concept constructor called a path description dependency (PDD). A subsumption is an expression of the form 1 ⊑ 2, where the are TBox concepts, and where PDDs occur only in 2 but not within the scope of negation.3 A terminology (TBox) consists of a finite set of subsumptions, and a posed question is a single subsumption.

Semantics is defined with respect to a structure ℐ = (△ℐ , · ℐ ), where △ℐ is a domain of objects or entities and · ℐ an interpretation function that fixes the interpretations of primitive concept names to be subsets of △ℐ and feature names to be partial functions ℐ : △ℐ → △ℐ . The interpretations of path descriptions Pd are functions Pdℐ : 2△ℐ → 2△ℐ over the powerset of △ℐ defined as follows, where ⊆ △ ℐ :

⎧ if Pd = “ id”, Pdℐ () = ⎨ Pd1ℐ ({ ℐ () | ∈ }) if Pd = “. Pd1”,

⎩ Pd1ℐ ({ | ℐ () ∈ }) if Pd = “ − 1. Pd1”.

The semantics of derived concept descriptions is defined by the centre column of Figure 1.

An interpretation ℐ satisfies a subsumption 1 ⊑ 2 if 1ℐ ⊆ 2ℐ and is a model of , written

ℐ |= , if it satisfies all subsumptions in . Given a terminolgy and posed question , the logical consequence problem asks if is satisfied in all models of , written |= . □

As suggested by subsumptions ( 2 ), ( 3 ) and ( 4 ) above, PDDs adopt a set intersection semantics. Consequently, any subsumption that is symmetric and a simple key of the form “ ⊑ ( : Pd1, ..., Pd → id )” can be captured as an identification constraint (IdC) proposed in [7]. IdCs are a variant of equality-generating dependencies that, unlike in our case, correspond to an additional kind of assertion in a TBox, separate from subsumptions, that have the form “(id Pd1, ..., Pd)”. Note that path descriptions in IdCs also allow the occurrence of test relations “?” denoting the identity role on the interpretation of . A more general asymmetric version of test relations of the form “(1, 2)” could be added as an additional possibility for 3Violating this latter condition leads immediately to undecidability [6]. path descriptions in PDDs as syntactic sugar. To suggest the simple mapping involved, consider a PDD using such an asymmetric test relation:

3 ⊑ (4 : Pd1 .(1, 2). Pd2 → id ).

This would be syntactic shorthand for the PDD

(3 ⊓ ∀Pd1.1) ⊑ ((4 ⊓ ∀Pd1.2) : Pd1 . Pd2 → id ).

The formal definition of a referring expression type in the context of set-ℒℱ ℐ now follows. Such types were first introduced in [ 8] by abstracting constants and by admitting a production to express preference among WFFs free in one variable as a means of entity reference in first order knowledge bases. They have been adapted for FunDL dialects in [ 3] as part of a proposed semantics for JSON values as FunDL knowledge bases.

Definition 2 (Referring Expression Types). A referring expression type () is defined by the following grammar, where A is a primitive concept:

::= A | {?} | ∃Pd. | 1 ⊓ 2 | 1 ; 2

The language of referring concepts inhabiting , ℒ(), is given as follows:

ℒ(A) = {A} ℒ({?}) = {{} | is a constant symbol} ℒ(∃Pd.) = {∃Pd. | ∈ ℒ()} ℒ(1 ⊓ 2) = {1 ⊓ 2 | 1 ∈ ℒ(1) and 2 ∈ ℒ(2)} ℒ(1; 2)) = ℒ(1) ∪ ℒ(2)

Given a TBox and referring expression type , the singularity problem for with respect to is to determine if |ℐ | ≤ 1 for any ∈ ℒ() and any interpretation ℐ. □

For example, a referring expression type denoting possible referring concepts for CLIENT entities might be given as the following: (CLIENT-FRIEND ⊓ ∃phonesgroup.phonedom− .phoneran.⊤

⊓ ∃fname.{?} ⊓ ∃phonesgroup.phonedom− .phoneran.dialnum.{?}) ; (CLIENT ⊓ ∃fname.{?} ⊓ ∃lname.{?}) Here, the use of “;” indicates a preference for using the first name and a phone number for referring to a client who is also a friend with at least one phone number, and by using a combination of the first name and last name otherwise (see [ 3] for further details). Here also, ensuring that a TBox logically implies both the following subsumption as well as ( 3 ) above will be needed in any diagnosis of singularity for ( 5 ).

CLIENT ⊑ (CLIENT : fname, lname → id )

( 5 ) ( 6 ) 3. Decidability of Logical Consequence for set-ℒℱ ℐ To simplify the following constructions, we assume that both the TBox and the posed question are in normal forms defined as follows: Definition 3 (Normal Form TBoxes and Posed Questions). Let A, B, and C denote primitive concepts, ⊥(= A ⊓ ¬A), or ⊤(= A ⊔ ¬A). A TBox in normal form consists of eight possible subsumptions: A ⊑ C, A ⊓ B ⊑ C, A ⊑ B ⊔ C, A ⊑ ∀.B, A ⊑ ∃.B, A ⊑ ∀ − .B, A ⊑ ∃ − .B, and A ⊑ B : Pd1, . . . , Pd → Pd. TBoxes that do not mention a PDD, that is, do not have a subsumption of the form A ⊑ B : Pd1, . . . , Pd → Pd, are called simple.

A posed question in normal form has the form A ⊑ B : Pd1, . . . , Pd → Pd or is a simple

subsumption A ⊑ B. □

It is easy to see that for every TBox and posed question there is a conservative extension of the TBox that preserves entailment.

Logical consequence for partial-ℒℱ ℐ was shown in [9] to be undecidable, which will therefore also be the case with set-ℒℱ ℐ. One way to regain decidability suggested in [9] is to adopt a coherency condition on a TBox . The condition requires that implicitly contains a set of subsumptions of the following form, where ranges over TBox concepts in the language of , and where − explicitly appears in a concept ∃ − .D in the normalized TBox (we call such features interesting): ∃ − . ⊑ ∀ − . (* ) Note that, while we assume these subsumptions are part of every TBox, they are not part of the input to our decision procedure since the coherency condition is an integral part of the procedure itself. For the remainder of the paper we assume the coherency condition for every set-ℒℱ ℐ TBox.

Lemma 4. Let be a set-ℒℱ ℐ TBox and = A ⊑ B a posed question. For any counterexample to |= , there is also a tree-shaped counterexample ℐ for which: (a) ℐ |= , (b) ℐ is rooted by a witness ∈ (A ⊓ ¬B)ℐ , and (c) for any ∈ F, there is no element of △ℐ that has two or more incoming features .

Proof (sketch): The tree-shaped model is constructed by unravelling the original counterexample model, starting from the witness , as follows. For every ∈ F: (i) all successors of an object are added to ℐ and subsequently unravelled, and (ii) exactly one of the interesting -predecessors is also added and unravelled (note that the single -predecessor can be the parent of the current object). It follows from (* ) that, in such an unravelling, all simple subsumptions in must be satisfied (by inspection) and also that, since no object in △ℐ that has two or more incoming features for any ∈ F, all PDDs in hold vacuously. □

Note that the Pd paths starting from the witness uniquely identify objects in △ℐ . Lemma 5. Let be a set-ℒℱ ℐ TBox and = A ⊑ B : Pd1, . . . , Pd → Pd0 a posed question. For any counterexample to |= , there is also a counterexample ℐ, called a two-tree counterexample, constructed as follows:

1. let ℐ1 be an unravelling of from an object 1 ∈ A ; 2. let ℐ2 be an unravelling of from an object 2 ∈ B ; and 3. let ℐ be a disjoint union of ℐ1 and ℐ2 modified to agree on a path

respectively, if and only if Pd ({1}) ∩ Pd ({2}) ̸= ∅.

Pd starting in 1 and 2, Proof (sketch): The unravellings ℐ1 and ℐ2 are as in Lemma 4. The domain △ℐ is then the disjoint union of the domains of ℐ1 and ℐ2 factored by the equivalence relation generated by the path agreements. Note that it is suficient to only consider path agreements that are symmetric with respect to 1 and 2; this is a consequence of (* ) and of the syntactic structure of the PDD concept constructor. □

The properties established by Lemmata 4 and 5 allow one to reduce the existence of an posed question-appropriate counterexample as test of satisfiability of a formula in the Ackermann class prefix ∃* ∀∃* [10]. We use the Skolemized version of the problem that replaces the inner existential variables by appropriate unary function symbols and the outer ones by constants.

The main intuition is that we use a single term to denote the (possibly two) corresponding objects in the unravellings ℐ1 from 1 and ℐ2 from 2, if they exist. We introduce the following function symbols and unary predicates to facilitate this construction: • function symbols f() and ¯f() for every feature appearing in , standing for and (inverse of) − , respectively; • unary predicates PA () and PA() whose interpretation will consist of individuals corresponding to individuals in Aℐ for every primitive concept in ; • unary predicates N() and N() whose interpretation will consist of individuals that exist in ℐ (note that set-ℒℱ ℐ uses partial features while the Skolem functions are total); • unary predicates EqPd() that assert equality between the two paths originating in the corresponding objects in the two unravellings.

The encoding itself relies on the following auxiliary axioms (and auxiliary predicate definitions), where ranges over {, }: Features and inverse features: There cannot be objects with the following f–¯f patterns: ∀.¬N (f.¯f()) and ∀.¬N (¯f.f()).

We have such an axiom for every ∈ F. This restriction corresponds to functionality of features and to disallowing two or more incoming ’s in Lemma 4; Equalities and paths: The following axioms describe how equalities (based on intersection) propagate along PDs: ∀.(N() ∧ N(f()) ∧ N() ∧ N(f())) → (EqPd(f()) ↔ Eq. Pd()) ∀.(N() ∧ N(¯f()) ∧ N() ∧ N(¯f())) → (EqPd() ↔ Eq. Pd(¯f()) ∀.(N() ∧ N(f()) ∧ N() ∧ N(f())) → (EqPd() ↔ Eq − . Pd(f())) ∀.(N() ∧ N(¯f()) ∧ N() ∧ N(¯f())) → (EqPd(¯f()) ↔ Eq − . Pd()) Equalities and functionality: The following axioms simulate the functionality of features: ∀.N() ∧ N(f ()) ∧ N() ∧ N(f ()) ∧ Eqid () → Eqid (f ()) ∀.N() ∧ N(¯f()) ∧ N() ∧ N(¯f()) ∧ Eqid (¯f()) → Eqid () Equalities and concept membership: The following axioms simulate the efects of equality on concept membership:

∀.N() ∧ N() ∧ Eqid () → (PC () ↔ PC()) where ranges over all primitive concepts in ∪ .

Path existence: The NPd() predicate asserts that all objects on the path Pd starting in exist (on side ):

Nid () → N () N. Pd() → (N () ∧ NPd(f ()))

N − . Pd() → (N () ∧ NPd(¯f())) We call the above collection of axioms Π. With the above preparation, we are ready to map subsumptions in normalized TBoxes to the following axioms: Simple TBox subsumptions:

A ⊑ C A ⊓ B ⊑ C A ⊑ B ⊔ C A ⊑ ∀.B A ⊑ ∃.B ↦→ ↦→ ↦→ ↦→ ↦→ A ⊑ ∀ − .B ↦→ A ⊑ ∃ − .B ↦→ ∀.(N () ∧ PA ()) → PC () ∀.(N () ∧ PA () ∧ PB ())) → PC () ∀.(N () ∧ PA ()) → (PB () ∨ PC ()) ∀.(N () ∧ N (f ()) ∧ PA ()) → PB (f ()), ∀.(N (¯f()) ∧ N () ∧ PA (¯f()) → PB () ∀.(N () ∧ PA ()) → (N (f ()) ∧ PB (f ())), ̸= ¯f() ∀.(N (¯f()) ∧ PA (¯f()) → (N () ∧ PB ()) ∀.(N () ∧ N (f ()) ∧ PA (f ())) → PB (), ∀.(N (¯f()) ∧ N () ∧ PA ()) → PB (¯f()) ∀.(N (f ()) ∧ PA (f ())) → (N () ∧ PB ()), ∀.(N () ∧ PA ()) → (N (¯f()) ∧ PB (¯f())), ̸= f () Note that since a single feature can be coded using both f and (inverse of ) ¯f, we need two axioms for both the value and existential restrictions. Also, for the existential restrictions of the form ∃. and ∃ − . we need to ensure that ¯f is not used when is of the form f () (and vice versa as that would create a superfluous inverse). Since the Ackermann prefix class disallows explicit equalities, we need to simulate this inequality by explicitly listing all the possible terms for that do not start with f . This is always possible since there are only finitely many features in .

Equalities and PFDs: To simulate the efects of PDDs, it is suficient to record their efects between the left and the right sides of the counterexample as follows:

A ⊑ B : Pd1, . . . , Pd → Pd ↦→ ∀.(N() ∧ N() ∧ P B ∧

A ∧ P ∀.(N() ∧ N() ∧ P B ∧

A ∧ P ⋀︀

=1 EqPd ()) → EqPd() ⋀︀ =1 EqPd ()) → EqPd() We call the set of these axioms Π . The posed questions then map to the following assertions about witnesses of non-entailment: Posed questions: The (ground) axioms represent a counterexample to the posed question; the constant 0 stands for the 1 (and 2) objects in Lemmata 4 and 5.

A ⊑ B ↦→ N(0), PA (0), ¬PB(0) A ⊑ B : Pd1, . . . , Pd → Pd ↦→ PA (0), PB(0),

NPd1 (0), . . . , N

Pd (0), NPd(0), NPd1 (0), . . . , NPd (0), NPd(0), EqPd1 (0), . . . , EqPd (0), ¬EqPd(0) We call the set of these axioms Π¬.

To show correctness of the above construction, we use the following auxiliary definition that maps feature paths in models of set-ℒℱ ℐ to paths in models of the corresponding Ackermann formula (and back).

pm(id ) = 0, pm(. Pd) = f(pm(Pd)), and pm( − . Pd) = ¯f(pm(Pd))

Theorem 6. Let and be a set-ℒℱ ℐ TBox and a posed question, respectively, both in normal form. Then |= if and only if Π ∪ Π ∪ Π¬ is unsatisfiable.

Proof (sketch): For posed question of the form A ⊑ B and a tree-shaped counter-example ℐ starting from (see Lemma 4) we create a model of Π ∪ Π ∪ Π¬ as follows: we assert • N(pm(Pd)) for every path . Pdℐ that exists in ℐ, and • PA (pm(Pd)) whenever . Pdℐ ∈ Aℐ in ℐ and verify that all formulae in Π ∪ Π ∪ Π¬ are true.

Similarly, given a model of Π ∪ Π ∪ Π¬ we construct ℐ as follows: • △ℐ = {Pd | N(pm(Pd)) holds}; • Aℐ = {Pd | N(pm(Pd)) ∧ PA (pm(Pd)) holds}; and • ℐ = {(Pd, . Pd) | N(f. pm(Pd))} ∪ {(Pd, − . Pd) | N(¯f. pm(Pd))}, and verify that ℐ is a model of and id falsifies .

For a posed question A ⊑ B : Pd1, . . . , Pd → Pd0 and a two-tree counterexample ℐ starting in 1 and 2 (see Lemma 5) we create a model of Π ∪ Π ∪ Π¬ as follows: we assert • N(pm(Pd)) for every path 1. Pdℐ that exists in ℐ; • PA (pm(Pd)) whenever 1. Pdℐ ∈ Aℐ ; • N(pm(Pd)) for every path 2. Pdℐ that exists in ℐ; • PA(pm(Pd)) whenever 2. Pdℐ ∈ Aℐ ; and • Eqid (pm(Pd)) whenever Pdℐ (1) ∩ Pdℐ (2) ̸= ∅.

We extend this to the remaining auxiliary predicates EqPd and NPd in a natural way and verify that this yields the required model of Π ∪ Π ∪ Π¬. The other direction constructs a two-tree model of that falsifies using the construction for A ⊑ B twice (once for starting in id and once for starting in id ) and equating paths id . Pd and id . Pd for which EqPd(0) holds in the model of Π ∪ Π ∪ Π¬. That yields a two-tree model of in which id and id witness the violation of . □

Corollary 7. Let and be a set-ℒℱ ℐ TBox and a posed question, respectively. Then |= is decidable and complete for EXPTIME.

Proof (sketch): It is easy to see from our construction that the size |Π∪Π ∪Π¬| is polynomial in | | + ||. The result then follows from EXPTIME bound for satisfiability of Ackermann prefix formulae [11] and closure of EXPTIME under complement. □ 3.1. Set Equality Semantics and Undecidability Here, we consider where a set semantics for PDDs is assumed, and briefly outline how this leads to undecidability for the logical consequence problem. In particular, now assume the semantics for a PDD concept : Pd1, ..., Pd → Pd is given as follows:

{ | ∀ ∈ ℐ : (⋀︀=1 Pdℐ ({}) = Pdℐ ({})) → Pdℐ ({}) = Pdℐ ({})}

Theorem 8. Let be a set-ℒℱ ℐ TBox in normal form and a posed question. Then, under the set equality semantics for PDDs, |= is undecidable.

Proof (sketch): The subsumptions A ⊑ A : − → id together with ⊤ ⊑ ∀.⊥ and ⊤ ⊑ ∃.A make the primitive concept A to behave as a nominal (i.e., contains exactly one object in every model of the above subsumptions). Subsequently asserting, for three such “nominals”, A, B, and C, that A ⊑ ∀ℎ.B ⊓ ∀.C and B ⊑ ∀.C postulates the existence of a triangle in every model which allows us to apply the construction in [12, Section 5], yielding undecidability using a tiling argument. □

4. Referring Expression Types and Singularity Diagnosis

We now show how the test for singularity of an for a given TBox presented in [3] is easily adapted for set-ℒℱ ℐ. This is achieved by appeal to a sequence of logical consequence problems for subsumptions expressing dependencies with PDDs that are induced by a given . The subsumptions derive from the following normalization.

Definition 9 (Normalized Referring Expression Types). We write Norm() to refer to an exhaustive application of the following rewrite rules: ⊓ (1; 2) ↦→ ⊓ 1; ⊓ 2 (1; 2) ⊓ ↦→ 1 ⊓ ; 2 ⊓ ∃Pd.(1; 2) ↦→ ∃Pd.1; ∃Pd.2 □

The definition of Norm is an adaptation of referring expression type normalization in [8] with the following consequences: ( 1 ) ℒ() = ℒ(Norm()), and ( 2 ) all preference operators (“;”) are at the top level of Norm(). We call the maximal “;”-free parts of Norm() preference-free components. The following auxiliary functions will then be used in formulating subsumptions with PDDs to statically test for singularity of each preference free component.

Pfs({?}) = {id }

Pfs(A) = { }

Pfs(∃Pd.) = {Pd . Pd | Pd ∈ Pfs()} Pfs(1 ⊓ 2) = Pfs(1) ∪ Pfs(2) Con({?}) = ⊤

Con(A) = A

Con(∃Pd.) = ∃Pd.Con() Con(1 ⊓ 2) = Con(1) ⊓ Con(2) The functions extract a set of paths leading to nominals and a FunDL concept from the preferencefree referring expression type. Altogether, the singularity test in [8, Theorem 20] will now apply:

Theorem 10. Let be a TBox and a referring expression type. Then all referring concepts

in ℒ() are singular with respect to if |= Con(′) ⊑ (Con(′) : Pfs(′) → id ) for every preference-free component ′ of Norm().

For example, a test for singularity of ( 5 ) with respect to TBox would require the following two subsumptions to be logical consequences of (and note that subsumptions ( 3 ) and ( 6 ) above also being logical consequences would be important factors in establishing this): |= 1 ⊑ (1 : fname, lname → id ) |= 2 ⊑ (2 : fname, phonesgroup.phonedom− .phoneran.dialnum → id ) where 1 = CLIENT ⊓ ∃fname.⊤ ⊓ ∃lname.⊤ and 2 = CLIENT-FRIEND ⊓ ∃fname.⊤ ⊓ ∃phonesgroup.phonedom− .phoneran.⊤ ⊓ ∃phonesgroup.phonedom− .phoneran.dialnum.⊤.

5. Summary and Future Work

The primary incentive for set-ℒℱ ℐ stems from an outline of future work in [3] which recognized the need for plural entities in formally capturing JSON array values and for the extension to referring expression types to accommodate references to such entities. Subsumption ( 4 ) and ( 5 ) in our introduction and definitions sections are derived from a case considered in this earlier work.

The diagnosis of singularity for referring expression types is only one of a number of possibilities. Other criteria are also important and remain avenues for further work on set-ℒℱ ℐ, in particular that concern the more general issues of identification resolution in query answering for backend data sources such as SQL engines, as explored in [13]. One example considered in [8] concerns strong identification (in contrast to singularity, an issue in weak identification ): to determine if any pair of distinct referring concepts in ℒ() must refer to distinct domain elements, that is, where reasoning about inequality is also important. This becomes a more complicated issue in set-ℒℱ ℐ.

There are also a number of open problems and research issues that remain for set-ℒℱ ℐ. We end with a couple that are more pressing: • On alternative semantics for PDDs: we have outlined how a set equality semantics leads to undecidability in Subsection 3.1. However, an alternative non-empty set equality semantics seems to be straightforward, with an analogous proof of decidability to our non-empty intersection semantics. The case is far less clear, however, for a “mixed mode” in which a PDD could involve both. • On undecidability of knowledge base (KB) consistency (deriving from [12, Section 5]): as with PFDs, it is necessary to impose restrictions on the use of PDDs for set-ℒℱ ℐ KBs that have ABoxes as well as TBoxes to ensure decidability of KB consistency. This has been explored extensively for Horn fragments of FunDL dialects and even to fragments with KB consistency in PTIME [14]. Obtaining analogous results for set-ℒℱ ℐ is desirable.

[1]

McIntyre ,

Toman ,

G. E.

Weddell , FunDL - A family of feature-based description logics, with applications in querying structured data sources , in: Description Logic,

Theory

Combination , and All That - Essays Dedicated to Franz Baader on the Occasion of His 60th Birthday , 2019 , pp. 404 - 430 .

[2]

Toman ,

G. E.

Weddell , Using feature-based description logics to avoid duplicate elimination in object-relational query languages , Künstliche Intell . 34 ( 2020 ) 355 - 363 . URL: https://doi.org/10.1007/s13218-020-00666-7. doi: 10 .1007/s13218-020-00666-7.

[3]

Borgida ,

Franconi ,

Toman ,

G. E.

Weddell , Understanding document data sources using ontologies with referring expressions , in: H. Aziz , D. Corrêa , T. French (Eds.), AI 2022: Advances in Artificial Intelligence - 35th Australasian Joint Conference , AI 2022 , Perth , WA, Australia, December 5- 8 , 2022 , Proceedings, volume 13728 of Lecture Notes in Computer Science, Springer, 2022 , pp. 367 - 380 .

[4]

Artale , E. Franconi,

Peñaloza ,

Sportelli , A decidable very expressive description logic for databases , in: ISWC 2017 - 16th International Semantic Web Conference , volume 10587 , Springer, 2017 , pp. 37 - 52 .

[5]

Franconi , A treatment of plurals and plural quantifications based on a theory of collections , Minds Mach . 3 ( 1993 ) 453 - 474 .

[6]

Toman , G. Weddell, On Keys and Functional Dependencies as First-Class Citizens in Description Logics , in: Proc. of Int. Joint Conf. on Automated Reasoning (IJCAR) , 2006 , pp. 647 - 661 .

[7]

Calvanese , G. De Giacomo,

Lembo ,

Lenzerini ,

Rosati , Path-Based Identification Constraints in Description Logics , in: KR, 2008 , pp. 231 - 241 .

[8]

Borgida ,

Toman , G. Weddell, On referring expressions in query answering over first order knowledge bases , in: Proc. Principles of Knowledge Representation and Reasoning , KR 2016 , 2016 , pp. 319 - 328 .

[9]

Toman ,

G. E.

Weddell , On the interaction between inverse features and path-functional dependencies in description logics , in: Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI) , 2005 , pp. 603 - 608 .

[10]

Ackermann , Uber die Erfullbarkeit gewisser Zahlausdrucke, Mathematische Annalen 100 ( 1928 ) 638 - 649 .

[11]

Fürer , Alternation and the Ackermann case of the decision problem, L'Enseignement Mathematique ( 1981 ).

[12]

Toman ,

G. E.

Weddell , On keys and functional dependencies as first-class citizens in description logics , J. Aut. Reasoning 40 ( 2008 ) 117 - 132 .

[13]

Borgida ,

Toman ,

G. E.

Weddell , On referring expressions in information systems derived from conceptual modelling , in: I. Comyn-Wattiau , K.

Tanaka , I. Song, S.

Yamamoto , M. Saeki (Eds.), Conceptual Modeling - 35th International Conference, ER 2016 , volume 9974 of Lecture Notes in Computer Science, 2016 , pp. 183 - 197 .

[14]

McIntyre ,

Borgida ,

Toman ,

G. E.

Weddell , On limited conjunctions and partial features in parameter-tractable feature logics , in: The Thirty-Third AAAI Conference on Artificial Intelligence , AAAI 2019 , 2019 , pp. 2995 - 3002 .