Spectra of Cardinality Queries over Description Logic
                         Knowledge Bases (Extended Abstract)
                         Quentin Manière¹,², Marcin Przybyłko¹
                         ¹ Department of Computer Science, Leipzig University, Germany
                         ² Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany

                         Keywords
                         Description Logics, Counting queries, Spectrum, Complexity of reasoning


                         1. Introduction
                         A recent line of research has explored ways of leveraging ontology-mediated query answering (OMQA)
                         to support counting queries, a class of aggregate queries that makes it possible to perform analytics on data.
                         Several semantics for such queries have been investigated, differing in how the possibility of multiple
                         models is taken into account [1, 2, 3].
                            In this paper we adopt the semantics proposed in [2], extending [4], which defines a counting query
                         as a conjunctive query (CQ) in which some variables have been designated as counting variables.
                         Such a query is evaluated in every model of the knowledge base (KB) by considering every possible
                         homomorphism of the query into the model and then returning the number of assignments obtained
                         for the counting variables. A certain answer of the counting query over the KB is then defined as an
                         interval ⟦𝑚, 𝑀⟧ that contains all the possible answers across all possible models, i.e. a uniform lower
                         bound 𝑚 and a uniform upper bound 𝑀. The complexity of deciding whether a given interval is a
                         certain answer under this semantics is now well-understood for a variety of DLs [5, 6].
                            In the present paper, rather than providing uniform bounds, we aim to compute (a representation of)
                         the set of possible answers, which is a subset (of tuples) of natural numbers with infinity, i.e. a subset of
                         N∞ := {0, 1, 2, . . . , ∞}. We call this subset the spectrum of the counting query, inspired by the notion
                         of spectrum of a formula, that is the set of the possible cardinalities of its models [7, 8]. To do so, we
                         first investigate the possible shapes of spectra for counting conjunctive queries (CCQs) and ontologies
                         expressed in the DL 𝒜ℒ𝒞ℐℱ. Traditional CQ answering is well-understood in this expressive DL [9, 10],
                         which supports functionality constraints whose interactions with counting queries have, to the best of
                         our knowledge, never been studied (those proposed in [5] and denoted 𝒩⁻ being more restricted).
                            One of the challenges encountered in our work is to clarify how to represent spectra. Indeed, the set
                         of possible answers of a CCQ across models of a KB might, a priori, be an arbitrarily complex set of
                         natural numbers, and thus hard to describe by means other than providing the CCQ-KB pair. We aim
                         to identify classes of ontology-mediated queries (OMQs) whose spectra admit an effective representation.
                         By effective, we mean that (i) the representation is finite, ideally with a size that can be bounded by
                         the size of the OMQ, (ii) its format is independent of the specific description logic, and (iii) spectrum
                         membership can be tested efficiently, i.e. in polynomial time with respect to the size of
                         the integer and of the representation. Finding such a representation, whenever it exists, can be viewed
                         as a precomputation allowing for further analytics.


                            DL 2024: 37th International Workshop on Description Logics, June 18–21, 2024, Bergen, Norway
                           Email: quentin.maniere@uni-leipzig.de (Q. Manière); marcin.przybylko@uni-leipzig.de (M. Przybyłko)
                           ORCID: 0000-0001-9618-8359 (Q. Manière); 0000-0003-1859-7055 (M. Przybyłko)
                                        © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


2. Contributions
Our main contributions are the following. First, we introduce the notion of a spectrum of a CCQ
and formalize the problem of computing an effective representation thereof, if it exists. We show that
connected and individual-free CCQs evaluated on 𝒜ℒ𝒞ℐℱ KBs always admit effectively representable
spectra, as those must be finitely generated subsets of N∞. This further motivates a focus on cardinality
queries, i.e. Boolean atomic CCQs, introduced in [11], which admit effective representations. We fully
characterize the possible shapes of spectra for concept cardinality queries on 𝒜ℒ𝒞ℐℱ KBs, in particular
showing that every finitely generated subset of N∞ can be realized. We also study several sublogics
of 𝒜ℒ𝒞ℐℱ, starting from ℰℒ and DL-Lite_core, for most of which we obtain full characterizations of the
possible shapes of spectra. For some, only simpler shapes, such as ⟦𝑚, +∞⟧, are possible (see Table 1). For
the DL ℰℒℐℱ⊥, corresponding to the Horn fragment of 𝒜ℒ𝒞ℐℱ, we notably use tailored variations
of the cycle-reversion techniques introduced to tackle finite model reasoning in such DLs [12, 13, 14].
Our work also features a wealth of examples showing that a given shape can indeed be obtained.
We further investigate the case of role cardinality queries which feature challenging shapes of spectra
already for ℰℒ⊥ KBs. Through connections with the concept case and refinements of the corresponding
constructions, we are able to show that, also in the case of role cardinality queries, the possible shapes
for ℰℒℐℱ⊥ KBs are better-behaved than for general 𝒜ℒ𝒞ℐℱ KBs. We conclude with bounds regarding
the data complexity of computing some of those effective representations, notably relying on existing
results on DLs augmented with closed predicates.
   The rest of this extended abstract highlights preliminaries regarding spectra, shows a full
characterization for 𝒜ℒ𝒞ℐℱ and concept cardinality queries, and discusses our tailored version of the
cycle-reversion technique.


3. Effective representations of spectra
We assume familiarity with the DL 𝒜ℒ𝒞ℐℱ, its sublogics and semantics [15]. We assume two disjoint
sets of variables and counting variables. A counting conjunctive query (CCQ) takes the form
$q(\bar{x}) = \exists \bar{y}\, \exists \bar{z}\, \psi(\bar{x}, \bar{y}, \bar{z})$, where $\bar{x}$ and $\bar{y}$ are tuples of distinct
variables, $\bar{z}$ is a tuple of distinct counting variables and $\psi$ is a conjunction of concept and role
atoms whose terms are drawn from $\bar{x} \cup \bar{y} \cup \bar{z}$ or individual names. For a given tuple $\bar{a}$ of
individuals, and a given model $\mathcal{I}$ of a KB $\mathcal{K}$, we define:
$\#q(\bar{a})^{\mathcal{I}} := \#\{\pi|_{\bar{z}} \mid \pi : q \to \mathcal{I} \text{ homomorphism such that } \pi(\bar{x}) = \bar{a}\}$.
The spectrum of $q$ and $\bar{a}$ over $\mathcal{K}$ is further defined as:
$\mathrm{Sp}_{\mathcal{K}}(q(\bar{a})) := \{\#q(\bar{a})^{\mathcal{I}} \mid \mathcal{I} \models \mathcal{K}\}$.
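To make the counting step concrete, the following minimal Python sketch evaluates a CCQ in one fixed finite interpretation by brute force. The function names, the tuple encoding of atoms and the dictionary encoding of interpretations are assumptions made for illustration only; individual names in atoms are not handled, and answer variables are bound through the hypothetical `fixed` argument. Computing a spectrum would additionally require ranging over all models of the KB, which this sketch does not attempt.

```python
from itertools import product


def _holds(atom, pi, interp):
    """Check one concept atom ("C", t) or role atom ("r", t1, t2) under assignment pi."""
    name, *terms = atom
    if len(terms) == 1:
        return pi[terms[0]] in interp["concepts"].get(name, set())
    return (pi[terms[0]], pi[terms[1]]) in interp["roles"].get(name, set())


def count_answers(atoms, counting_vars, interp, fixed=None):
    """Number of distinct restrictions to counting_vars of homomorphisms
    from the query (a list of atoms) into the finite interpretation interp."""
    fixed = dict(fixed or {})
    free = sorted({t for atom in atoms for t in atom[1:]} - set(fixed))
    restrictions = set()
    # Enumerate every assignment of the unbound variables to domain elements.
    for values in product(sorted(interp["domain"]), repeat=len(free)):
        pi = {**fixed, **dict(zip(free, values))}
        if all(_holds(atom, pi, interp) for atom in atoms):
            restrictions.add(tuple(pi[z] for z in counting_vars))
    return len(restrictions)


toy = {
    "domain": {0, 1, 2},
    "concepts": {"C": {0, 1}},
    "roles": {"r": {(0, 1), (0, 2), (1, 2)}},
}
# q := ∃y ∃z r(y, z) ∧ C(y), with z the only counting variable
print(count_answers([("r", "y", "z"), ("C", "y")], ["z"], toy))  # prints 2
```

Three homomorphisms exist here, but they induce only two distinct values for the counting variable, which is why the count is 2 rather than 3.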
   While we do not know whether all spectra can be effectively represented, in the sense explained
in the introduction, we identify a class of CCQs, namely connected and individual-free CCQs, whose
spectra admit such a representation based on the following property:

Lemma 1. If 𝑞 is connected and individual-free, then Sp𝒦 (𝑞) is closed under addition.

   It is indeed well-known that every subset of N∞ closed under addition is finitely generated (see e.g.
[16], Chapter 2, Proposition 4.1). In particular, for every spectrum Sp𝒦 (𝑞) of a satisfiable connected
and individual-free CCQ 𝑞, there exist a finite subset 𝑆 of N∞ and two numbers 𝑀, 𝑛 ∈ N∞ such that
Sp𝒦 (𝑞) = 𝑆 ∪ {𝑀 + 𝑘 · 𝑛 | 𝑘 ∈ N}. Therefore, the problem of computing Sp𝒦 (𝑞) for such a pair (𝒦, 𝑞)
can be properly defined as the task of computing 𝑆, 𝑀 and 𝑛.
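Once 𝑆, 𝑀 and 𝑛 are known, membership in the spectrum reduces to a lookup and one modular test. The sketch below illustrates this; the function name and the conventions for ∞ (represented by `math.inf`, with 𝑘 = 0 contributing 𝑀 itself even when 𝑛 = ∞) are assumptions of this example, not definitions from the text.

```python
import math

INF = math.inf  # stands for the element ∞ of N∞


def in_spectrum(value, finite_part, start, step):
    """Test whether value belongs to S ∪ {M + k·n | k ∈ N},
    where S = finite_part, M = start and n = step."""
    if value in finite_part:
        return True
    if value == start:  # the case k = 0
        return True
    if value == INF:  # reachable with k >= 1 only if M or n is infinite
        return start == INF or step == INF
    if start == INF or step == INF or step == 0:
        return False  # no further finite element is generated
    return value > start and (value - start) % step == 0


# The set 2N ∪ {∞}, encoded as S = {∞}, M = 0, n = 2
print(in_spectrum(6, {INF}, 0, 2))    # True
print(in_spectrum(7, {INF}, 0, 2))    # False
print(in_spectrum(INF, {INF}, 0, 2))  # True
```

The test uses a constant number of arithmetic operations on the input integer, which is in line with the polynomial-time membership requirement stated in the introduction.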
   Notice that Sp𝒦 (𝑞) = ∅ if and only if 𝒦 is unsatisfiable; similarly, Sp𝒦 (𝑞) = {0} if and only if 𝒦
is satisfiable but 𝑞 is unsatisfiable with respect to 𝒦. Spectra that are not closed under addition are easily
found by dropping the above restriction to connected and individual-free CCQs, as shown by the following:

Example 1. Consider the empty KB 𝒦 and the two Boolean CCQs 𝑞₁ := ∃𝑧₁ ∃𝑧₂ C(𝑧₁) ∧ C(𝑧₂) and
𝑞₂ := ∃𝑧₁ ∃𝑧₂ r(a, 𝑧₁) ∧ r(a, 𝑧₂), where 𝑧₁ and 𝑧₂ are counting variables. It is easily verified that
Sp𝒦 (𝑞₁) = Sp𝒦 (𝑞₂) = {𝑛² | 𝑛 ∈ N} ∪ {∞}.
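The verification can be spelled out as a short computation. This is a sketch based only on the definitions of Section 3: in both queries the two counting variables range independently over the same set, so the number of pairs is a square.

```latex
\[
  \#q_1^{\mathcal{I}}
    = \bigl|\, C^{\mathcal{I}} \times C^{\mathcal{I}} \,\bigr|
    = \bigl| C^{\mathcal{I}} \bigr|^{2},
  \qquad
  \#q_2^{\mathcal{I}}
    = \bigl|\{ d \mid (a^{\mathcal{I}}, d) \in r^{\mathcal{I}} \}\bigr|^{2}.
\]
% The KB is empty, so |C^I| and the number of r-successors of a^I
% can each take any value in N ∪ {∞}; squaring gives the stated set.
\[
  1 = 1^2 \in \mathrm{Sp}_{\mathcal{K}}(q_1)
  \quad\text{but}\quad
  1 + 1 = 2 \notin \mathrm{Sp}_{\mathcal{K}}(q_1),
\]
% hence the spectrum is not closed under addition.
```

Query 𝑞₁ is not connected and 𝑞₂ is not individual-free, so neither falls under Lemma 1.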
    Table 1
    Possible shapes of spectra for some description logics; here 𝑚 ∈ N and 𝑉 is any subsemigroup of N. A ⋆ in
    the second column indicates that no other shape is possible.

    ℒ                           ⋆    ⟦𝑚, ∞⟧    ∅    {0}    {0} ∪ ⟦𝑚, ∞⟧    {∞}    {0, ∞}    𝑉 ∪ {∞}
    𝒜ℒ𝒞ℐℱ                       ⋆      ✓       ✓     ✓          ✓           ✓       ✓          ✓
    ℰℒℐℱ⊥                       ·      ✓       ✓     ✓          ✓           ✓       ✓          ·
    DL-Lite_ℱ                   ⋆      ✓       ✓     ✓          ✓           ✓       ✓          ·
    DL-Lite_core, 𝒜ℒ𝒞ℐ, 𝒜ℒ𝒞ℱ    ⋆      ✓       ✓     ✓          ✓           ·       ·          ·
    ℰℒℐℱ                        ⋆      ✓       ✓     ·          ·           ✓       ·          ·
    ℰℒ                          ⋆      ✓       ·     ·          ·           ·       ·          ·


4. Spectrum of a concept cardinality query
We now present two results regarding the query 𝑞C := ∃𝑧 C(𝑧), where C is a concept name and 𝑧 a
counting variable. Computing the spectrum of 𝑞C over a KB 𝒦 thus corresponds to the natural task of
deciding the possible values of |C^ℐ| across the models ℐ of 𝒦. Naturally, 𝑞C satisfies the preconditions of
Lemma 1 and, thus, its spectrum is finitely generated. Conversely, one can ask which sets are spectra
of concept cardinality queries. For a DL ℒ, we say that a set 𝑉 is ℒ-concept realizable if there is a
concept 𝐶 and an ℒ KB 𝒦 such that Sp𝒦 (𝑞C ) = 𝑉 . For 𝒜ℒ𝒞ℐℱ KBs, we have the following complete
characterization.

Theorem 1. A subset of N∞ is 𝒜ℒ𝒞ℐℱ-concept realizable iff it is ∅, {0}, or any subsemigroup of N∞
containing ∞.

   The “only-if” direction is essentially a consequence of Lemma 1, while the “if” direction is a
generalization of the following example, in which Sp𝒦 (𝑞C ) = 2N ∪ {∞}.

Example 2. Consider the KB 𝒦 with the empty ABox and the following 𝒜ℒ𝒞ℐℱ TBox, which enforces that r is
a bijection between A and B:

          C ≡ A ⊔ B     A ⊓ B ⊑ ⊥     A ⊑ ∃r.B     B ⊑ ∃r⁻.A     ⊤ ⊑ ≤1 r.⊤     ⊤ ⊑ ≤1 r⁻.⊤
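As a sanity check on the finite part of this spectrum, one can enumerate all small models of the TBox by brute force. The sketch below is illustrative only: the function name and the encoding (r as a partial injective function, one label per domain element) are assumptions of this example, and infinite models, which account for ∞, are out of its reach.

```python
from itertools import product


def concept_cardinalities(max_size):
    """Return the values |C| over all models of the Example 2 TBox whose
    domain is {0, ..., n-1} for some 1 <= n <= max_size."""
    found = set()
    for n in range(1, max_size + 1):
        domain = range(n)
        # r is functional, so it is a partial function: succ[d] is the
        # unique r-successor of d, or None if d has no r-successor.
        for succ in product([None, *domain], repeat=n):
            targets = [t for t in succ if t is not None]
            if len(targets) != len(set(targets)):
                continue  # some element would have two r-predecessors
            # A and B are disjoint: each element is in A, in B, or in neither.
            for label in product("AB-", repeat=n):
                # A ⊑ ∃r.B
                a_ok = all(
                    succ[d] is not None and label[succ[d]] == "B"
                    for d in domain if label[d] == "A"
                )
                # B ⊑ ∃r⁻.A
                b_ok = all(
                    any(succ[e] == d and label[e] == "A" for e in domain)
                    for d in domain if label[d] == "B"
                )
                if a_ok and b_ok:
                    # C ≡ A ⊔ B, so |C| is the number of labelled elements.
                    found.add(sum(1 for tag in label if tag != "-"))
    return found


print(sorted(concept_cardinalities(4)))  # prints [0, 2, 4]
```

Only even cardinalities appear, as expected when r pairs each element of A with exactly one element of B.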

  We now turn to ℰℒℐℱ⊥ KBs, for which we are not able to obtain a full characterization of the
realizable spectra. However, we prove that for a set to be realizable, its representation from Section 3
must have 𝑛 = 1, that is, the possible shapes simplify to 𝑆 ∪ ⟦𝑚, ∞⟧ for some 𝑚 ∈ N∞ and 𝑆 ⊆ ⟦0, 𝑚⟧.

Theorem 2. If a subset of N∞ is ℰℒℐℱ⊥-concept realizable, then it has shape ∅, {0}, {∞}, {0, ∞}, or
𝑆 ∪ ⟦𝑚, ∞⟧ for some 𝑚 ∈ N and 𝑆 ⊆ ⟦0, 𝑚⟧.

   The key ingredient to prove the above is a construction of two (potentially infinite) models ℐ and 𝒥
of an ℰℒℐℱ⊥ KB 𝒦 = (𝒯, 𝒜) in which |C^𝒥| = |C^ℐ| + 1 < ∞. To this end, we refine a cycle-reversion
technique which has been developed to study finite reasoning in ℰℒℐℱ⊥ [14]. More precisely, we tailor
the notion of cycles to characterize under which conditions the extension of concept C may be finite,
and then carefully manipulate the corresponding models to produce the above ℐ and 𝒥. Definition 1
below describes the cycles of interest for our study.
   An inverse functional path (IFP) is a sequence $K_0, r_1, K_1, \ldots, r_n, K_n$ where $n \geq 1$, $K_0, \ldots, K_n$ are
conjunctions of concept names and $r_1, \ldots, r_n$ are (potentially inverse) roles such that for all $0 \leq i < n$
we have $\mathcal{T} \models K_i \sqsubseteq \exists r_{i+1}.K_{i+1}$ and $\mathcal{T} \models K_{i+1} \sqsubseteq {\leq}1\, r_{i+1}^{-}.K_i$.
   The interesting cycles for a concept C are the IFPs looping on themselves and forcing the presence of
(at least) one instance of C “per instance of the cycle”, as follows:

Definition 1. An IFP 𝐾0 , r1 , 𝐾1 , . . . , r𝑛 , 𝐾𝑛 is a C-generating cycle if 𝒯 |= 𝐾𝑛 ⊑ 𝐾0 and there exists
an IFP 𝐿0 , s1 , 𝐿1 , . . . , s𝑚 , 𝐿𝑚 such that 𝒯 |= 𝐿𝑚 ⊑ C and 𝒯 |= 𝐾𝑖 ⊑ 𝐿0 for some 0 ≤ 𝑖 ≤ 𝑛.
  Reversing those cycles now means considering the ℰℒℐℱ⊥ TBox $\mathcal{T}_C$ obtained from $\mathcal{T}$ by adding, for
each C-generating cycle $K_0, r_1, K_1, \ldots, r_n, K_n$ and each $0 \leq i < n$, the axioms
$K_{i+1} \sqsubseteq \exists r_{i+1}^{-}.K_i$ and $K_i \sqsubseteq {\leq}1\, r_{i+1}.K_{i+1}$. A key result towards the proof of Theorem 2 is now:

Lemma 2. There is a model ℐ of 𝒦 such that |C^ℐ| < ∞ if and only if the KB (𝒯C , 𝒜) is satisfiable.


5. Perspectives
A full characterization of spectra shapes for ℰℒℐℱ⊥ appears challenging as Theorem 2 suggests that
those spectra may have the shapes of arbitrary numerical semigroups. Furthermore, while Theorem 1
offers a complete characterization for 𝒜ℒ𝒞ℐℱ, how to compute the corresponding effective
representations remains an open question.
   We believe that it could also be interesting to study the impact of our results on the closely related
problem of answering (Boolean atomic) queries under the bag semantics. While the semantics adopted
in the present paper does not coincide with bag semantics, as discussed for example in [17, 5],
considerations regarding the spectra and some of the corresponding techniques might be adapted to this
setting.


Acknowledgments
The authors acknowledge the financial support by the Federal Ministry of Education and Research
of Germany and by the Sächsische Staatsministerium für Wissenschaft Kultur und Tourismus in
the program Center of Excellence for AI-research “Center for Scalable Data Analytics and Artificial
Intelligence Dresden/Leipzig”, project identification number: ScaDS.AI. The second author was supported
by the DFG project LU 1417/3-1 QTEC.


References
 [1] D. Calvanese, E. Kharlamov, W. Nutt, C. Thorne, Aggregate queries over ontologies, in: Proceedings
     of the 2nd International Workshop on Ontologies and Information Systems for the Semantic Web
     (ONISW), 2008, pp. 97–104.
 [2] M. Bienvenu, Q. Manière, M. Thomazo, Answering counting queries over DL-Lite ontologies, in:
     Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020, pp.
     1608–1614.
 [3] C. Feier, C. Lutz, M. Przybylko, Answer counting under guarded TGDs, in: Proceedings of the
     24th International Conference on Database Theory (ICDT), 2021, pp. 11:1–11:22.
 [4] E. V. Kostylev, J. L. Reutter, Complexity of answering counting aggregate queries over DL-Lite,
     Journal of Web Semantics (JWS) 33 (2015) 94–111.
 [5] D. Calvanese, J. Corman, D. Lanti, S. Razniewski, Counting query answers over a DL-Lite
     knowledge base, in: Proceedings of the 29th International Joint Conference on Artificial Intelligence
     (IJCAI), 2020, pp. 1658–1666.
 [6] M. Bienvenu, Q. Manière, M. Thomazo, Counting queries over 𝒜ℒ𝒞ℋℐ ontologies, in: Proceedings
     of the 19th International Conference on Principles of Knowledge Representation and Reasoning
     (KR), 2022, pp. 53–62.
 [7] R. Fagin, Generalized first-order spectra and polynomial-time recognizable sets, Complexity of
     computation 7 (1974) 43–73.
 [8] A. Durand, R. Fagin, B. Loescher, Spectra with only unary function symbols, in: Proceedings of
     the 11th International Workshop on Computer Science Logic (CSL), 1997, pp. 189–202.
 [9] B. Glimm, I. Horrocks, C. Lutz, U. Sattler, Conjunctive query answering for the description logic
     𝒮ℋℐ𝒬, Journal of Artificial Intelligence Research (JAIR) 31 (2008) 157–204.
[10] C. Lutz, The complexity of conjunctive query answering in expressive description logics, in:
     Proceedings of the 4th International Joint Conference on Automated Reasoning (IJCAR), 2008, pp.
     179–193.
[11] M. Bienvenu, Q. Manière, M. Thomazo, Cardinality queries over DL-Lite ontologies, in: Proceedings
     of the 30th International Joint Conference on Artificial Intelligence (IJCAI), 2021, pp. 1801–1807.
[12] S. S. Cosmadakis, P. C. Kanellakis, M. Y. Vardi, Polynomial-time implication problems for unary
     inclusion dependencies, Journal of the ACM 37 (1990) 15–46.
[13] R. Rosati, Finite model reasoning in DL-Lite, in: Proceedings of the 5th European Semantic Web
     Conference (ESWC), 2008, pp. 215–229.
[14] Y. A. Ibáñez-García, C. Lutz, T. Schneider, Finite Model Reasoning in Horn Description Logics, in:
     Proceedings of the 14th International Conference on Principles of Knowledge Representation and
     Reasoning (KR), 2014, pp. 490–509.
[15] F. Baader, I. Horrocks, C. Lutz, U. Sattler, An Introduction to Description Logic, Cambridge
     University Press, 2017.
[16] P. A. Grillet, Commutative Semigroups, Springer New York, NY, 2001.
[17] C. Nikolaou, E. V. Kostylev, G. Konstantinidis, M. Kaminski, B. Cuenca Grau, I. Horrocks,
     Foundations of ontology-based data access under bag semantics, Journal of Artificial Intelligence (AIJ)
     (2019) 91–132.