Spectra of Cardinality Queries over Description Logic Knowledge Bases (Extended Abstract) Quentin Maniรจre1,2 , Marcin Przybyล‚ko1 1 Department of Computer Science, Leipzig University, Germany 2 Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany Keywords Description Logics, Counting queries, Spectrum, Complexity of reasoning 1. Introduction A recent line of research has explored ways of leveraging ontology-mediated query answering (OMQA) to support counting queries, a class of aggregate queries that allows to perform analytics on data. Several semantics for such queries have been investigated, differing on how the possibility of multiple models can be taken into account [1, 2, 3]. In this paper we adopt the semantic proposed in [2], extending [4], that defines a counting query as a conjunctive query (CQ) in which some variables have been designated as counting variables. Such a query is evaluated in every model of the knowledge base (KB) by considering every possible homomorphism of the query into the model and by then returning the number of obtained assignments for the counting variables. A certain answer of the counting query over the KB is then defined as an interval J๐‘š, ๐‘€ K that contains all the possible answers across all possible models, i.e. a uniform lower bound ๐‘š and a uniform upper bound ๐‘€ . The complexity of deciding whether a given interval is a certain answer under this semantics is now well-understood for a variety of DLs [5, 6]. In the present paper, rather than providing uniform bounds, we aim to compute (a representation of) the set of possible answers, which is a subset (of tuples) of natural numbers with infinity, i.e. a subset of Nโˆž := {0, 1, 2, . . . , โˆž}. We call this subset the spectrum of the counting query, inspired by the notion of spectrum of a formula, that is the set of the possible cardinalities of its models [7, 8]. To do so, we first investigate the possible shapes of spectra for counting conjunctive queries (CCQs) and ontologies expressed in the ๐’œโ„’๐’žโ„โ„ฑ. Traditional CQ answering is well-understood in this expressive DL [9, 10] that supports functionality constraints whose interactions with counting queries have never been studied to the best of our knowledge (those proposed in [5] and denoted ๐’ฉ โˆ’ being more restricted). One of the challenges encountered in our work is to clarify how to represent spectra. Indeed, the set of possible answers of a CCQ across models of a KB might, a priori, be an arbitrarily complex set of natural numbers, and thus hard to describe by means other than providing the CCQ-KB couple. We aim to identify classes of ontology-mediated queries (OMQs) whose spectra admit an effective representation. By effective, we intend that (i) the representation is finite, ideally with a size that can be bounded by the size of the OMQ, (ii) independent of the specific description logic, and (iii) spectrum membership can be efficiently tested, i.e. membership can be tested in polynomial time with respect to the size of the integer and of the representation. Finding such a representation, whenever it exists, can be viewed as a precomputation allowing for further analytics. DL 2024: 37th International Workshop on Description Logics, June 18โ€“21, 2024, Bergen, Norway $ quentin.maniere@uni-leipzig.de (Q. Maniรจre); marcin.przybylko@uni-leipzig.de (M. Przybyล‚ko)  0000-0001-9618-8359 (Q. Maniรจre); 0000-0003-1859-7055 (M. Przybyล‚ko) ยฉ 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings 2. Contributions Our main contributions are the following. First, we introduce the notion of a spectrum of a CCQ and formalize the problem of computing effective representation thereof, if it exists. We show that connected and individual-free CCQs evaluated on ๐’œโ„’๐’žโ„โ„ฑ KBs always admit effectively representable spectra, as those must be finitely generated subsets of Nโˆž . This further motivates a focus on cardinality queries, i.e. Boolean atomic CCQs, introduced in [11], which admit effective representation. We fully characterize the possible shapes of spectra for concept cardinality queries on ๐’œโ„’๐’žโ„โ„ฑ KBs, in particular showing that every finitely generated subset of Nโˆž can be realized. We also study several sublogics of ๐’œโ„’๐’žโ„โ„ฑ, from โ„ฐโ„’ and DL-Litecore , for most of which we obtain full characterizations of possible shapes of spectra. For some, only simpler shapes, such as J๐‘š, +โˆžK, are possible (see Table 1). For the โ„ฐโ„’โ„โ„ฑโŠฅ DL, corresponding to the Horn fragment of ๐’œโ„’๐’žโ„โ„ฑ, we notably use tailored variations of the cycle-reversion techniques introduced to tackle finite model reasoning in such DLs [12, 13, 14]. Our work also features a wealth of examples to prove whether a given shape can indeed be obtained. We further investigate the case of role cardinality queries which feature challenging shapes of spectra already for โ„ฐโ„’โŠฅ KBs. Through connections with the concept case and refinements of the corresponding constructions, we are able to show that, also in the case of role cardinality queries, the possible shapes for โ„ฐโ„’โ„โ„ฑโŠฅ KBs are better-behaved than for general ๐’œโ„’๐’žโ„โ„ฑ KBs. We conclude with bounds regarding the data complexity of computing some of those effective representations, notably relying on existing results on DLs augmented with closed predicates. The rest of this extended abstract highlights preliminaries regarding spectra, shows a full char- acterization for ๐’œโ„’๐’žโ„โ„ฑ and concept cardinality queries, and discusses our tailored version of the cycle-reversion technique. 3. Effective representations of spectra We assume familiarity with the DL ๐’œโ„’๐’žโ„โ„ฑ, its sublogics and semantics [15]. We assume two disjoint sets of variables and counting variables. A counting conjunctive query (CCQ) takes the form ๐‘ž(๐‘ฅ ยฏ) = โˆƒ๐‘ฆยฏ โˆƒ๐‘งยฏ ๐œ“(๐‘ฅยฏ , ๐‘ฆยฏ, ๐‘งยฏ), where ๐‘ฅ ยฏ and ๐‘ฆยฏ are tuples of distinct variables, ๐‘งยฏ is a tuple of distinct counting variables and ๐œ“ is a conjunction of concept and role atoms whose terms are drawn from ๐‘ฅ ยฏ โˆช ๐‘ฆยฏ โˆช ๐‘งยฏ or individual names. For a given tuple ๐‘Ž ยฏ of individuals, and a given model โ„ of a KB ๐’ฆ, we define: #๐‘ž(๐‘Ž ยฏ)โ„ := #{๐œ‹|๐‘งยฏ | ๐œ‹ : ๐‘ž โ†’ โ„ homomorphism such that ๐œ‹(๐‘ฅ ยฏ) = ๐‘Žยฏ}. The spectrum of ๐‘ž and ๐‘Ž ยฏ over ๐’ฆ โ„ is further defined as: Sp๐’ฆ (๐‘ž(๐‘Ž ยฏ)) := {#๐‘ž(๐‘Ž ยฏ) | โ„ |= ๐’ฆ}. While we do not know whether all spectra can be effectively represented, in the sense explained in the introduction, we identify a class of CCQs, namely connected and individual-free CCQs, whose spectra admit such a representation based on the following property: Lemma 1. If ๐‘ž is connected and individual-free, then Sp๐’ฆ (๐‘ž) is closed under addition. It is indeed well-known that every subset of Nโˆž closed under addition is finitely generated (see e.g. [16], Chapter 2, Proposition 4.1). In particular, for every spectrum Sp๐’ฆ (๐‘ž) of a satisfiable connected and individual-free CCQ ๐‘ž, there exist a finite subset ๐‘† of Nโˆž and two numbers ๐‘€, ๐‘› โˆˆ Nโˆž such that Sp๐’ฆ (๐‘ž) = ๐‘† โˆช {๐‘€ + ๐‘˜ ยท ๐‘› | ๐‘˜ โˆˆ N}. Therefore, the problem of computing Sp๐’ฆ (๐‘ž) for such a pair (๐’ฆ, ๐‘ž) can be properly defined as the task of computing ๐‘†, ๐‘€ and ๐‘›. Notice that Sp๐’ฆ (๐‘ž) = โˆ… if and only if ๐’ฆ is unsatisfiable; and, similarly, Sp๐’ฆ (๐‘ž) = {0} if and only if ๐’ฆ is satisfiable but ๐‘ž is unsatisfiable with respect to ๐’ฆ. Spectra non-closed under addition are easily found by dropping the above restriction to connected and individual-free CCQs, as shown by the following: Example 1. Consider the empty KB ๐’ฆ and the two Boolean CCQs ๐‘ž1 := โˆƒ๐‘ง1 โˆƒ๐‘ง2 C(๐‘ง1 ) โˆง C(๐‘ง2 ) and ๐‘ž2 := โˆƒ๐‘ง1 โˆƒ๐‘ง2 r(a, ๐‘ง1 ) โˆง r(a, ๐‘ง2 ), where ๐‘ง1 and ๐‘ง2 are counting variables. It is easily verified that Sp๐’ฆ (๐‘ž1 ) = Sp๐’ฆ (๐‘ž2 ) = {๐‘›2 | ๐‘› โˆˆ N} โˆช {โˆž}. Table 1 Possible shapes of spectra for some description logics, here ๐‘š โˆˆ N and ๐‘‰ is any subsemigroup of N. โ‹† indicates no other shape is possible. โ„’ J๐‘š, โˆžK โˆ… {0} {0} โˆช J๐‘š, โˆžK {โˆž} {0, โˆž} ๐‘‰ โˆช {โˆž} ๐’œโ„’๐’žโ„โ„ฑ โ‹† โœ“ โœ“ โœ“ โœ“ โœ“ โœ“ โœ“ โ„ฐโ„’โ„โ„ฑโŠฅ ยท โœ“ โœ“ โœ“ โœ“ โœ“ โœ“ ยท DL-Liteโ„ฑ โ‹† โœ“ โœ“ โœ“ โœ“ โœ“ โœ“ ยท DL-Litecore , ๐’œโ„’๐’žโ„, ๐’œโ„’๐’žโ„ฑ โ‹† โœ“ โœ“ โœ“ โœ“ ยท ยท ยท โ„ฐโ„’โ„โ„ฑ โ‹† โœ“ โœ“ ยท ยท โœ“ ยท ยท โ„ฐโ„’ โ‹† โœ“ ยท ยท ยท ยท ยท ยท 4. Spectrum of a concept cardinality query We now present two results regarding the query ๐‘žC := โˆƒ๐‘ง C(๐‘ง), where C is a concept name and ๐‘ง a counting variable. Computing theโƒ’ spectrum of ๐‘žC over a KB ๐’ฆ thus corresponds to the natural task of deciding the possible values of โƒ’Cโ„ โƒ’ across the models โ„ of ๐’ฆ. Naturally, ๐‘žC satisfies preconditions of โƒ’ Lemma 1 and, thus, its spectrum is finitely generated. Conversely, one can ask which sets are spectra of concept cardinality queries. For a DL โ„’, we say that a set ๐‘‰ is โ„’-concept realizable if there is a concept ๐ถ and an โ„’ KB ๐’ฆ such that Sp๐’ฆ (๐‘žC ) = ๐‘‰ . For ๐’œโ„’๐’žโ„โ„ฑ KBs, we have the following complete characterization. Theorem 1. A subset of Nโˆž is ๐’œโ„’๐’žโ„โ„ฑ-concept realizable iff it is โˆ…, {0}, or any subsemigroup of Nโˆž containing โˆž. The โ€œonly-ifโ€ direction is essentially a consequence of Lemma 1, while the โ€œifโ€ direction is a general- ization of the following example in which Sp๐’ฆ (๐‘žC ) = 2N โˆช {โˆž}. Example 2. Consider the KB ๐’ฆ with the empty ABox and the following ๐’œโ„’๐’žโ„โ„ฑ TBox enforcing that ๐‘Ÿ is a bijection between ๐ด and ๐ต: Cโ‰กAโŠ”B A โŠ“ B โŠ‘ โŠฅ A โŠ‘ โˆƒr.B B โŠ‘ โˆƒrโˆ’ .A โŠค โŠ‘ โ‰ค 1r.โŠค โŠค โŠ‘ โ‰ค 1rโˆ’ .โŠค We now turn to โ„ฐโ„’โ„โ„ฑโŠฅ KBs, for which we are not able to obtain a full characterization of the realizable spectrum. However, we prove that for a set to be realizable, it must have ๐›ผ = 1, that is the possible shapes simplify to ๐‘† โˆช J๐‘š, โˆžK for some ๐‘š โˆˆ Nโˆž and ๐‘† โІ J0, ๐‘šK. Theorem 2. If a subset of Nโˆž is โ„ฐโ„’โ„โ„ฑโŠฅ -concept realizable, then it has shape โˆ…, {0}, {โˆž}, {0, โˆž}, or ๐‘† โˆช J๐‘š, โˆžK for some ๐‘š โˆˆ N and ๐‘† โІ J0, ๐‘šK. The key ingredient to prove the aboveโƒ’ is aโƒ’ construction of two (potentially infinite) models โ„ and ๐’ฅ of an โ„ฐโ„’โ„โ„ฑโŠฅ KB ๐’ฆ = (๐’ฏ , ๐’œ) in which โƒ’C๐’ฅ โƒ’ = โƒ’Cโ„ โƒ’ + 1 < โˆž. To this end, we refine a cycle-reversion โƒ’ โƒ’ technique which has been developed to study finite reasoning in โ„ฐโ„’โ„โ„ฑโŠฅ [14]. More precisely, we tailor the notion of cycles to characterize under which conditions the extension of concept C may be finite, and then carefully manipulate the corresponding models to produce the above โ„ and ๐’ฅ . Definition 1 below describes the cycles of interest for our study. An inverse functional path (IFP) is a sequence ๐พ0 , r1 , ๐พ1 , . . . , r๐‘› , ๐พ๐‘› where ๐‘› โ‰ฅ 1, ๐พ0 , . . . , ๐พ๐‘› are conjunctions of concept names and r1 , . . . , r๐‘› are (potentially inverse) roles such that for all 0 โ‰ค ๐‘– < ๐‘› we have ๐’ฏ |= ๐พ๐‘– โŠ‘ โˆƒr๐‘–+1 .๐พ๐‘–+1 and ๐’ฏ |= ๐พ๐‘–+1 โŠ‘ โ‰ค 1rโˆ’ ๐‘–+1 .๐พ๐‘– . The interesting cycles for a concept C are the IFPs looping on themselves and forcing the presence of (at least) one instance of C โ€œper instance of the cycleโ€, as follows: Definition 1. An IFP ๐พ0 , r1 , ๐พ1 , . . . , r๐‘› , ๐พ๐‘› is a C-generating cycle if ๐’ฏ |= ๐พ๐‘› โŠ‘ ๐พ0 and there exists an IFP ๐ฟ0 , s1 , ๐ฟ1 , . . . , s๐‘š , ๐ฟ๐‘š such that ๐’ฏ |= ๐ฟ๐‘š โŠ‘ C and ๐’ฏ |= ๐พ๐‘– โŠ‘ ๐ฟ0 for some 0 โ‰ค ๐‘– โ‰ค ๐‘›. Reversing those cycles now means to consider the โ„ฐโ„’โ„โ„ฑโŠฅ TBox ๐’ฏC obtained from ๐’ฏ by adding, for each C-generating cycle ๐พ0 , r1 , ๐พ1 , . . . , r๐‘› , ๐พ๐‘› and each 0 โ‰ค ๐‘– < ๐‘›, the axioms ๐พ๐‘–+1 โŠ‘ โˆƒrโˆ’ ๐‘– .๐พ๐‘– and ๐พ๐‘– โŠ‘ โ‰ค 1r๐‘–+1 .๐พ๐‘–+1 . A key result towards the proof of Theorem 2 is now: Lemma 2. There is a model โ„ of ๐’ฆ such that โƒ’Cโ„ โƒ’ < โˆž if and only if the KB (๐’ฏC , ๐’œ) is satisfiable. โƒ’ โƒ’ 5. Perspectives A full characterization of spectra shapes for โ„ฐโ„’โ„โ„ฑโŠฅ appears challenging as Theorem 2 suggests that those spectra may have the shapes of arbitrary numerical semigroups. Furthermore, while Theorem 1 offers a complete characterization for ๐’œโ„’๐’žโ„โ„ฑ, how to compute the corresponding effective representa- tions remains an open question. We believe that it could also be interesting to study the impact of our results on the closely related problem of answering (Boolean atomic) queries under the bag semantics. While the semantics adopted in the present paper does not coincide with bag semantics, as discussed for example in [17, 5], con- siderations regarding the spectra and some of the corresponding techniques might be adapted to this setting. Acknowledgments The authors acknowledge the financial support by the Federal Ministry of Education and Research of Germany and by the Sรคchsische Staatsministerium fรผr Wissenschaft Kultur und Tourismus in the program Center of Excellence for AI-research โ€œCenter for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzigโ€, project identification number: ScaDS.AI. Second author was supported by the DFG project LU 1417/3-1 QTEC. References [1] D. Calvanese, E. Kharlamov, W. Nutt, C. Thorne, Aggregate queries over ontologies, in: Proceedings of the 2nd International Workshop on Ontologies and Information Systems for the Semantic Web (ONISW), 2008, pp. 97โ€“104. [2] M. Bienvenu, Q. Maniรจre, M. Thomazo, Answering counting queries over DL-Lite ontologies, in: Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020, pp. 1608โ€“1614. [3] C. Feier, C. Lutz, M. Przybylko, Answer counting under guarded TGDs, in: Proceedings of the 24th International Conference on Database Theory (ICDT), 2021, pp. 11:1โ€“11:22. [4] E. V. Kostylev, J. L. Reutter, Complexity of answering counting aggregate queries over DL-Lite, Journal of Web Semantics (JWS) 33 (2015) 94โ€“111. [5] D. Calvanese, J. Corman, D. Lanti, S. Razniewski, Counting query answers over a DL-Lite knowl- edge base, in: Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020, pp. 1658โ€“1666. [6] M. Bienvenu, Q. Maniรจre, M. Thomazo, Counting queries over ๐’œโ„’๐’žโ„‹โ„ ontologies, in: Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning (KR), 2022, pp. 53โ€“62. [7] R. Fagin, Generalized first-order spectra and polynomial-time recognizable sets, Complexity of computation 7 (1974) 43โ€“73. [8] A. Durand, R. Fagin, B. Loescher, Spectra with only unary function symbols, in: Proceedings of the 11th International Workshop on Computer Science Logic (CSL), 1997, pp. 189โ€“202. [9] B. Glimm, I. Horrocks, C. Lutz, U. Sattler, Conjunctive query answering for the description logic ๐’ฎโ„‹โ„๐’ฌ, Journal of Artificial Intelligence Research (JAIR) 31 (2008) 157โ€“204. [10] C. Lutz, The complexity of conjunctive query answering in expressive description logics, in: Proceedings of the 4th International Joint Conference on Automated Reasoning (IJCAR), 2008, pp. 179โ€“193. [11] M. Bienvenu, Q. Maniรจre, M. Thomazo, Cardinality queries over DL-Lite ontologies, in: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), 2021, pp. 1801โ€“1807. [12] S. S. Cosmadakis, P. C. Kanellakis, M. Y. Vardi, Polynomial-time implication problems for unary inclusion dependencies, Journal of the ACM 37 (1990) 15โ€“46. [13] R. Rosati, Finite model reasoning in DL-Lite, in: Proceedings of the 5th European Semantic Web Conference (ESWC), 2008, pp. 215โ€“229. [14] Y. A. Ibรกรฑez-Garcรญa, C. Lutz, T. Schneider, Finite Model Reasoning in Horn Description Logics, in: Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), 2014, pp. 490โ€“509. [15] F. Baader, I. Horrocks, C. Lutz, U. Sattler, An Introduction to Description Logic, Cambridge University Press, 2017. [16] P. A. Grillet, Commutative Semigroups, Springer New York, NY, 2001. [17] C. Nikolaou, E. V. Kostylev, G. Konstantinidis, M. Kaminski, B. Cuenca Grau, I. Horrocks, Founda- tions of ontology-based data access under bag semantics, Journal of Artificial Intelligence (AIJ) (2019) 91โ€“132.