Answering Counting Queries over DL-Lite
         Ontologies (Extended Abstract)

           Meghyn Bienvenu1 , Quentin Manière1 , and Michaël Thomazo2
      1
           University of Bordeaux, CNRS, Bordeaux INP, LaBRI, Talence, France
               2
                 Inria, DI ENS, ENS, CNRS, University PSL, Paris, France


          Abstract. This extended abstract briefly summarizes our recent work
          [1] on answering counting queries over DL-Lite knowledge bases. Our
          contributions include the definition of a more general form of counting
          conjunctive query (CCQ), a detailed study of the data and combined
          complexity of CCQ answering over DL-Litecore and DL-LiteR knowledge
          bases under various restrictions on the TBox and query, and the precise
          data complexity of identifying the best certain interval.


    Ontology-mediated query answering (OMQA) utilizes ontologies to provide a
convenient vocabulary for query formulation and to capture domain knowledge
that is exploited during the querying process to obtain more complete sets of
answers [9, 2, 10]. Much of the work on OMQA considers ontologies formulated
using description logics (DLs), among which the DL-Lite family [4] has received
particular attention as it enjoys favorable computational properties.
    The vast majority of work on OMQA supposes that user queries are given
as conjunctive queries (CQs). However, there are many other kinds of database
queries, beyond plain CQs, that are relevant in practice. This motivates research
into the feasibility of adopting other database query languages for OMQA. While
enriching CQs with either negated atoms or inequalities has been shown to lead
to undecidability even in very restricted settings [6], the situation is more positive
for navigational queries (like regular path queries), which can be adopted without
losing decidability, sometimes even retaining tractable data complexity [3].
    Aggregate queries, which use numeric operators (e.g. count, sum, max) to
summarize selected parts of a dataset, constitute another prominent class of
database queries. Although such queries are widely used for data analysis, they
have been little explored in context of OMQA. This may be partly due to the
fact that it is not at all obvious how to define the semantics of such queries in the
OMQA setting. A first exploration of aggregate queries in OMQA was conducted
by Calvanese et al. [5]. They argued that the most straightforward adaptation
of classical certain answer semantics to aggregate queries was unsatisfactory, as
often values would differ from model to model, leading to no certain answers. For
this reason, an epistemic semantics was proposed, in which variables involved in
  Copyright c 2020 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0).
2       M. Bienvenu et al.

                        Data                            Combined
               Rooted Exh. rooted              Rooted              Exh. rooted
                                           Π2p -h
 DL-Litecore   coNP-c        TC0 -c                 / coNEXP           PP-c
                                           PP-h
                                                                  Π2p -h
 DL-LiteR      coNP-c        coNP-c     coNEXP-h / coN2EXP               / coNEXP
                                                                  PP-h
                                       coNEXP-c (f.-d. TBox)
Table 1. Complexity results for rooted and exhaustive rooted counting CQs (‘f.-d.’
stands for ‘finite-depth’, ‘-h’ for ‘-hard’, ‘-c’ for ‘-complete’)


the aggregation are required to match to data constants. However, as discussed in
[7], this semantics can also give unintuitive results by ignoring ways of mapping
aggregate variables to anonymous elements inferred due the ontology axioms.
For instance, if no children of alex are listed in the data, then a query that asks
to return the number of children will yield 0 under epistemic semantics, even if
it can be inferred (e.g. due to a family tax benefit) that there must be at least 3
children. This led Kostylev and Reutter [7] to define an alternative semantics for
two kinds of counting queries (inspired by the COUNT and COUNT DISTINCT
in SQL) which adopts a form of certain answer semantics but considers lower and
upper bounds on the count value across different models. For the two considered
logics (DL-Litecore and DL-LiteR ), only the lower bounds on the count value
are non-trivial, and a complexity analysis shows that they are challenging to
identify: coNP-data complexity for both logics, and Π2p -hard (resp. coNEXP-
hard) in combined complexity for DL-Litecore (resp. DL-LiteR ). Several questions
were left unanswered by their work, including the difficulty of recognizing the
optimal lower bound and the impact of allowing multiple aggregation variables.
     In the present work, which is reported in [1], we return to the issue of han-
dling counting queries in OMQA and make several important contributions. We
first introduce a new notion of counting CQ that generalizes the two forms of
queries from [7] and allows arbitrarily many counting variables. We show that
existing complexity results for DL-Litecore and DL-LiteR KBs continue to hold
for our more general notion of counting CQ, and we further provide an improved
coNEXP upper bound for the relevant case of finite-depth TBoxes. We also con-
sider the impact of restricting the query structure, focusing on the class of rooted
queries, in which every query variable must be connected to an answer variable
or individual in the query graph. A recent result, obtained as part of a study
of bag semantics for OMQA [8], identified a case in which rootedness leads to
tractable data complexity for counting queries. This motivated us to perform
a thorough investigation of rooted counting queries, which yielded several im-
provements upon existing complexity bounds (see Table 1), including a tight
PP-completeness result for the natural subclass of exhaustive rooted counting
CQs, in which every non-answer variable is a counting variable. As our final
contribution, we investigate the problem of recognizing the best certain interval
  Answering Counting Queries over DL-Lite Ontologies (Extended Abstract)              3

and show it to be DP-complete in data complexity. Our results close some ques-
tions that were left open by the work of Kostylev and Reutter [7] and pave the
way for further study of counting and aggregate queries in the OMQA setting.


References
 1. Bienvenu, M., Manière, Q., Thomazo, M.: Answering counting queries over DL-Lite
    ontologies. In: Proceedings of the 29th International Joint Conference on Artificial
    Intelligence (IJCAI) (2020)
 2. Bienvenu, M., Ortiz, M.: Ontology-mediated query answering with data-tractable
    description logics. In: Tutorial Lectures of the 11th Reasoning Web International
    Summer School. pp. 218–307 (2015)
 3. Bienvenu, M., Ortiz, M., Simkus, M.: Regular path queries in lightweight descrip-
    tion logics: Complexity and algorithms. Journal of Artificial Intelligence Research
    (JAIR) 53, 315–374 (2015)
 4. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family.
    Journal of Automated Reasoning (JAR) 39(3), 385–429 (2007)
 5. Calvanese, D., Kharlamov, E., Nutt, W., Thorne, C.: Aggregate queries over on-
    tologies. In: Proceedings of the 2nd International Workshop on Ontologies and
    Information Systems for the Semantic Web (ONISW). pp. 97–104 (2008)
 6. Gutiérrez-Basulto, V., Ibáñez-Garcı́a, Y.A., Kontchakov, R., Kostylev, E.V.:
    Queries with negation and inequalities over lightweight ontologies. Journal of Web
    Semantics (JWS) 35, 184–202 (2015)
 7. Kostylev, E.V., Reutter, J.L.: Complexity of answering counting aggregate queries
    over DL-Lite. Journal of Web Semantics (JWS) 33(1), 94–111 (2015)
 8. Nikolaou, C., Kostylev, E.V., Konstantinidis, G., Kaminski, M., Grau, B.C., Hor-
    rocks, I.: Foundations of ontology-based data access under bag semantics. Artificial
    Intelligence (AIJ) 274, 91 – 132 (2019)
 9. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.:
    Linking data to ontologies. Journal of Data Semantics 10, 133–173 (2008)
10. Xiao, G., Calvanese, D., Kontchakov, R., Lembo, D., Poggi, A., Rosati, R., Za-
    kharyaschev, M.: Ontology-based data access: A survey. In: Proceedings of the
    27th International Joint Conference on Artificial Intelligence (IJCAI). pp. 5511–
    5519 (2018)