=Paper= {{Paper |id=Vol-3263/abstract-3 |storemode=property |title=Querying Inconsistent Prioritized Data with ORBITS: Algorithms, Implementation, and Experiments (Extended Abstract) |pdfUrl=https://ceur-ws.org/Vol-3263/abstract-3.pdf |volume=Vol-3263 |authors=Meghyn Bienvenu,Camille Bourgaux |dblpUrl=https://dblp.org/rec/conf/dlog/BienvenuB22 }} ==Querying Inconsistent Prioritized Data with ORBITS: Algorithms, Implementation, and Experiments (Extended Abstract)== https://ceur-ws.org/Vol-3263/abstract-3.pdf
Querying Inconsistent Prioritized Data with ORBITS:
Algorithms, Implementation, and Experiments
Extended Abstract

Meghyn Bienvenu1 , Camille Bourgaux2
1
    Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, Talence, France
2
    DI ENS, ENS, CNRS, PSL University & Inria, Paris, France


                                         Abstract
                                         This extended abstract of [1] presents our investigation of practical algorithms for inconsistency-tolerant
                                         query answering over prioritized knowledge bases. We introduce SAT encodings for Pareto- and
                                         completion-optimal repairs w.r.t. general priority relations over the knowledge base facts and propose
                                         several ways of computing answers under (optimal) repair-based semantics by exploiting different
                                         reasoning modes of SAT solvers. Proofs, pseudo-code for algorithms, and details on the experimental
                                         evaluation are provided in the appendix of [2].

                                         Keywords
                                         inconsistency-tolerant semantics, prioritized knowledge bases, SAT-based algorithms




1. Querying Inconsistent Prioritized Knowledge Bases
When a knowledge base (KB) consisting of a dataset and a logical theory (be it an ontology
or a set of database dependencies) is such that the data is inconsistent with the constraints, a
prominent approach is to adopt inconsistency-tolerant semantics in order to extract meaningful
information from the contradictory data. In the database setting, such an approach goes by the
name of consistent query answering (CQA) and has been extensively studied [3, 4]. A central
notion is that of a (subset) repair, defined as a maximal subset of the dataset that satisfies the
constraints. Intuitively, repairs represent all different ways of minimally modifying the data to
satisfy the constraints. As we do not know which repair corresponds to the true part of the
data, the CQA semantics stipulates that a tuple is a query answer if it is an answer w.r.t. every
repair. Inconsistency-tolerant semantics have also drawn considerable interest in the setting
of ontology-mediated query answering (OMQA) [5, 6, 7]. In addition to the AR semantics (the
OMQA analog of the CQA semantics), several other inconsistency-tolerant semantics have been
proposed (see [8, 9] for surveys and references), among which: the brave semantics [10], which
only requires a tuple to be an answer w.r.t. some repair, provides a natural notion of possible
answer, and the IAR semantics [11], which answers queries over the intersection of the repairs,
identifies the most reliable answers.

  DL 2022: 35th International Workshop on Description Logics, August 7–10, 2022, Haifa, Israel
" meghyn.bienvenu@labri.fr (M. Bienvenu); camille.bourgaux@ens.fr (C. Bourgaux)
 0000-0001-6229-8103 (M. Bienvenu); 0000-0002-8806-6682 (C. Bourgaux)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
   The basic notion of repair can be refined by exploiting preference information. An approach
introduced in the database setting [12] and recently explored in the OMQA setting [13] assumes
that preferences are given by a binary priority relation between conflicting facts. Three notions of
‘best’ repairs w.r.t. a priority relation were proposed, namely, Pareto-optimal, globally-optimal,
and completion-optimal repairs, and can be used in place of subset repairs in any repair-based
semantics. In the case where the priority relation is score-structured, that is, induced by assigning
scores to facts, the three kinds of optimal repair coincide.
   The complexity of answering queries under (optimal) repair-based semantics has been exten-
sively studied in the database and OMQA settings, refer to [4, 8] for an overview and references.
We can briefly summarize these (many!) complexity results as follows: query answering under
the AR (or CQA) semantics is coNP-hard in data complexity even in the simplest of settings (e.g.
key constraints, class disjointness), and adopting optimal repairs in place of subset repairs leads
to (co)NP-hardness for the brave and IAR semantics as well. Membership in (co)NP holds for
AR, brave, and IAR semantics w.r.t. subset, Pareto-optimal, and completion-optimal repairs in
the most commonly considered settings i.e. for database constraints given by primary keys or
more generally, functional dependencies (FDs), and for ontologies formulated in data-tractable
description logics such as those of the DL-Lite family [14]. Globally-optimal repairs lead to
higher complexity and are thus not considered in this paper.
   The preceding (co)NP complexity results naturally suggest the interest of employing SAT
solvers. Two recent systems, CQAPri [15, 16] – which targets DL-Lite KBs and AR, brave,
and IAR semantics, w.r.t. subset repairs as well as optimal repairs for the restricted class of
score-structured priority relations – and CAvSAT [17] – which targets relational databases and
AR semantics w.r.t. subset repairs – have begun to explore such an approach. While geared to
different forms of constraints, the two systems solve essentially the same problem, yet they
employ SAT solvers in different ways. This motivates a comprehensive study of the use of
SAT-based approaches for inconsistency-tolerant query answering, which abstracts from the
particular setting and provides a solid foundation for the future development of such systems.


2. SAT-Based Algorithms
We propose SAT-based algorithms to answer queries under the semantics that fall in the
(co)NP complexity class: X-AR, X-brave and X-IAR where X ∈ {S,P,C} indicates the kind of
repair: subset, Pareto-optimal, and completion-optimal respectively. They rely on pre-computed
conflicts, defined as the minimal inconsistent subsets of the data, and causes for a query potential
answer, defined as the minimal consistent subsets of the data that entails the query answer
together with the logical theory. While our algorithms can be applied to any KB for which we
can compute the conflicts and causes, the overall complexity of the resulting query answering
algorithms depends on the cost of computing these inputs. For FO-rewritable ontology languages
(like DL-Lite) or databases equipped with denial constraints, the sets of conflicts, candidate
answers, and their causes, can be computed in PTime via database query evaluation, yielding
procedures of the expected (co)NP complexity. We focus on the case where conflicts are binary
but we discuss how to extend the encodings to the general case. Binary conflicts allow us to
define a graph representation of conflicts and priorities: The directed conflict graph has facts as
nodes and an edge from 𝛼 to 𝛽 iff 𝛼 and 𝛽 are in conflict and 𝛼 is not preferred to 𝛽.
   We provide propositional encodings of the X-AR, X-brave, and X-IAR semantics, including the
first encodings for Pareto- and completion-optimal repairs. Our encodings are generic and are
built in a modular manner from a core set of basic formulas, some of them for which we consider
several variants. In particular, we consider two ways of encoding the absence, or contradiction,
of a cause for the query, and two ways of encoding Pareto-optimal repair maximality. In the
case of score-structured priority relations, since Pareto-optimal and completion-optimal repairs
coincide, this gives three possible encodings of maximality. For each semantics, we provide
several encodings, that handle either a single potential answer or several answers at the same
time. For the X-brave and X-IAR semantics we additionally provide encodings to check whether
a given cause is in some or every optimal repair, or if some fact is in all optimal repairs.
   Based upon these encodings, we develop several algorithms which utilize different functional-
ities of modern SAT solvers. An initial preprocessing step serves to (1) handle self-inconsistent
facts, and (2) find the answers that have some cause that contains only facts without any
outgoing edge in the directed conflict graph, and thus trivially hold in all optimal repairs. It
then remains to filter the remaining potential answers. The four first algorithms we propose to
do so are generic in the sense that they can be used for all semantics.
       • The first one is similar to the algorithm used by CQAPri: For each answer to filter, it
         checks whether the corresponding SAT encoding is satisfiable.
       • The second one is similar to the CAvSAT algorithm: It handles all potential answers
         together with a weighted MaxSAT instance where soft clauses correspond to answers.
       • The third one uses the same multi-answer encoding and relies on minimal unsatisfiable
         subsets of the soft clauses w.r.t. the hard clauses to filter the answers.
       • The fourth one iteratively evaluates the multi-answer encoding, treating the variables
         corresponding to potential answers as assumptions.
While we may need to consider all causes to decide whether an answer holds under X-AR
semantics, in the X-brave or X-IAR case it is sufficient to find a single cause that belongs to
some or every optimal repair. We hence propose algorithms specific to these cases.
       • The first one checks for each cause whether it belongs to some/every optimal repair using
         the dedicated encoding.
       • The second one is specific to X-IAR. It considers the answers and their causes in turn
         while maintaining two sets of facts, checking facts individually as it goes: the X-IAR facts
         that belong to the intersection of the optimal repairs and the non-X-IAR facts.
       • The last one is also specific to X-IAR. The difference with the previous one is that for
         each answer, it uses a weighted MaxSAT solver to decide which facts hold under X-IAR
         among those that belong to some cause and have not already been checked.


3. Implementation and Experiments
We implemented the proposed algorithms in our orbits1 system (Optimal Repair-Based
Inconsistency-Tolerant Semantics). orbits takes as input two JSON files containing respectively
1
    https://github.com/bourgaux/orbits
the directed conflict graph and the potential answers associated to their causes. The user also
specifies a semantics (AR, IAR, or brave), a kind of repair together with the encoding variants
to use to encode optimality and contradiction, and the algorithm to use to compute the answers
w.r.t. the chosen semantics. The set of answers is output as a JSON file.
   We evaluated orbits on three (sets of) KBs. The first is the CQAPri benchmark [18], a
synthetic benchmark crafted to evaluate inconsistency-tolerant query answering over DL-
Lite KBs, adapted from the LUBM∃20 benchmark [19]. The two others, called Food Inspection
and Physicians, are real-world datasets built from public open data [20, 21, 22], which have
already been used to evaluate data cleaning and consistent query answering systems [23, 17].
They consist of relational databases built from the original csv files, on which typical integrity
constraints (keys and FDs) have been added. The size of the conflict graphs we obtain ranges from
2K to 946K facts and 2K to 3M conflicts. We added score-structured and non-score-structured
priority relations on these conflict graphs.
   Our experimental evaluation aims at assessing (i) the impact of adopting different kinds of
repairs, and (ii) the relative performances of alternative procedures for the same semantics.
More precisely, we consider the following questions.
    • What is the impact in terms of number of answers of adopting optimal repairs rather
      than standard repairs, or completion-optimal repairs instead of Pareto-optimal repairs
      when the priority relation is not score-structured?
    • How do different kinds of repairs compare in terms of computation time?
    • Given a semantics and type of repair, what is the impact in terms of computation times of
      the choice of: How to encode optimality ? How to encode contradictions ? The algorithm
      used to filter the non-trivial answers?
   Our most important finding is that the choice of an algorithm and encoding can have a huge
impact on the computation time: Changing a single parameter among the algorithm, optimality
encoding, and contradiction encoding can result in a significant change (sometimes of several
orders of magnitude). The comparison of the possible procedures for each semantics on the
different datasets and queries shows that there is not a ‘best’ method in general. However,
we still gain some relevant insights. For example we found that one of the three optimality
encodings often performs better while the one based on completion-optimal repairs never
significantly outperforms the others. We also found that one of the algorithms specifically
tailored for the X-IAR semantics is generally the best one to use with this semantics. Finally,
we observe that one variant of the contradiction encoding does not work well with one variant
of the optimality encoding in general.
   While in some cases our results can be used to single out some approaches as more effective,
more often there are no clear winner(s). This suggests that to minimize runtimes, it may make
sense to launch multiple algorithms in parallel, and/or devise methods that can help predict
which algorithm and encoding will perform best on a given dataset and query, e.g. using
machine learning techniques.


Acknowledgements
This work was supported by the ANR AI Chair INTENDED (ANR-19-CHIA-0014).
References
 [1] M. Bienvenu, C. Bourgaux, Querying inconsistent prioritized data with ORBITS: Al-
     gorithms, implementation, and experiments, in: Proceedings of the 19th International
     Conference on Principles of Knowledge Representation and Reasoning (KR) (to appear),
     2022.
 [2] M. Bienvenu, C. Bourgaux, Querying inconsistent prioritized data with ORBITS: Algo-
     rithms, implementation, and experiments, 2022. arxiv.org/abs/2202.07980 [cs.LO].
 [3] M. Arenas, L. E. Bertossi, J. Chomicki, Consistent query answers in inconsistent databases,
     in: Proceedings of the 18th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of
     Database Systems (PODS), 1999, pp. 68–79.
 [4] J. Wijsen, Foundations of query answering on inconsistent databases, SIGMOD Record
     48 (2019) 6–16. URL: https://doi.org/10.1145/3377391.3377393. doi:10.1145/3377391.
     3377393.
 [5] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, R. Rosati, Linking data to
     ontologies, Journal of Data Semantics 10 (2008) 133–173.
 [6] M. Bienvenu, M. Ortiz, Ontology-mediated query answering with data-tractable description
     logics, in: Tutorial Lectures of the 11th Reasoning Web International Summer School,
     2015, pp. 218–307.
 [7] G. Xiao, D. Calvanese, R. Kontchakov, D. Lembo, A. Poggi, R. Rosati, M. Zakharyaschev,
     Ontology-based data access: A survey, in: Proceedings of the 27th International Joint
     Conference on Artificial Intelligence (IJCAI), 2018, pp. 5511–5519.
 [8] M. Bienvenu, C. Bourgaux, Inconsistency-tolerant querying of description logic knowledge
     bases, in: Tutorial Lectures of the 12th International Reasoning Web Summer School, 2016,
     pp. 156–202.
 [9] M. Bienvenu, A short survey on inconsistency handling in ontology-mediated query
     answering, Künstliche Intelligenz 34 (2020) 443–451. URL: https://doi.org/10.1007/
     s13218-020-00680-9. doi:10.1007/s13218-020-00680-9.
[10] M. Bienvenu, R. Rosati, Tractable approximations of consistent query answering for robust
     ontology-based data access, in: Proceedings of the 23rd International Joint Conference on
     Artificial Intelligence (IJCAI), 2013.
[11] D. Lembo, M. Lenzerini, R. Rosati, M. Ruzzi, D. F. Savo, Inconsistency-tolerant semantics for
     description logics, in: Proceedings of the 4th International Conference on Web Reasoning
     and Rule Systems (RR), 2010, pp. 103–117.
[12] S. Staworko, J. Chomicki, J. Marcinkowski, Prioritized repairing and consistent query
     answering in relational databases, Annals of Mathematics and Artifcial Intelligence
     (AMAI) 64 (2012) 209–246. URL: https://doi.org/10.1007/s10472-012-9288-8. doi:10.1007/
     s10472-012-9288-8.
[13] M. Bienvenu, C. Bourgaux, Querying and repairing inconsistent prioritized knowledge
     bases: Complexity analysis and links with abstract argumentation, in: Proceedings of the
     17th International Conference on Principles of Knowledge Representation and Reasoning
     (KR), 2020, pp. 141–151.
[14] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and
     efficient query answering in description logics: The DL-Lite family, Journal of Automated
     Reasoning (JAR) 39 (2007) 385–429. URL: https://doi.org/10.1007/s10817-007-9078-x. doi:10.
     1007/s10817-007-9078-x.
[15] M. Bienvenu, C. Bourgaux, F. Goasdoué, Querying inconsistent description logic knowledge
     bases under preferred repair semantics, in: Proceedings of the 28th AAAI Conference on
     Artificial Intelligence (AAAI), 2014, pp. 996–1002.
[16] M. Bienvenu, C. Bourgaux, F. Goasdoué, Computing and explaining query answers over
     inconsistent DL-Lite knowledge bases, Journal of Artificial Intelligence Research (JAIR) 64
     (2019) 563–644. URL: https://doi.org/10.1613/jair.1.11395. doi:10.1613/jair.1.11395.
[17] A. A. Dixit, P. G. Kolaitis, A SAT-based system for consistent query answering, in: Pro-
     ceedings of the 22nd International Conference on Theory and Applications of Satisfiability
     Testing (SAT), 2019, pp. 117–135.
[18] C. Bourgaux, Inconsistency Handling in Ontology-Mediated Query Answering. (Ges-
     tion des incohérences pour l’accès aux données en présence d’ontologies), Ph.D. thesis,
     University of Paris-Saclay, France, 2016. URL: https://tel.archives-ouvertes.fr/tel-01378723.
[19] C. Lutz, I. Seylan, D. Toman, F. Wolter, The combined approach to OBDA: Taming
     role hierarchies using filters, in: Proceedings of the 12th International Semantic Web
     Conference (ISWC), 2013, pp. 314–330.
[20] Dataset: Food Inspections, Chicago Data Portal, https://data.cityofchicago.org/
     Health-Human-Services/Food-Inspections/4ijn-s7e5, accessed December 7, 2020.
[21] Dataset: New York City Restaurant Inspection Results, Department of Health and
     Mental Hygiene (DOHMH), NYC Open Data, https://data.cityofnewyork.us/Health/
     DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j, accessed December
     7, 2020.
[22] Dataset: National Downloadable File, Centers for Medicare & Medicaid Services, https:
     //data.cms.gov/provider-data/dataset/mj5m-pzi6, accessed December 10, 2020.
[23] T. Rekatsinas, X. Chu, I. F. Ilyas, C. Ré, Holoclean: Holistic data repairs with probabilistic
     inference, Proceedings of the VLDB Endowment (PVLDB) 10 (2017) 1190–1201.