=Paper= {{Paper |id=Vol-3741/paper38 |storemode=property |title=Combining Entity Resolution and Query Answering in Ontologies: A Formal Conceptual Framework |pdfUrl=https://ceur-ws.org/Vol-3741/paper38.pdf |volume=Vol-3741 |authors=Ronald Fagin,Phokion G. Kolaitis,Domenico Lembo,Lucian Popa,Federico Scafoglieri |dblpUrl=https://dblp.org/rec/conf/sebd/FaginKL0S24 }} ==Combining Entity Resolution and Query Answering in Ontologies: A Formal Conceptual Framework== https://ceur-ws.org/Vol-3741/paper38.pdf
                                Combining Entity Resolution and Query Answering
                                in Ontologies: A Formal Conceptual Framework
                                (Discussion Paper)

                                Ronald Fagin1 , Phokion G. Kolaitis1,2 , Domenico Lembo3 , Lucian Popa1 and
                                Federico Scafoglieri3,*
                                1
                                  IBM Research, Almaden, USA
                                2
                                  UC Santa Cruz, USA
                                3
                                  Sapienza University of Rome, Italy


                                            Abstract
                                            This paper reports on a recent formal framework for integrating entity resolution and query answer-
                                            ing within ontologies governed by rules expressed in the form of tuple-generating dependencies and
                                            equality-generating dependencies. Our framework outlines the semantics of the ontology by defining
                                            special instances involving equivalence classes of entities and sets of values. The former serve to gather
                                            entities referring to the same real-world object, and the latter to catalog alternative attribute values. This
                                            approach not only addresses entity resolution but also overcomes potential inconsistencies within the
                                            data. Additionally, we introduce a tailored chase procedure for this framework, ensuring non-failure and
                                            ultimately yielding a universal solution. This universal solution, in turn, can be used to obtain certain
                                            answers to conjunctive queries.

                                            Keywords
                                            Formal Conceptual Framework, Entity Resolution, Query Answering, Knowledge Bases, Ontology, Chase,
                                            Universal Model, tuple-generating dependencies, equality-generating dependencies, entity-resolution rules




                                1. Introduction
                                Entity resolution is the problem of determining whether different data records refer to the same
                                real-world object, such as the same individual or the same organization, etc. [1, 2].
                                   In this discussion paper we summarize a framework recently appeared in [3] for entity resolu-
                                tion in combination with query answering in the context of ontologies consisting of ground atoms
                                and rules specified as tuple-generating dependencies (tgds) and equality-generating dependencies
                                (egds). These rules have been widely investigated in databases and knowledge representation, e.g.
                                in [4, 5, 6, 7]; they can express axioms that are used in Description Logics (DLs) [8], as well as
                                in knowledge bases that are similar to Datalog +/- programs [9]. Additionally, egds are employed
                                to express typical entity resolution rules that one may write in practice, as in [10]. These rules
                                specify the conditions under which to entities have to be made equal, to say, for instance, that
                                two persons are indeed the same person if they have similar names and the same date of birth. In

                                SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy
                                *
                                 Corresponding author.
                                $ fagin@us.ibm.com (R. Fagin); kolaitis@ucsc.edu (P. G. Kolaitis); i.lembo@diag.uniroma1.it (D. Lembo);
                                lpopa@us.ibm.com (L. Popa); scafoglieri@diag.uniroma1.it (F. Scafoglieri)
                                          © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
the following, we call such rules entity-egds, whereas we name value-egds the egds imposing
equality on values, used, e.g., to express that a person has only one name.
   In the proposed framework, the first challenge we face is to come up with a consistent way to
complete or modify the original ontology, while respecting all its rules. To this aim we propose
a new notion of valid models, called ontology solutions. They must satisfy all entity resolution
rules along with all other ontology rules. Furthermore, the solutions must include all original data
(i.e., no information is ever dropped or altered). Our approach is guided by the intuitive principle
that for each real-world object there must be a single node in the solution that represents all
“equivalent” entities denoting the object. To achieve this, we use equivalence classes of entities
that become first-class citizens in our framework. In addition, we relax the standard way in which
value-egds are satisfied by allowing solutions to use sets of values. Intuitively, we interpret egds
as matching dependencies [11, 1, 12]. That is, when the conditions expressed by (the body of)
an entity-egd or a value-egd hold in the data, and thus two entities or two values must be made
equal, we combine them through a merging function that amounts to taking the union of all the
alternatives. In this way, we group all entities that denote the same real-world object into a unique
set, which turns out to be an equivalence class. Similarly, when two values exist where only one
value is instead allowed according to the TBox, we explicitly form their union. Note that we
consider the union of entities a “global” feature for a solution, that is, an entity 𝑒 may belong to
exactly one equivalence class; in contrast, the union of values is “local” to the context in which a
value occurs, that is, a particular value may belong to more than one set of values.
   We also remark that the semantics we adopt for value-egds allows us to always have a solution
(that is, a model) for an ontology, even when there are no models according to the standard
first-order semantics. Indeed, if a value-egd enforces the equality of different values, we collect
together such values, whereas first-order logic would conclude that the ontology is inconsistent.
   Besides formalizing the aforementioned ideas, our contribution is on how to combine entity
resolution with query answering. In this respect, (𝑖) we define universal solutions and show that,
as for standard tgd and egd semantics, universal solutions can be used to obtain the certain answers
of Conjunctive Queries (CQs); (𝑖𝑖) we propose a variant of the classical chase procedure [4, 5]
tailored to our approach; (𝑖𝑖𝑖) we show that, when the chase procedure terminates, it returns a
universal solution (and thus we have an algorithm for computing the certain answers to CQs).
   In the following, we discuss our findings mainly by examples and refer the reader to [3] for a
complete formal treatment.


2. Framework
We refer the reader to [13] for an introduction to equivalence classes and relations. In the
following, let ∼ be an equivalence relation, we write [𝑥] to denote the equivalence class containing
𝑥, instead of [𝑥]∼ . Moreover, we may write, for example [𝑎, 𝑏, 𝑐]∼ (or simply [𝑎, 𝑏, 𝑐]) to denote
the equivalence class consisting of the elements 𝑎, 𝑏, and 𝑐.
Syntax. An atom is an expression of the form 𝑃 (𝑡1 , . . . , 𝑡𝑛 ), where 𝑃 is a symbol to denote an
𝑛-ary ontology or built-in predicate, i.e. a pre-interpreted predicate like the Jaccard similarity
(JaccSim). Each 𝑡𝑖 is either a variable or a constant, which can be in turn a value or an entity, the
latter used to denote a real-world object. We impose that when 𝑃 is a built-in predicate, constants
occurring in 𝑃 (𝑡1 , . . . , 𝑡𝑛 ) are only values. A ground atom is an atom with no variables.
   A conjunction 𝜑(𝑥) of atoms is an expression 𝑃1 (𝑡1 ) ∧ . . . ∧ 𝑃𝑚 (𝑡𝑚 ), where each 𝑃𝑗 (𝑡𝑗 ) is
an atom such that each variable in the tuple 𝑡𝑗 of terms is among those in the tuple 𝑥 of variables.
   An ontology 𝒪 is a pair (𝒯 , 𝒜), consisting of a TBox 𝒯 and a ABox 𝒜. The TBox is a finite
set of tuple-generating dependencies (tgds) and equality-generating dependencies (egds).
   A tgd is a formula of the form:
                                     ∀𝑥 (𝜑(𝑥) → ∃𝑦 𝜓(𝑥, 𝑦)),
 where 𝜑(𝑥) and 𝜓(𝑥, 𝑦) are conjunctions of atoms, such that 𝑥 and 𝑦 have no variables in
common and 𝜓(𝑥, 𝑦) is built-in free and, for simplicity, constant-free. As in [5], we assume that
all variables in 𝑥 appear in 𝜑(𝑥), but not necessarily in 𝜓(𝑥, 𝑦). We call 𝜑(𝑥) the body of the
tgd, and 𝜓(𝑥, 𝑦) the head of it.
   An egd is a formula of the form:
                                        ∀𝑥 (𝜑(𝑥) → 𝑦 = 𝑧),
 where 𝜑(𝑥) is a conjunction of atoms and 𝑦 and 𝑧 are distinct variables occurring in 𝜑(𝑥), such
that either both 𝑦 and 𝑧 are entity-variables (in which case we have an entity-egd) or both 𝑦 and 𝑧
are value-variables (in which case we have a value-egd). We call 𝜑(𝑥) the body of the egd.
   In the examples, for simplicity, we will omit the universal quantification in tgds and egds.

Example 1. Let 𝒯 be the TBox consisting of the rules:

               (𝑟1 ) Name(𝑝1 , 𝑛1 ) ∧ Name(𝑝2 , 𝑛2 ) ∧ JaccSim(𝑛1 , 𝑛2 , 0.7) → 𝑝1 = 𝑝2
               (𝑟2 ) Name(𝑝, 𝑛1 ) ∧ Name(𝑝, 𝑛2 ) → 𝑛1 = 𝑛2
               (𝑟3 ) HPhone(𝑝, 𝑓1 ) ∧ HPhone(𝑝, 𝑓2 ) → 𝑓1 = 𝑓2
               (𝑟4 ) HPhone(𝑝1 , 𝑓 ) ∧ HPhone(𝑝2 , 𝑓 ) → SameHouse(𝑝1 , 𝑝2 , 𝑓 )

   Here 𝑝1 , 𝑝2 , 𝑝 are entity-variables, i.e., occur in predicate arguments ranging over entities,
whereas 𝑛1 , 𝑛2 , 𝑓1 , 𝑓2 , 𝑓 are value-variables, i.e., occur in predicate arguments ranging over
values. The predicates have the meaning suggested by their names. In particular, the predicate
Name maintains the individual’s name, HPhone stores their home phone contact, whereas
SameHouse associates individuals living in the same house with their phone number. Each of
the four rules makes an assertion about the predicates. In particular, the first rule (𝑟1 ) states that
if the Jaccard similarity of two names is higher then 0.7, then these names are associated to the
same individual. The TBox also says that everyone can have only one name (𝑟2 ) and only one
landline phone number (𝑟3 ), and that two individuals with the same landline phone number live
in the same house (𝑟4 ).

  The ABox 𝒜 of an ontology 𝒪 is a finite set of ground atoms of the form 𝑃 (𝑐1 , . . . , 𝑐𝑛 ) where
each 𝑐𝑖 is an entity or a value.

Example 2. Let 𝒜 be the ABox consisting of the atoms

                   (𝑔1 ) Name(Doe1 , John Doe),   (𝑔2 ) HPhone(Doe1 , 358)
                   (𝑔3 ) Name(Doe2 , Johnny Doe), (𝑔4 ) HPhone(Doe2 , 635)
                   (𝑔5 ) Name(Doe3 , Mary Doe),   (𝑔6 ) HPhone(Doe3 , 358)
  In words, the ABox 𝒜 specifies that Doe1 has name John Doe and home phone number
358, Doe2 has name Johnny Doe and home phone number 635, Doe3 has name Mary Doe
and phone number 358.

Semantics. We assume to have countably infinite many entity-nulls and value-nulls (note that
we consider generic tgds, i.e. possibly having existential variables in the rule head, even though,
for space limits we do not provide examples of this kind). The semantics of the ontology is given
using special ABoxes, called ontology instances, whose ground atoms have components that are
either equivalence classes of entities and entity-nulls, which we call E-sets, or non-empty sets of
values and value-nulls, which we call V-sets.
Ontology Instance (Definition 1 of [3]). An instance ℐ for an ontology 𝒪 w.r.t. an equivalence
relation ∼ is a set of facts 𝑃 (𝑇1 , . . . , 𝑇𝑛 ) such that 𝑃 is a 𝑛-ary predicate in 𝒪, and each 𝑇𝑖 is
either an E-set w.r.t. ∼ or a V-set.
Example 3. Let 𝒪 = (𝒯 , 𝒜) be an ontology such that 𝒯 and 𝒜 are as in Example 1 and in
Example 2, respectively. Consider the following set ℐ of facts.

(𝑑1 ) Name([Doe1 , Doe2 ], {John Doe, Johnny Doe}) (𝑑2 ) HPhone([Doe1 , Doe2 ], {358, 635})
(𝑑3 ) Name([Doe3 ], {Mary Doe})                    (𝑑4 ) HPhone([Doe3 ], {358})
(𝑑5 ) SameHouse([Doe1 , Doe2 ], [Doe3 ])           (𝑑6 ) SameHouse([Doe3 ], [Doe1 , Doe2 ])

   ℐ is an instance for 𝒪 w.r.t. the equivalence relation ∼ over the set {Doe1 , Doe2 , Doe3 } such
that Doe1 ∼ Doe2 , i.e. Doe1 and Doe2 are in the same equivalence class and Doe3 is the only
element of another obviously disjoint equivalence class.

  To define the notions of satisfaction of tgds and egds by an instance, we first introduce the
notion of an assignment from a conjunction 𝜑(𝑥) of atoms to an instance ℐ for an ontology 𝒪.
Assignment (Definition 3 of [3]). An assignment from a conjunction 𝜑(𝑥) of atoms to an instance
ℐ is a mapping 𝜇 from the variables and values in 𝜑(𝑥) to E-sets and V-sets of ℐ such that
• each entity-variable is mapped to an E-set;
• different occurrences of a value-variable are mapped to V-sets with non-empty intersection;
• each value 𝑣 is mapped to a V-set 𝑉 , such that 𝑣 ∈ 𝑉 ;
• for each atom 𝑃 (𝑥, 𝑒, 𝑣) of 𝜑(𝑥), where 𝑥 is a variable, 𝑒 is an entity and 𝑣 is a value, ℐ
  contains a fact of the form 𝑃 (𝜇(𝑥), [𝑒], 𝜇(𝑣)) (the definition generalizes in the obvious way for
  atoms of different form).
Example 4. Let ℐ be the instance in Example 3. A possible assignment 𝜇 from the body of rule
(𝑟4 ) of Example 1 to ℐ is given below:




  We are now ready to define the semantics of tgds and egds.
Semantics of tgds (Definition 4 of [3]). An instance ℐ for an ontology 𝒪 satisfies a tgd
∀𝑥(𝜑(𝑥) → ∃𝑦𝜓(𝑥, 𝑦)) if for each assignment 𝜇 from 𝜑(𝑥) to ℐ there is an assignment 𝜇′ from
𝜓(𝑥, 𝑦) to ℐ such that, for each 𝑥 in 𝑥
• 𝜇(𝑥) = 𝜇′ (𝑥), if 𝑥 is an entity-variable
• the intersection of V-sets assigned by 𝜇 to each occurrence of 𝑥 in 𝜑(𝑥) is contained in the
  intersection of V-sets assigned by 𝜇′ to each occurrence of 𝑥 in 𝜓(𝑥, 𝑦), if 𝑥 is a value-variable
Example 5. Let ℐ be the instance in Example 3 and (𝑟4 ) be the tgd in Example 1. Let 𝜇 be an
assignment from the body of (𝑟4 ) to ℐ and let 𝜇′ be an assignment from the head of (𝑟4 ) to ℐ as
described below:




Since 𝜇 is the only assignment from the body of (𝑟4 ) to ℐ, we conclude that ℐ satisfies (𝑟4 ).
Semantics of egds (Definition 4 of [3]). An instance ℐ for an ontology 𝒪 satisfies an egd
∀𝑥(𝜑(𝑥) → 𝑦 = 𝑧) if each assignment 𝜇 from 𝜑(𝑥) to ℐ is such that 𝜇(𝑦) = 𝜇(𝑧)
  Note that according to the above definition, in every assignment 𝜑(𝑥) to ℐ all occurrences of 𝑦
and 𝑧 must be mapped to the same set, for an egd to be satisfied by an ontology instance.
Example 6. Let ℐ be the instance in Example 3 and (𝑟3 ) be the egd in Example 1. Let 𝜇 be an
assignment from the body of (𝑟3 ) to ℐ as described below:




The assignment 𝜇 trivially enjoys the conditions of the definition of egd satisfaction. It is easy to
see that the same holds for all assignments from the body of (𝑟3 ) to ℐ, and thus we conclude that
ℐ satisfies (𝑟3 ).
  Finally, we define when an instance is a solution for an ontology.
Ontology Solution (Definition 5 of [3]). Let 𝒪 = (𝒯 , 𝒜) be an ontology and ℐ be an instance.
ℐ is a solution for 𝒪 if ℐ satisfies 𝒯 and 𝒜, i.e.:
    • ℐ satisfies 𝒯 if ℐ satisfies every tgd and egd in 𝒯 .
    • ℐ satisfies 𝒜 if it satisfies every ground atom 𝑃 (𝑐1 , . . . , 𝑐𝑛 ) in 𝒜. A ground atom is
       satisfied if there is a fact 𝑃 (𝑇1 , . . . , 𝑇𝑛 ) in ℐ such that 𝑐𝑖 ∈ 𝑇𝑖 , for 1 ≤ 𝑖 ≤ 𝑛.
  Among all ontology solutions we are interested to the so-called universal solutions, which
exhibit special properties with respect to the task of query answering (see, e.g., [5, 14, 15]).
Universal Solution (Definitions 8 of [3]). A solution 𝒰 for an ontology 𝒪 is universal if, for
every solution ℐ for 𝒪, there is a homomorphism ℎ : 𝒰 → ℐ.
  The notion of homomorphism used in the previous definition is tailored to our framework
and suitably considers E-sets and V-sets occurring in instances. Informally, for an instance ℐ,
we denote with under (ℐ) the set containing all entities, entity-nulls, values, and value-nulls
occurring in ℐ. Then, let ℐ1 and ℐ2 be two instances, an homomorphism ℎ : ℐ1 → ℐ2 is a
structure-preserving mapping from under (ℐ1 ) to under (ℐ2 ) (we refer to Definition 7 of [3] for
further details).


3. Query Answering
A conjunctive query (CQ) 𝑞(𝑥) is a formula ∃𝑦 𝜑(𝑥, 𝑦), where 𝜑(𝑥, 𝑦) is a built-in free conjunc-
tion of atoms.
Answers to CQs. The answer 𝑞 ℐ to a CQ 𝑞(𝑥1 , . . . , 𝑥𝑛 ) on an instance ℐ is the set of all tuples
⟨𝑇1 , . . . , 𝑇𝑛 ⟩ such that there is an assignment 𝜇 from 𝑞 to ℐ for which:
     • 𝑇𝑖 = 𝜇(𝑥𝑖 ), if 𝑥𝑖 is an entity-variable;
     • 𝑇𝑖 , is the intersection of the V-sets assigned to the various occurrences of 𝑥𝑖 , if 𝑥𝑖 is a
         value-variable.
Example 7. Below we give an example of query, assignment 𝜇 from the query to the instance ℐ
of Example 3, and answers to the query.




It is easy to see that 𝑞 ℐ = {⟨{358}⟩}.

  When querying an ontology 𝒪, we are interested in reasoning over all solutions for 𝒪. We
thus adapt the classical notion of certain answers to our framework. To this aim, we need some
preliminary definitions. A tuple 𝑇 = ⟨𝑇1 , . . . , 𝑇𝑛 ⟩ is null-free if each 𝑇𝑖 is non-empty and
contains no nulls. Let 𝑇 = ⟨𝑇1 , . . . , 𝑇𝑛 ⟩ and 𝑇 ′ = ⟨𝑇1′ , . . . , 𝑇𝑛′ ⟩ be two tuples of E-sets and
V-sets, we say that 𝑇 ′ dominates 𝑇 , denoted 𝑇 ≤ 𝑇 ′ , if 𝑇𝑖 ⊆ 𝑇𝑖′ , for all 1 ≤ 𝑖 ≤ 𝑛.
Certain Answers (Definition 9 of [3]). A null-free tuple 𝑇 of E-sets and V-sets is a certain
answer to a CQ 𝑞 w.r.t. an ontology 𝒪 if
   1. for every solution ℐ for 𝒪, there is a tuple 𝑇 ′ ∈ 𝑞 ℐ such that 𝑇 ≤ 𝑇 ′
   2. there is no null-free tuple 𝑇 ′ that satisfies (1) and 𝑇 < 𝑇 ′
We write 𝑐𝑒𝑟𝑡(𝑞, 𝒪) to denote the set of certain answers to 𝑞 w.r.t. 𝒪
  We are now able to present the main result of this section, which asserts that universal solutions
can be used to compute certain answers to CQs in our framework.
Compute Certain Answers over a Universal Solution (Theorem 1 of [3]). Let 𝑞 be a CQ, let 𝒪
be an ontology, and let 𝒰 be a universal solution for 𝒪. Then 𝑐𝑒𝑟𝑡(𝑞, 𝒪) = 𝑞 𝒰 ↓𝜌 .
  The operator ↓𝜌 first eliminates nulls occurring in 𝑞 𝒰 , then, in the resulting set, it eliminates
tuples that are dominated by other tuples (see [3] for the details).
4. Computing Universal Solution
In order to compute the universal solution that can be used to obtain the certain answers to
CQs, we adapt here the well-known notion of restricted chase [4, 16, 5, 15] to our framework.
Intuitively, given an ontology 𝒪 = (𝒯 , 𝒜), we define a procedure that starts from an instance for
𝒪 “specular” to the ABox 𝒜, which we call the base instance of 𝒪, and incrementally constructs
an instance that satisfies the tgds and the egds of the TBox 𝒯 .
Example 8. Let 𝒪 = (𝒯 , 𝒜) be an ontology such that 𝒯 and 𝒜 are as in Example 1 and in
Example 2, respectively. The base instance of 𝒪 is as follows:
            (𝑑1 ) Name([Doe1 ], {John Doe})       (𝑑2 ) Name([Doe2 ], {Johnny Doe})
            (𝑑3 ) HPhone([Doe1 ], {358})          (𝑑4 ) HPhone([Doe2 ], {635})
            (𝑑5 ) Name([Doe3 ], {Mary Doe})       (𝑑6 ) HPhone([Doe3 ], {358})

   A universal solution for 𝒪 is obtained by iteratively applying three chase rules, one for each
kind of rule that may occur in 𝒯 , as long as they are triggered in the (current) instance.
(i) entity-egd chase rule. An entity-egd ∀𝑥(𝜑(𝑥) → 𝑦 = 𝑧) is triggered in an instance ℐ if there
is an assignment 𝜇 from 𝜑(𝑥) to ℐ such that 𝜇(𝑦) ̸= 𝜇(𝑧). Then, the chase substitutes 𝜇(𝑦) and
𝜇(𝑧) everywhere ℐ with 𝜇(𝑦) ∪ 𝜇(𝑧).
Example 9. Let ℐ0 be the starting instance in Example 8 and (𝑟1 ) be the e-egd Name(𝑝1 , 𝑛1 ) ∧
Name(𝑝2 , 𝑛2 ) ∧ JaccSim(𝑛1 , 𝑛2 , 0.7) → 𝑝1 = 𝑝2 . The egd (𝑟1 ) is triggered in ℐ0 because of
(𝑑1 ),(𝑑2 ). Its application generates ℐ1 , which is as follows:

       (𝑑7 ) Name([Doe1 , Doe2 ], {John Doe}) (𝑑8 ) Name([Doe1 , Doe2 ], {Johnny Doe})
       (𝑑9 ) HPhone([Doe1 , Doe2 ], {358})    (𝑑10 ) HPhone([Doe1 , Doe2 ], {635})
       (𝑑5 ) Name([Doe3 ], {Mary Doe})        (𝑑6 ) HPhone([Doe3 ], {358})

  ℐ1 is obtained from ℐ0 by replacing everywhere [Doe1 ] and [Doe2 ] with [Doe1 , Doe2 ].
(ii) value-egd chase rule. Similarly to the entity-edg case, when a value-egd forces two different
sets of values to be equated, we take the union of the two sets and modify the instance accordingly.
Example 10. Let ℐ1 be the instance in Example 9 and (𝑟3 ) be the value-egd HPhone(𝑝, 𝑓1 ) ∧
HPhone(𝑝, 𝑓2 ) → 𝑓1 = 𝑓2 . The egd (𝑟3 ) is triggered in ℐ1 because of (𝑑9 ),(𝑑1 0). Its application
generates ℐ2 , shown below:
      (𝑑7 ) Name([Doe1 , Doe2 ], {John Doe}) (𝑑8 ) Name([Doe1 , Doe2 ], {Johnny Doe})
      (𝑑11 ) HPhone([Doe1 , Doe2 ], {358, 635}) (𝑑6 ) HPhone([Doe3 ], {358})
      (𝑑5 ) Name([Doe3 ], {Mary Doe})

  ℐ2 is obtained from ℐ1 by replacing in 𝑑9 and 𝑑10 {358} and {635} with their union
{358, 635}.
(iii) tgd chase rule. A tgd ∀𝑥 (𝜑(𝑥) → ∃𝑦𝜓(𝑥, 𝑦)) is triggered in an instance ℐ if there is an
assignment 𝜇 from 𝜑(𝑥) to ℐ such that there is no assignment 𝜇′ from 𝜓(𝑥, 𝑦) to ℐ, as required
by the definition of tgd satisfaction. Then, the chase adds new facts to ℐ so that 𝜇′ does exist in
the obtained instance.
Example 11. Let ℐ2 be the instance in Example 10 and (𝑟4 ) be the tgd HPhone(𝑝1 , 𝑓 ) ∧
HPhone(𝑝2 , 𝑓 ) → SameHouse(𝑝1 , 𝑝2 , 𝑓 ). The tgd (𝑟4 ) is triggered in ℐ2 because of (𝑑11 ),(𝑑6 ).
Its application generates ℐ3 , shown below:
    (𝑑7 ) Name([Doe1 , Doe2 ], {John Doe}) (𝑑8 ) Name([Doe1 , Doe2 ], {Johnny Doe})
    (𝑑11 ) HPhone([Doe1 , Doe2 ], {358, 635}) (𝑑6 ) HPhone([Doe3 ], {358})
    (𝑑5 ) Name([Doe3 ], {Mary Doe})           (𝑑12 ) SameHouse([Doe1 , Doe2 ], [Doe3 ], {358})
  ℐ3 is obtained by adding SameHouse([Doe1 , Doe2 ], [Doe3 ], {358}) to ℐ2 .
  The chase procedure terminates when it produces an instance for which no rule in the TBox is
applicable. One such sequence 𝜎 = ℐ0 , ℐ1 , . . . , ℐ𝑘 is called a finite chase and the result of the
chase, chase(𝒪, 𝜎), coincides with ℐ𝑘 .
The result of the chase is an Universal Solution (Theorem 2 of [3]). If 𝒪 is an ontology and 𝜎
is a finite chase for 𝒪, then chase(𝒪, 𝜎) is a universal solution for 𝒪.
   Given the ontology composed by the TBox in Example 1 and the ABox in Example 2, it is
possible to construct a finite chase sequence ℐ0 , ℐ1 , ℐ2 , ℐ3 , . . . ℐ𝑘 , where ℐ0 , ℐ1 , ℐ2 , ℐ3 are as in
Examples 8–11 and ℐ𝑘 is the universal solution of Example 3 (and thus 𝑞 ℐ coincides with the set
of certain answers to 𝑞).


5. Conclusion and Future Works
In this paper we reported on a principled approach to query answering and entity resolution that
may be relevant to several areas, like Data Exchange [5] and Integration [17], or Ontology-based
Data Access [18, 19] considered the central role played by tgds and egds in those contexts,
but also in Information Extraction [20, 21, 22] or in general for data-intensive tasks [23]. Our
investigation is however still initial and several aspects need further study. Although not explicitly
shown in this document, tgds could give rise to an infinite universal solution because of the
possible infinite creation of existentials. We left open how to obtain a universal solution from a
non-terminating chase sequence. Solving this case would extend our approach even to settings
where all universal solutions are infinite. Though not fully materializable, the infinite chase is
generally considered an important theoretical tool to show properties of query answering, such
as rewritability [15], possibly combined with partial materialization of the chase result [24].
Morover our framework intentionally considers expressive tgds and egds, in order to encompass
several popular ontology languages. It is however well-known that without suitable restrictions
query answering (and reasoning in general) is undecidable [25, 26]. It is thus crucial to identify
conditions ensuring decidability, and possibly tractability, of query answering.


Acknowledgments
Scafoglieri’s research was entirely and exclusively supported by PNRR MUR project PE0000013-
FAIR. Lembo’s research was supported by EU ICT-48 2020 project TAILOR (No. 952215), EU
ERA-NET Cofund ICT-AGRI-FOOD project ADCATER (No. 40705), PNRR MUR project
PE0000013-FAIR and the MUR PRIN 2022LA8XBH project Polar (POLicy specificAtion and
enfoRcement for privacy-enhanced data management).
References
 [1] O. Benjelloun, H. Garcia-Molina, D. Menestrina, Q. Su, S. E. Whang, J. Widom, Swoosh:
     a generic approach to entity resolution, Very Large Database J. 18 (2009) 255–276.
 [2] G. Papadakis, E. Ioannou, E. Thanos, T. Palpanas, The Four Generations of Entity Resolu-
     tion, Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2021.
 [3] R. Fagin, P. G. Kolaitis, D. Lembo, L. Popa, F. Scafoglieri, A Framework for Combining
     Entity Resolution and Query Answering in Knowledge Bases, in: Proceedings of the 20th
     International Conference on Principles of Knowledge Representation and Reasoning, 2023,
     pp. 229–239.
 [4] C. Beeri, M. Y. Vardi, A proof procedure for data dependencies, J. of the ACM 31 (1984)
     718–741.
 [5] R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa, Data exchange: Semantics and query
     answering, Theoretical Computer Science 336 (2005) 89–124.
 [6] J. Baget, M. Leclère, M. Mugnier, E. Salvat, On rules with existential variables: Walking
     the decidability line, Artificial Intelligence 175 (2011) 1620–1654.
 [7] M. Krötzsch, M. Marx, S. Rudolph, The power of the terminating chase (invited talk), in:
     Proc. of the 22nd Int. Conf. on Database Theory (ICDT), volume 127 of LIPIcs, 2019, pp.
     3:1–3:17.
 [8] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The De-
     scription Logic Handbook: Theory, Implementation and Applications, 2nd ed., Cambridge
     University Press, 2007.
 [9] A. Calì, G. Gottlob, T. Lukasiewicz, A general datalog-based framework for tractable query
     answering over ontologies, J. of Web Semantics 14 (2012) 57–83.
[10] M. Bienvenu, G. Cima, V. Gutieérrez-Basulto, LACE: A logical approach to collective
     entity resolution, in: Proc. of the 41st ACM SIGACT SIGMOD SIGAI Symp. on Principles
     of Database Systems (PODS), 2022, pp. 379–391.
[11] L. E. Bertossi, S. Kolahi, L. V. S. Lakshmanan, Data cleaning and query answering with
     matching dependencies and matching functions, Theoretical Computer Science 52 (2013)
     441–482.
[12] W. Fan, Dependencies revisited for improving data quality, in: Proc. of the 27th ACM
     SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS), 2008, pp.
     159–170.
[13] K. Devlin, Sets, functions, and logic: an introduction to abstract mathematics, Chapman
     and Hall/CRC, 2018.
[14] A. Calì, G. Gottlob, M. Kifer, Taming the infinite chase: Query answering under expressive
     relational constraints, J. of Artificial Intelligence Research 48 (2013) 115–174.
[15] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning
     and efficient query answering in description logics: The DL-Lite family, J. of Automated
     Reasoning 39 (2007) 385–429.
[16] D. S. Johnson, A. C. Klug, Testing containment of conjunctive queries under functional and
     inclusion dependencies, J. of Computer and System Sciences 28 (1984) 167–189.
[17] A. Doan, A. Y. Halevy, Z. G. Ives, Principles of Data Integration, Morgan Kaufmann, 2012.
[18] M. Bienvenu, M. Ortiz, Ontology-mediated query answering with data-tractable description
     logics, in: W. Faber, A. Paschke (Eds.), Reasoning Web. Semantic Technologies for
     Intelligent Data Access – 11th Int. Summer School Tutorial Lectures (RW), volume 9203,
     2015, pp. 218–307.
[19] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Ontology-based data
     access and integration, in: L. Liu, M. T. Özsu (Eds.), Encyclopedia of Database Systems,
     Second Edition, Springer, 2018.
[20] D. Lembo, Y. Li, L. Popa, K. Qian, F. Scafoglieri, Ontology mediated information extraction
     with MASTRO SYSTEM-T, in: Proceedings of the ISWC 2020 Demos and Industry
     Tracks, volume 2721 of CEUR Electronic Workshop Proceedings, http://ceur-ws.org/,
     CEUR-WS.org, 2020, pp. 256–261.
[21] D. Lembo, F. M. Scafoglieri, Ontology-based document spanning systems for informa-
     tion extraction, Int. J. Semantic Comput. 14 (2020) 3–26. URL: https://doi.org/10.1142/
     S1793351X20400012. doi:10.1142/S1793351X20400012.
[22] R. Fagin, B. Kimelfeld, F. Reiss, S. Vansummeren, Document spanners: A formal approach
     to information extraction, Journal of the ACM (JACM) 62 (2015) 1–51.
[23] V. Christophides, V. Efthymiou, T. Palpanas, G. Papadakis, K. Stefanidis, An overview
     of end-to-end entity resolution for big data, ACM Computing Surveys (CSUR) 53 (2020)
     1–42.
[24] C. Lutz, D. Toman, F. Wolter, Conjunctive query answering in the description logic ℰℒ
     using a relational database system, in: Proc. of the 21st Int. Joint Conf. on Artificial
     Intelligence (IJCAI), 2009, pp. 2070–2075.
[25] C. Beeri, M. Y. Vardi, The implication problem for data dependencies, in: Proc. of the 8th
     Coll. on Automata, Languages and Programming (ICALP), volume 115 of Lecture Notes in
     Computer Science, Springer, 1981, pp. 73–85.
[26] A. Calì, D. Lembo, R. Rosati, On the decidability and complexity of query answering over
     inconsistent and incomplete databases, in: Proc. of the 22nd ACM SIGACT SIGMOD
     SIGART Symp. on Principles of Database Systems (PODS), 2003, pp. 260–271.