=Paper=
{{Paper
|id=Vol-3741/paper38
|storemode=property
|title=Combining Entity Resolution and Query Answering in Ontologies: A Formal Conceptual Framework
|pdfUrl=https://ceur-ws.org/Vol-3741/paper38.pdf
|volume=Vol-3741
|authors=Ronald Fagin,Phokion G. Kolaitis,Domenico Lembo,Lucian Popa,Federico Scafoglieri
|dblpUrl=https://dblp.org/rec/conf/sebd/FaginKL0S24
}}
==Combining Entity Resolution and Query Answering in Ontologies: A Formal Conceptual Framework==
Combining Entity Resolution and Query Answering in Ontologies: A Formal Conceptual Framework (Discussion Paper) Ronald Fagin1 , Phokion G. Kolaitis1,2 , Domenico Lembo3 , Lucian Popa1 and Federico Scafoglieri3,* 1 IBM Research, Almaden, USA 2 UC Santa Cruz, USA 3 Sapienza University of Rome, Italy Abstract This paper reports on a recent formal framework for integrating entity resolution and query answer- ing within ontologies governed by rules expressed in the form of tuple-generating dependencies and equality-generating dependencies. Our framework outlines the semantics of the ontology by defining special instances involving equivalence classes of entities and sets of values. The former serve to gather entities referring to the same real-world object, and the latter to catalog alternative attribute values. This approach not only addresses entity resolution but also overcomes potential inconsistencies within the data. Additionally, we introduce a tailored chase procedure for this framework, ensuring non-failure and ultimately yielding a universal solution. This universal solution, in turn, can be used to obtain certain answers to conjunctive queries. Keywords Formal Conceptual Framework, Entity Resolution, Query Answering, Knowledge Bases, Ontology, Chase, Universal Model, tuple-generating dependencies, equality-generating dependencies, entity-resolution rules 1. Introduction Entity resolution is the problem of determining whether different data records refer to the same real-world object, such as the same individual or the same organization, etc. [1, 2]. In this discussion paper we summarize a framework recently appeared in [3] for entity resolu- tion in combination with query answering in the context of ontologies consisting of ground atoms and rules specified as tuple-generating dependencies (tgds) and equality-generating dependencies (egds). These rules have been widely investigated in databases and knowledge representation, e.g. in [4, 5, 6, 7]; they can express axioms that are used in Description Logics (DLs) [8], as well as in knowledge bases that are similar to Datalog +/- programs [9]. Additionally, egds are employed to express typical entity resolution rules that one may write in practice, as in [10]. These rules specify the conditions under which to entities have to be made equal, to say, for instance, that two persons are indeed the same person if they have similar names and the same date of birth. In SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy * Corresponding author. $ fagin@us.ibm.com (R. Fagin); kolaitis@ucsc.edu (P. G. Kolaitis); i.lembo@diag.uniroma1.it (D. Lembo); lpopa@us.ibm.com (L. Popa); scafoglieri@diag.uniroma1.it (F. Scafoglieri) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings the following, we call such rules entity-egds, whereas we name value-egds the egds imposing equality on values, used, e.g., to express that a person has only one name. In the proposed framework, the first challenge we face is to come up with a consistent way to complete or modify the original ontology, while respecting all its rules. To this aim we propose a new notion of valid models, called ontology solutions. They must satisfy all entity resolution rules along with all other ontology rules. Furthermore, the solutions must include all original data (i.e., no information is ever dropped or altered). Our approach is guided by the intuitive principle that for each real-world object there must be a single node in the solution that represents all “equivalent” entities denoting the object. To achieve this, we use equivalence classes of entities that become first-class citizens in our framework. In addition, we relax the standard way in which value-egds are satisfied by allowing solutions to use sets of values. Intuitively, we interpret egds as matching dependencies [11, 1, 12]. That is, when the conditions expressed by (the body of) an entity-egd or a value-egd hold in the data, and thus two entities or two values must be made equal, we combine them through a merging function that amounts to taking the union of all the alternatives. In this way, we group all entities that denote the same real-world object into a unique set, which turns out to be an equivalence class. Similarly, when two values exist where only one value is instead allowed according to the TBox, we explicitly form their union. Note that we consider the union of entities a “global” feature for a solution, that is, an entity 𝑒 may belong to exactly one equivalence class; in contrast, the union of values is “local” to the context in which a value occurs, that is, a particular value may belong to more than one set of values. We also remark that the semantics we adopt for value-egds allows us to always have a solution (that is, a model) for an ontology, even when there are no models according to the standard first-order semantics. Indeed, if a value-egd enforces the equality of different values, we collect together such values, whereas first-order logic would conclude that the ontology is inconsistent. Besides formalizing the aforementioned ideas, our contribution is on how to combine entity resolution with query answering. In this respect, (𝑖) we define universal solutions and show that, as for standard tgd and egd semantics, universal solutions can be used to obtain the certain answers of Conjunctive Queries (CQs); (𝑖𝑖) we propose a variant of the classical chase procedure [4, 5] tailored to our approach; (𝑖𝑖𝑖) we show that, when the chase procedure terminates, it returns a universal solution (and thus we have an algorithm for computing the certain answers to CQs). In the following, we discuss our findings mainly by examples and refer the reader to [3] for a complete formal treatment. 2. Framework We refer the reader to [13] for an introduction to equivalence classes and relations. In the following, let ∼ be an equivalence relation, we write [𝑥] to denote the equivalence class containing 𝑥, instead of [𝑥]∼ . Moreover, we may write, for example [𝑎, 𝑏, 𝑐]∼ (or simply [𝑎, 𝑏, 𝑐]) to denote the equivalence class consisting of the elements 𝑎, 𝑏, and 𝑐. Syntax. An atom is an expression of the form 𝑃 (𝑡1 , . . . , 𝑡𝑛 ), where 𝑃 is a symbol to denote an 𝑛-ary ontology or built-in predicate, i.e. a pre-interpreted predicate like the Jaccard similarity (JaccSim). Each 𝑡𝑖 is either a variable or a constant, which can be in turn a value or an entity, the latter used to denote a real-world object. We impose that when 𝑃 is a built-in predicate, constants occurring in 𝑃 (𝑡1 , . . . , 𝑡𝑛 ) are only values. A ground atom is an atom with no variables. A conjunction 𝜑(𝑥) of atoms is an expression 𝑃1 (𝑡1 ) ∧ . . . ∧ 𝑃𝑚 (𝑡𝑚 ), where each 𝑃𝑗 (𝑡𝑗 ) is an atom such that each variable in the tuple 𝑡𝑗 of terms is among those in the tuple 𝑥 of variables. An ontology 𝒪 is a pair (𝒯 , 𝒜), consisting of a TBox 𝒯 and a ABox 𝒜. The TBox is a finite set of tuple-generating dependencies (tgds) and equality-generating dependencies (egds). A tgd is a formula of the form: ∀𝑥 (𝜑(𝑥) → ∃𝑦 𝜓(𝑥, 𝑦)), where 𝜑(𝑥) and 𝜓(𝑥, 𝑦) are conjunctions of atoms, such that 𝑥 and 𝑦 have no variables in common and 𝜓(𝑥, 𝑦) is built-in free and, for simplicity, constant-free. As in [5], we assume that all variables in 𝑥 appear in 𝜑(𝑥), but not necessarily in 𝜓(𝑥, 𝑦). We call 𝜑(𝑥) the body of the tgd, and 𝜓(𝑥, 𝑦) the head of it. An egd is a formula of the form: ∀𝑥 (𝜑(𝑥) → 𝑦 = 𝑧), where 𝜑(𝑥) is a conjunction of atoms and 𝑦 and 𝑧 are distinct variables occurring in 𝜑(𝑥), such that either both 𝑦 and 𝑧 are entity-variables (in which case we have an entity-egd) or both 𝑦 and 𝑧 are value-variables (in which case we have a value-egd). We call 𝜑(𝑥) the body of the egd. In the examples, for simplicity, we will omit the universal quantification in tgds and egds. Example 1. Let 𝒯 be the TBox consisting of the rules: (𝑟1 ) Name(𝑝1 , 𝑛1 ) ∧ Name(𝑝2 , 𝑛2 ) ∧ JaccSim(𝑛1 , 𝑛2 , 0.7) → 𝑝1 = 𝑝2 (𝑟2 ) Name(𝑝, 𝑛1 ) ∧ Name(𝑝, 𝑛2 ) → 𝑛1 = 𝑛2 (𝑟3 ) HPhone(𝑝, 𝑓1 ) ∧ HPhone(𝑝, 𝑓2 ) → 𝑓1 = 𝑓2 (𝑟4 ) HPhone(𝑝1 , 𝑓 ) ∧ HPhone(𝑝2 , 𝑓 ) → SameHouse(𝑝1 , 𝑝2 , 𝑓 ) Here 𝑝1 , 𝑝2 , 𝑝 are entity-variables, i.e., occur in predicate arguments ranging over entities, whereas 𝑛1 , 𝑛2 , 𝑓1 , 𝑓2 , 𝑓 are value-variables, i.e., occur in predicate arguments ranging over values. The predicates have the meaning suggested by their names. In particular, the predicate Name maintains the individual’s name, HPhone stores their home phone contact, whereas SameHouse associates individuals living in the same house with their phone number. Each of the four rules makes an assertion about the predicates. In particular, the first rule (𝑟1 ) states that if the Jaccard similarity of two names is higher then 0.7, then these names are associated to the same individual. The TBox also says that everyone can have only one name (𝑟2 ) and only one landline phone number (𝑟3 ), and that two individuals with the same landline phone number live in the same house (𝑟4 ). The ABox 𝒜 of an ontology 𝒪 is a finite set of ground atoms of the form 𝑃 (𝑐1 , . . . , 𝑐𝑛 ) where each 𝑐𝑖 is an entity or a value. Example 2. Let 𝒜 be the ABox consisting of the atoms (𝑔1 ) Name(Doe1 , John Doe), (𝑔2 ) HPhone(Doe1 , 358) (𝑔3 ) Name(Doe2 , Johnny Doe), (𝑔4 ) HPhone(Doe2 , 635) (𝑔5 ) Name(Doe3 , Mary Doe), (𝑔6 ) HPhone(Doe3 , 358) In words, the ABox 𝒜 specifies that Doe1 has name John Doe and home phone number 358, Doe2 has name Johnny Doe and home phone number 635, Doe3 has name Mary Doe and phone number 358. Semantics. We assume to have countably infinite many entity-nulls and value-nulls (note that we consider generic tgds, i.e. possibly having existential variables in the rule head, even though, for space limits we do not provide examples of this kind). The semantics of the ontology is given using special ABoxes, called ontology instances, whose ground atoms have components that are either equivalence classes of entities and entity-nulls, which we call E-sets, or non-empty sets of values and value-nulls, which we call V-sets. Ontology Instance (Definition 1 of [3]). An instance ℐ for an ontology 𝒪 w.r.t. an equivalence relation ∼ is a set of facts 𝑃 (𝑇1 , . . . , 𝑇𝑛 ) such that 𝑃 is a 𝑛-ary predicate in 𝒪, and each 𝑇𝑖 is either an E-set w.r.t. ∼ or a V-set. Example 3. Let 𝒪 = (𝒯 , 𝒜) be an ontology such that 𝒯 and 𝒜 are as in Example 1 and in Example 2, respectively. Consider the following set ℐ of facts. (𝑑1 ) Name([Doe1 , Doe2 ], {John Doe, Johnny Doe}) (𝑑2 ) HPhone([Doe1 , Doe2 ], {358, 635}) (𝑑3 ) Name([Doe3 ], {Mary Doe}) (𝑑4 ) HPhone([Doe3 ], {358}) (𝑑5 ) SameHouse([Doe1 , Doe2 ], [Doe3 ]) (𝑑6 ) SameHouse([Doe3 ], [Doe1 , Doe2 ]) ℐ is an instance for 𝒪 w.r.t. the equivalence relation ∼ over the set {Doe1 , Doe2 , Doe3 } such that Doe1 ∼ Doe2 , i.e. Doe1 and Doe2 are in the same equivalence class and Doe3 is the only element of another obviously disjoint equivalence class. To define the notions of satisfaction of tgds and egds by an instance, we first introduce the notion of an assignment from a conjunction 𝜑(𝑥) of atoms to an instance ℐ for an ontology 𝒪. Assignment (Definition 3 of [3]). An assignment from a conjunction 𝜑(𝑥) of atoms to an instance ℐ is a mapping 𝜇 from the variables and values in 𝜑(𝑥) to E-sets and V-sets of ℐ such that • each entity-variable is mapped to an E-set; • different occurrences of a value-variable are mapped to V-sets with non-empty intersection; • each value 𝑣 is mapped to a V-set 𝑉 , such that 𝑣 ∈ 𝑉 ; • for each atom 𝑃 (𝑥, 𝑒, 𝑣) of 𝜑(𝑥), where 𝑥 is a variable, 𝑒 is an entity and 𝑣 is a value, ℐ contains a fact of the form 𝑃 (𝜇(𝑥), [𝑒], 𝜇(𝑣)) (the definition generalizes in the obvious way for atoms of different form). Example 4. Let ℐ be the instance in Example 3. A possible assignment 𝜇 from the body of rule (𝑟4 ) of Example 1 to ℐ is given below: We are now ready to define the semantics of tgds and egds. Semantics of tgds (Definition 4 of [3]). An instance ℐ for an ontology 𝒪 satisfies a tgd ∀𝑥(𝜑(𝑥) → ∃𝑦𝜓(𝑥, 𝑦)) if for each assignment 𝜇 from 𝜑(𝑥) to ℐ there is an assignment 𝜇′ from 𝜓(𝑥, 𝑦) to ℐ such that, for each 𝑥 in 𝑥 • 𝜇(𝑥) = 𝜇′ (𝑥), if 𝑥 is an entity-variable • the intersection of V-sets assigned by 𝜇 to each occurrence of 𝑥 in 𝜑(𝑥) is contained in the intersection of V-sets assigned by 𝜇′ to each occurrence of 𝑥 in 𝜓(𝑥, 𝑦), if 𝑥 is a value-variable Example 5. Let ℐ be the instance in Example 3 and (𝑟4 ) be the tgd in Example 1. Let 𝜇 be an assignment from the body of (𝑟4 ) to ℐ and let 𝜇′ be an assignment from the head of (𝑟4 ) to ℐ as described below: Since 𝜇 is the only assignment from the body of (𝑟4 ) to ℐ, we conclude that ℐ satisfies (𝑟4 ). Semantics of egds (Definition 4 of [3]). An instance ℐ for an ontology 𝒪 satisfies an egd ∀𝑥(𝜑(𝑥) → 𝑦 = 𝑧) if each assignment 𝜇 from 𝜑(𝑥) to ℐ is such that 𝜇(𝑦) = 𝜇(𝑧) Note that according to the above definition, in every assignment 𝜑(𝑥) to ℐ all occurrences of 𝑦 and 𝑧 must be mapped to the same set, for an egd to be satisfied by an ontology instance. Example 6. Let ℐ be the instance in Example 3 and (𝑟3 ) be the egd in Example 1. Let 𝜇 be an assignment from the body of (𝑟3 ) to ℐ as described below: The assignment 𝜇 trivially enjoys the conditions of the definition of egd satisfaction. It is easy to see that the same holds for all assignments from the body of (𝑟3 ) to ℐ, and thus we conclude that ℐ satisfies (𝑟3 ). Finally, we define when an instance is a solution for an ontology. Ontology Solution (Definition 5 of [3]). Let 𝒪 = (𝒯 , 𝒜) be an ontology and ℐ be an instance. ℐ is a solution for 𝒪 if ℐ satisfies 𝒯 and 𝒜, i.e.: • ℐ satisfies 𝒯 if ℐ satisfies every tgd and egd in 𝒯 . • ℐ satisfies 𝒜 if it satisfies every ground atom 𝑃 (𝑐1 , . . . , 𝑐𝑛 ) in 𝒜. A ground atom is satisfied if there is a fact 𝑃 (𝑇1 , . . . , 𝑇𝑛 ) in ℐ such that 𝑐𝑖 ∈ 𝑇𝑖 , for 1 ≤ 𝑖 ≤ 𝑛. Among all ontology solutions we are interested to the so-called universal solutions, which exhibit special properties with respect to the task of query answering (see, e.g., [5, 14, 15]). Universal Solution (Definitions 8 of [3]). A solution 𝒰 for an ontology 𝒪 is universal if, for every solution ℐ for 𝒪, there is a homomorphism ℎ : 𝒰 → ℐ. The notion of homomorphism used in the previous definition is tailored to our framework and suitably considers E-sets and V-sets occurring in instances. Informally, for an instance ℐ, we denote with under (ℐ) the set containing all entities, entity-nulls, values, and value-nulls occurring in ℐ. Then, let ℐ1 and ℐ2 be two instances, an homomorphism ℎ : ℐ1 → ℐ2 is a structure-preserving mapping from under (ℐ1 ) to under (ℐ2 ) (we refer to Definition 7 of [3] for further details). 3. Query Answering A conjunctive query (CQ) 𝑞(𝑥) is a formula ∃𝑦 𝜑(𝑥, 𝑦), where 𝜑(𝑥, 𝑦) is a built-in free conjunc- tion of atoms. Answers to CQs. The answer 𝑞 ℐ to a CQ 𝑞(𝑥1 , . . . , 𝑥𝑛 ) on an instance ℐ is the set of all tuples ⟨𝑇1 , . . . , 𝑇𝑛 ⟩ such that there is an assignment 𝜇 from 𝑞 to ℐ for which: • 𝑇𝑖 = 𝜇(𝑥𝑖 ), if 𝑥𝑖 is an entity-variable; • 𝑇𝑖 , is the intersection of the V-sets assigned to the various occurrences of 𝑥𝑖 , if 𝑥𝑖 is a value-variable. Example 7. Below we give an example of query, assignment 𝜇 from the query to the instance ℐ of Example 3, and answers to the query. It is easy to see that 𝑞 ℐ = {⟨{358}⟩}. When querying an ontology 𝒪, we are interested in reasoning over all solutions for 𝒪. We thus adapt the classical notion of certain answers to our framework. To this aim, we need some preliminary definitions. A tuple 𝑇 = ⟨𝑇1 , . . . , 𝑇𝑛 ⟩ is null-free if each 𝑇𝑖 is non-empty and contains no nulls. Let 𝑇 = ⟨𝑇1 , . . . , 𝑇𝑛 ⟩ and 𝑇 ′ = ⟨𝑇1′ , . . . , 𝑇𝑛′ ⟩ be two tuples of E-sets and V-sets, we say that 𝑇 ′ dominates 𝑇 , denoted 𝑇 ≤ 𝑇 ′ , if 𝑇𝑖 ⊆ 𝑇𝑖′ , for all 1 ≤ 𝑖 ≤ 𝑛. Certain Answers (Definition 9 of [3]). A null-free tuple 𝑇 of E-sets and V-sets is a certain answer to a CQ 𝑞 w.r.t. an ontology 𝒪 if 1. for every solution ℐ for 𝒪, there is a tuple 𝑇 ′ ∈ 𝑞 ℐ such that 𝑇 ≤ 𝑇 ′ 2. there is no null-free tuple 𝑇 ′ that satisfies (1) and 𝑇 < 𝑇 ′ We write 𝑐𝑒𝑟𝑡(𝑞, 𝒪) to denote the set of certain answers to 𝑞 w.r.t. 𝒪 We are now able to present the main result of this section, which asserts that universal solutions can be used to compute certain answers to CQs in our framework. Compute Certain Answers over a Universal Solution (Theorem 1 of [3]). Let 𝑞 be a CQ, let 𝒪 be an ontology, and let 𝒰 be a universal solution for 𝒪. Then 𝑐𝑒𝑟𝑡(𝑞, 𝒪) = 𝑞 𝒰 ↓𝜌 . The operator ↓𝜌 first eliminates nulls occurring in 𝑞 𝒰 , then, in the resulting set, it eliminates tuples that are dominated by other tuples (see [3] for the details). 4. Computing Universal Solution In order to compute the universal solution that can be used to obtain the certain answers to CQs, we adapt here the well-known notion of restricted chase [4, 16, 5, 15] to our framework. Intuitively, given an ontology 𝒪 = (𝒯 , 𝒜), we define a procedure that starts from an instance for 𝒪 “specular” to the ABox 𝒜, which we call the base instance of 𝒪, and incrementally constructs an instance that satisfies the tgds and the egds of the TBox 𝒯 . Example 8. Let 𝒪 = (𝒯 , 𝒜) be an ontology such that 𝒯 and 𝒜 are as in Example 1 and in Example 2, respectively. The base instance of 𝒪 is as follows: (𝑑1 ) Name([Doe1 ], {John Doe}) (𝑑2 ) Name([Doe2 ], {Johnny Doe}) (𝑑3 ) HPhone([Doe1 ], {358}) (𝑑4 ) HPhone([Doe2 ], {635}) (𝑑5 ) Name([Doe3 ], {Mary Doe}) (𝑑6 ) HPhone([Doe3 ], {358}) A universal solution for 𝒪 is obtained by iteratively applying three chase rules, one for each kind of rule that may occur in 𝒯 , as long as they are triggered in the (current) instance. (i) entity-egd chase rule. An entity-egd ∀𝑥(𝜑(𝑥) → 𝑦 = 𝑧) is triggered in an instance ℐ if there is an assignment 𝜇 from 𝜑(𝑥) to ℐ such that 𝜇(𝑦) ̸= 𝜇(𝑧). Then, the chase substitutes 𝜇(𝑦) and 𝜇(𝑧) everywhere ℐ with 𝜇(𝑦) ∪ 𝜇(𝑧). Example 9. Let ℐ0 be the starting instance in Example 8 and (𝑟1 ) be the e-egd Name(𝑝1 , 𝑛1 ) ∧ Name(𝑝2 , 𝑛2 ) ∧ JaccSim(𝑛1 , 𝑛2 , 0.7) → 𝑝1 = 𝑝2 . The egd (𝑟1 ) is triggered in ℐ0 because of (𝑑1 ),(𝑑2 ). Its application generates ℐ1 , which is as follows: (𝑑7 ) Name([Doe1 , Doe2 ], {John Doe}) (𝑑8 ) Name([Doe1 , Doe2 ], {Johnny Doe}) (𝑑9 ) HPhone([Doe1 , Doe2 ], {358}) (𝑑10 ) HPhone([Doe1 , Doe2 ], {635}) (𝑑5 ) Name([Doe3 ], {Mary Doe}) (𝑑6 ) HPhone([Doe3 ], {358}) ℐ1 is obtained from ℐ0 by replacing everywhere [Doe1 ] and [Doe2 ] with [Doe1 , Doe2 ]. (ii) value-egd chase rule. Similarly to the entity-edg case, when a value-egd forces two different sets of values to be equated, we take the union of the two sets and modify the instance accordingly. Example 10. Let ℐ1 be the instance in Example 9 and (𝑟3 ) be the value-egd HPhone(𝑝, 𝑓1 ) ∧ HPhone(𝑝, 𝑓2 ) → 𝑓1 = 𝑓2 . The egd (𝑟3 ) is triggered in ℐ1 because of (𝑑9 ),(𝑑1 0). Its application generates ℐ2 , shown below: (𝑑7 ) Name([Doe1 , Doe2 ], {John Doe}) (𝑑8 ) Name([Doe1 , Doe2 ], {Johnny Doe}) (𝑑11 ) HPhone([Doe1 , Doe2 ], {358, 635}) (𝑑6 ) HPhone([Doe3 ], {358}) (𝑑5 ) Name([Doe3 ], {Mary Doe}) ℐ2 is obtained from ℐ1 by replacing in 𝑑9 and 𝑑10 {358} and {635} with their union {358, 635}. (iii) tgd chase rule. A tgd ∀𝑥 (𝜑(𝑥) → ∃𝑦𝜓(𝑥, 𝑦)) is triggered in an instance ℐ if there is an assignment 𝜇 from 𝜑(𝑥) to ℐ such that there is no assignment 𝜇′ from 𝜓(𝑥, 𝑦) to ℐ, as required by the definition of tgd satisfaction. Then, the chase adds new facts to ℐ so that 𝜇′ does exist in the obtained instance. Example 11. Let ℐ2 be the instance in Example 10 and (𝑟4 ) be the tgd HPhone(𝑝1 , 𝑓 ) ∧ HPhone(𝑝2 , 𝑓 ) → SameHouse(𝑝1 , 𝑝2 , 𝑓 ). The tgd (𝑟4 ) is triggered in ℐ2 because of (𝑑11 ),(𝑑6 ). Its application generates ℐ3 , shown below: (𝑑7 ) Name([Doe1 , Doe2 ], {John Doe}) (𝑑8 ) Name([Doe1 , Doe2 ], {Johnny Doe}) (𝑑11 ) HPhone([Doe1 , Doe2 ], {358, 635}) (𝑑6 ) HPhone([Doe3 ], {358}) (𝑑5 ) Name([Doe3 ], {Mary Doe}) (𝑑12 ) SameHouse([Doe1 , Doe2 ], [Doe3 ], {358}) ℐ3 is obtained by adding SameHouse([Doe1 , Doe2 ], [Doe3 ], {358}) to ℐ2 . The chase procedure terminates when it produces an instance for which no rule in the TBox is applicable. One such sequence 𝜎 = ℐ0 , ℐ1 , . . . , ℐ𝑘 is called a finite chase and the result of the chase, chase(𝒪, 𝜎), coincides with ℐ𝑘 . The result of the chase is an Universal Solution (Theorem 2 of [3]). If 𝒪 is an ontology and 𝜎 is a finite chase for 𝒪, then chase(𝒪, 𝜎) is a universal solution for 𝒪. Given the ontology composed by the TBox in Example 1 and the ABox in Example 2, it is possible to construct a finite chase sequence ℐ0 , ℐ1 , ℐ2 , ℐ3 , . . . ℐ𝑘 , where ℐ0 , ℐ1 , ℐ2 , ℐ3 are as in Examples 8–11 and ℐ𝑘 is the universal solution of Example 3 (and thus 𝑞 ℐ coincides with the set of certain answers to 𝑞). 5. Conclusion and Future Works In this paper we reported on a principled approach to query answering and entity resolution that may be relevant to several areas, like Data Exchange [5] and Integration [17], or Ontology-based Data Access [18, 19] considered the central role played by tgds and egds in those contexts, but also in Information Extraction [20, 21, 22] or in general for data-intensive tasks [23]. Our investigation is however still initial and several aspects need further study. Although not explicitly shown in this document, tgds could give rise to an infinite universal solution because of the possible infinite creation of existentials. We left open how to obtain a universal solution from a non-terminating chase sequence. Solving this case would extend our approach even to settings where all universal solutions are infinite. Though not fully materializable, the infinite chase is generally considered an important theoretical tool to show properties of query answering, such as rewritability [15], possibly combined with partial materialization of the chase result [24]. Morover our framework intentionally considers expressive tgds and egds, in order to encompass several popular ontology languages. It is however well-known that without suitable restrictions query answering (and reasoning in general) is undecidable [25, 26]. It is thus crucial to identify conditions ensuring decidability, and possibly tractability, of query answering. Acknowledgments Scafoglieri’s research was entirely and exclusively supported by PNRR MUR project PE0000013- FAIR. Lembo’s research was supported by EU ICT-48 2020 project TAILOR (No. 952215), EU ERA-NET Cofund ICT-AGRI-FOOD project ADCATER (No. 40705), PNRR MUR project PE0000013-FAIR and the MUR PRIN 2022LA8XBH project Polar (POLicy specificAtion and enfoRcement for privacy-enhanced data management). References [1] O. Benjelloun, H. Garcia-Molina, D. Menestrina, Q. Su, S. E. Whang, J. Widom, Swoosh: a generic approach to entity resolution, Very Large Database J. 18 (2009) 255–276. [2] G. Papadakis, E. Ioannou, E. Thanos, T. Palpanas, The Four Generations of Entity Resolu- tion, Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2021. [3] R. Fagin, P. G. Kolaitis, D. Lembo, L. Popa, F. Scafoglieri, A Framework for Combining Entity Resolution and Query Answering in Knowledge Bases, in: Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning, 2023, pp. 229–239. [4] C. Beeri, M. Y. Vardi, A proof procedure for data dependencies, J. of the ACM 31 (1984) 718–741. [5] R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa, Data exchange: Semantics and query answering, Theoretical Computer Science 336 (2005) 89–124. [6] J. Baget, M. Leclère, M. Mugnier, E. Salvat, On rules with existential variables: Walking the decidability line, Artificial Intelligence 175 (2011) 1620–1654. [7] M. Krötzsch, M. Marx, S. Rudolph, The power of the terminating chase (invited talk), in: Proc. of the 22nd Int. Conf. on Database Theory (ICDT), volume 127 of LIPIcs, 2019, pp. 3:1–3:17. [8] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The De- scription Logic Handbook: Theory, Implementation and Applications, 2nd ed., Cambridge University Press, 2007. [9] A. Calì, G. Gottlob, T. Lukasiewicz, A general datalog-based framework for tractable query answering over ontologies, J. of Web Semantics 14 (2012) 57–83. [10] M. Bienvenu, G. Cima, V. Gutieérrez-Basulto, LACE: A logical approach to collective entity resolution, in: Proc. of the 41st ACM SIGACT SIGMOD SIGAI Symp. on Principles of Database Systems (PODS), 2022, pp. 379–391. [11] L. E. Bertossi, S. Kolahi, L. V. S. Lakshmanan, Data cleaning and query answering with matching dependencies and matching functions, Theoretical Computer Science 52 (2013) 441–482. [12] W. Fan, Dependencies revisited for improving data quality, in: Proc. of the 27th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS), 2008, pp. 159–170. [13] K. Devlin, Sets, functions, and logic: an introduction to abstract mathematics, Chapman and Hall/CRC, 2018. [14] A. Calì, G. Gottlob, M. Kifer, Taming the infinite chase: Query answering under expressive relational constraints, J. of Artificial Intelligence Research 48 (2013) 115–174. [15] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and efficient query answering in description logics: The DL-Lite family, J. of Automated Reasoning 39 (2007) 385–429. [16] D. S. Johnson, A. C. Klug, Testing containment of conjunctive queries under functional and inclusion dependencies, J. of Computer and System Sciences 28 (1984) 167–189. [17] A. Doan, A. Y. Halevy, Z. G. Ives, Principles of Data Integration, Morgan Kaufmann, 2012. [18] M. Bienvenu, M. Ortiz, Ontology-mediated query answering with data-tractable description logics, in: W. Faber, A. Paschke (Eds.), Reasoning Web. Semantic Technologies for Intelligent Data Access – 11th Int. Summer School Tutorial Lectures (RW), volume 9203, 2015, pp. 218–307. [19] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Ontology-based data access and integration, in: L. Liu, M. T. Özsu (Eds.), Encyclopedia of Database Systems, Second Edition, Springer, 2018. [20] D. Lembo, Y. Li, L. Popa, K. Qian, F. Scafoglieri, Ontology mediated information extraction with MASTRO SYSTEM-T, in: Proceedings of the ISWC 2020 Demos and Industry Tracks, volume 2721 of CEUR Electronic Workshop Proceedings, http://ceur-ws.org/, CEUR-WS.org, 2020, pp. 256–261. [21] D. Lembo, F. M. Scafoglieri, Ontology-based document spanning systems for informa- tion extraction, Int. J. Semantic Comput. 14 (2020) 3–26. URL: https://doi.org/10.1142/ S1793351X20400012. doi:10.1142/S1793351X20400012. [22] R. Fagin, B. Kimelfeld, F. Reiss, S. Vansummeren, Document spanners: A formal approach to information extraction, Journal of the ACM (JACM) 62 (2015) 1–51. [23] V. Christophides, V. Efthymiou, T. Palpanas, G. Papadakis, K. Stefanidis, An overview of end-to-end entity resolution for big data, ACM Computing Surveys (CSUR) 53 (2020) 1–42. [24] C. Lutz, D. Toman, F. Wolter, Conjunctive query answering in the description logic ℰℒ using a relational database system, in: Proc. of the 21st Int. Joint Conf. on Artificial Intelligence (IJCAI), 2009, pp. 2070–2075. [25] C. Beeri, M. Y. Vardi, The implication problem for data dependencies, in: Proc. of the 8th Coll. on Automata, Languages and Programming (ICALP), volume 115 of Lecture Notes in Computer Science, Springer, 1981, pp. 73–85. [26] A. Calì, D. Lembo, R. Rosati, On the decidability and complexity of query answering over inconsistent and incomplete databases, in: Proc. of the 22nd ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS), 2003, pp. 260–271.