=Paper=
{{Paper
|id=Vol-3741/paper45
|storemode=property
|title=Schema Decomposition via Transformation Patterns
|pdfUrl=https://ceur-ws.org/Vol-3741/paper45.pdf
|volume=Vol-3741
|authors=Théo Abgrall
|dblpUrl=https://dblp.org/rec/conf/sebd/Abgrall24
}}
==Schema Decomposition via Transformation Patterns==
Théo Abgrall (tabgrall@unibz.it), Free University of Bozen-Bolzano
SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy

Abstract
A goal of database reverse engineering techniques is the extraction of a conceptual model describing an otherwise complex or obtuse database system. In such contexts, and notably when applied to legacy databases, thorough preservation of the knowledge lying in the source schema is a reasonable priority. In the common database reverse engineering pipeline, database normalization is the step in which we are best able to dictate and maintain the information preservation property. We use the transformation pattern formalism as our base to guarantee information preservation. We present how these theoretical templates can be applied over complex database schemas and how they can serve as the foundation of an automatic, deterministic database normalization methodology.

Keywords: Schema Transformation, Information Preservation, Conceptual Modeling, Database Normalization

1. Introduction
As time passes, the knowledge of, and used in, database management systems is at risk of becoming either lost or outdated. There are ways to mitigate this inevitability by requiring strict control and upkeep of a persistent conceptual model for each change made to the original database [1]. But in some scenarios, such as legacy databases, the only approach left is recovery. Recovery usually takes the form of a brand-new conceptual model emerging from the scraps of information at our disposal. Database Reverse Engineering (DBRE) encompasses the set of methods and tools aiming for this result. Naturally, in such a context, the generated conceptual model is expected to be precise and to reflect exactly the previously veiled information. Thus emerges our will to combine the theory of information preservation with this process of conceptualization.
Our foundation is the research done in [2], which explores the application of information capacity preservation in the restricted context of first-order schemata. Focusing on a simple notation based on first-order logic, into which we translate concepts of database theory, allows for the logical verification of constraint and schema preservation. This research was done through the lens of DBRE to provide methods for transforming databases: first into schemas in higher normal forms, and then into conceptual models, after passing through an intermediary step incorporating the abstract relational model notation. This process eventually crystallises into specific templates designed to represent database transformations, called Transformation Patterns (TPs). This strong formalism was, however, lacking a proper associated application methodology and served more as guidance for database engineers. To illustrate the various sections of the methodology, we provide the ongoing example shown in Fig. 1.
Figure 1: Ongoing Example
This example presents a legacy database schema with a single universal relation Person:(SSN, phone, manager, title, city, country, mayor) onto which several constraints are applied:
- a primary key on SSN, denoted by a bold underline;
- a functional dependency city → country, mayor, written with a single arrow on top;
- another functional dependency manager → title, this time restricted to tuples satisfying a condition manager, title ≠ NULL requiring both attributes to be non-null;
- a multivalued dependency SSN ↠ phone, marked with a double-headed arrow on top; and finally
- two inclusion dependencies manager ⊆ SSN and mayor ⊆ SSN, both shown as bottom arrows.
Using transformation patterns as our base, we propose a methodology for automatic database normalization. The paper starts with a primer on information capacity and on what leads us to focus on TPs as a means of preserving knowledge without loss. Once we have presented how these restrictive patterns can be applied to more complex schemas, we describe a methodology to remove all null values and all join dependencies from the source schema via schema decompositions. To close, we contextualize our work among the many adjacent fields it grazes before discussing our future applications of transformation patterns.

2. Background
2.1. Information Preservation
At its core, most of the literature regarding the preservation of information during a schema transformation emerged from Hull's [3] definition of four decreasingly strict notions of schema dominance. The main idea behind schema dominance is that, for two relational schemas, any valid database instance of one of the schemas can be expressed in the other.

Definition 1 (Schema Transformation). Let there be two schemas S and T. A schema transformation from S to T is a mapping function f_S→T : S → T such that f_S→T : I(S) → I(T), where I(S) and I(T) correspond to the sets of all valid database instances of S and T, respectively. Furthermore, a mapping function f_S→T is total if ∀x ∃y f_S→T(x) = y, injective if ∀x, x′ f_S→T(x) = f_S→T(x′) implies x = x′, surjective if ∀y ∃x f_S→T(x) = y, and bijective if f_S→T is total, injective and surjective.

Definition 2 (Schema Dominance). Let there be two schemas S and T and the two mapping functions f_S→T and f_T→S. We say that S dominates T if for any instance I(T) of T: f_S→T ∘ f_T→S (I(T)) = I(T). If S dominates T, then the mapping function f_T→S is injective and total.

Definition 3 (Schema Equivalence). Let there be two schemas S and T. We say that S and T are equivalent if and only if S dominates T and T dominates S. If S and T are equivalent, then the mapping function f_S→T is bijective.

A genesis of schema dominance can be found in [4], where it is defined as a query-equivalence problem. Later, these definitions were expanded in [5] by settling the conjecture that schema transformations could preserve primary keys, thus implying the preservation of not only instances but also relational constraints. Once more, the notions of schema dominance and equivalence were extended in [6], which defines a correct schema transformation as one that preserves both constraints and instances, thus differentiating between instance preservation, constraint preservation and their combination, information capacity preservation.
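To give Definitions 1-3 a concrete feel, the short sketch below uses two toy schemas of our own devising (they are assumptions for illustration, not schemas from the paper): T holds a single relation R(A, B) with a nullable B, while S splits it into R1(A, B) with B non-null plus R2(A). Checking the round trip on a few sampled instances shows what dominance asks of the mappings; it is, of course, no substitute for the entailment proofs the paper later delegates to a theorem prover.

```python
# Toy illustration of Definitions 1-3 (our own example, not taken from the
# paper): schema T has a single relation R(A, B) with B nullable, while schema
# S splits it into R1(A, B) with B non-null and R2(A).  We check that the
# round trip f_S->T ∘ f_T->S is the identity on a few sampled instances of T.

def f_T_to_S(r):
    """Map a T-instance (a set of (a, b) tuples, b possibly None) to an S-instance."""
    r1 = {(a, b) for (a, b) in r if b is not None}
    r2 = {(a,) for (a, b) in r if b is None}
    return r1, r2

def f_S_to_T(r1, r2):
    """Map an S-instance back to a T-instance (outer-union style)."""
    return r1 | {(a, None) for (a,) in r2}

# A handful of sample instances standing in for I(T).
samples = [
    set(),
    {("x", 1)},
    {("x", 1), ("y", None)},
    {("x", None), ("y", None), ("z", 3)},
]

# S dominates T: mapping into S and back loses nothing on these samples.
assert all(f_S_to_T(*f_T_to_S(r)) == r for r in samples)
print("f_S->T ∘ f_T->S is the identity on the sampled instances of T")
```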
2.2. Transformation Patterns
Walking in the footsteps of [2], we restrict our definition of schema to that of first-order schemas. A first-order schema (FOS) S is a couple [A_S, C_S] where A_S defines the alphabet of predicates used, that is, the database tables, and C_S defines the set of constraints applied over these relations. A predicate, written r(a_1, a_2, ..., a_n), is composed of a predicate name r and a finite set of attribute names of arity n. FOS provides a representation of various kinds of constraints, ranging from functional dependencies to constraints on values. More interestingly, restricting ourselves to first-order logic semantics allows for simple proofs of FOS equivalence through mutual entailment, a process automated by theorem provers; in our case we use Prover9 [7]. Since we now need a way of writing schema transformations per the semantics of first-order logic, we move toward a definition of view as a mapping function over the alphabets of two FOS:

Definition 4 (FO Mappings). For two schemas S and T, a First-Order (FO) Mapping M_S→T is a set of views a_T = φ_S, where a_T ranges over the predicates present in T and φ_S is a first-order expression over the alphabet A_S of S.

For readability, we write those FO mappings in relational algebra. We can now translate the previous definitions of schema dominance by interchanging schema transformations with FO mappings, resulting in the notion of FO dominance: informally, a schema S FO-dominates another schema T if S dominates T via mapping functions f_S→T and f_T→S written as FO mappings. Using the FOS notation, we can also write FO dominance as (C_S ∪ M_S→T) ⊧ (C_T ∪ M_T→S). This leads to the final definition of losslessness:

Definition 5 (Losslessness). Let S and T be two first-order schemas. If S FO-dominates T and T FO-dominates S, then S and T are lossless representations of each other, a result which we can also write as (C_S ∪ M_S→T) ≡ (C_T ∪ M_T→S).

Transformation patterns [2] are crafted templates describing a particular schema transformation and the constraints necessary to ensure its losslessness. A TP can be written as the triple TP = (DP_S, DP_T, M), with DP_S and DP_T two database patterns whose purpose is to represent the state of a database on both sides of a transformation M, written as a mapping pattern in relational algebra. Database patterns and mapping patterns are similar to FOS and relational algebra mappings, respectively, with the difference that instead of manipulating attributes, they manipulate variables. Variables are sets of attributes defined by the constraints applied to them. For example, a relation Worker(SSN, name, dob) with a sole primary key constraint on SSN fits the relation pattern R(PK, REST). A database pattern is a couple DP = (R, C) composed of a set of relation patterns and one of constraint patterns. Relation patterns are defined similarly to FOS predicates, R:(ATT_1, ATT_2, ..., ATT_n), with R standing for the relation name and ATT_1, ..., ATT_n being variables either present in some constraint patterns or unconstrained and qualified as REST. The forms of constraint patterns vary from dependency to dependency, but they can all be written as ψ_θ[R_1, ..., R_n](ATT_1, ..., ATT_m)(σ_condition), where ψ denotes the type of constraint used and θ refines that type, notably differentiating between a total inclusion dependency, written ind=[R_1, R_2](ATT_1, ATT_2), and an isA one, ind⊆[R_1, R_2](ATT_1, ATT_2). Finally, cond denotes a conjunction of simple conditions over values.
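As a concrete illustration, the constraints of the ongoing example in Fig. 1 can be written in this ψ_θ[R](ATT)(σ_cond) style. The sketch below is our own encoding: the tuple layout (kind, subtype, relations, attributes, condition) is an assumption made for readability, not notation prescribed by the paper, and the only check performed is that every constraint mentions relations declared in the alphabet.

```python
# Hypothetical encoding of the Fig. 1 schema as a first-order schema (alphabet
# plus constraints); the tuple layout is our own convention for this sketch.
ALPHABET = {
    "Person": ("SSN", "phone", "manager", "title", "city", "country", "mayor"),
}

CONSTRAINTS = [
    # (kind, subtype, relations, attributes, condition)
    ("pk",  None, ("Person",),          ("SSN",),                          None),
    ("fd",  None, ("Person",),          (("city",), ("country", "mayor")), None),
    ("fd",  None, ("Person",),          (("manager",), ("title",)),        "manager, title != NULL"),
    ("mvd", None, ("Person",),          (("SSN",), ("phone",)),            None),
    ("ind", "⊆",  ("Person", "Person"), ("manager", "SSN"),                None),
    ("ind", "⊆",  ("Person", "Person"), ("mayor", "SSN"),                  None),
]

def well_formed(alphabet, constraints):
    """Every constraint must only mention relations declared in the alphabet."""
    return all(rel in alphabet for _, _, rels, _, _ in constraints for rel in rels)

assert well_formed(ALPHABET, CONSTRAINTS)
print(len(CONSTRAINTS), "constraints declared over", list(ALPHABET))
```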
Despite being a bit heavy, this formal notation is necessary, as a thorough assignment of each constraint to its associated relation is mandatory.

2.2.1. Example
We present a short example illustrating the potential of TPs. In Tab. 1 (a) we have a simplified version of our ongoing example, Worker:(SSN, phone, manager, city), with only one constraint: manager is nullable. We want to horizontally split it into a second database schema (b) containing two tables: one where all values of manager are non-null (Worker_w_Manager) and another where there is no manager (Worker_wt_Manager). Because there is little worth in keeping a column full of null values, we also want to drop manager from the second table.

Table 1: Example of a Database Horizontal Decomposition
(a) Source Schema, Worker:
SSN      | phone     | manager  | city
012-3155 | 133039133 | 024-5143 | Southfield
012-3155 | 125682994 | 024-5143 | Southfield
031-7471 | 131297308 | 040-5166 | Pearland
024-5143 | 181332948 | NULL     | Pearland
040-5166 | 156422537 | NULL     | Southfield
040-5166 | 187797355 | NULL     | Southfield

(b) After Horizontal Decomposition, Worker_w_Manager:
SSN      | phone     | manager  | city
012-3155 | 133039133 | 024-5143 | Southfield
012-3155 | 125682994 | 024-5143 | Southfield
031-7471 | 131297308 | 040-5166 | Pearland

Worker_wt_Manager:
SSN      | phone     | city
024-5143 | 181332948 | Pearland
040-5166 | 156422537 | Southfield
040-5166 | 187797355 | Southfield

First, we need a way to represent (a) in our formalism, and it starts with expressing that manager is nullable. A condition over a set of attributes is commonly written as σ_cond R ≠ ∅ in a TP to denote that, in the relation R, tuples satisfying cond could exist. This is called a guard constraint. Guard constraints can also express conditions that either always hold or never do, by writing σ_cond R = R and σ_cond R = ∅ respectively. In our example, the nullable attribute manager is written σ_manager=NULL Worker ≠ ∅. Put as a database pattern, we have a constraint pattern σ_ATT=NULL R ≠ ∅ applying over a set of attributes ATT, coalescing with any number of unrelated attributes, captured as the REST, into the relation pattern R:(ATT, REST). This is the first half of a transformation pattern, the part onto which we can match our relation Worker. The second part denotes instead the two tables where manager is either always null or non-null. Once again, in our formalism, these two conditions are expressed as σ_ATT=NULL R1 = ∅ and σ_ATT=NULL R2 = R2, for cond is never satisfied and cond is always satisfied, respectively. From these conditions, we can deduce two relation patterns, R1:(ATT, REST) and R2:(REST). Finally, its mapping patterns are the two sets of FO mappings R1 = σ_ATT≠NULL(R), R2 = Π_REST(σ_ATT=NULL(R)) and R = R1 ⊍ R2. These mappings use standard relational algebra notation plus the outer union operation, denoted by ⊍. The full TP is shown in Fig. 2 (a), where an additional constraint const is used to capture residual constraints, a notion developed in the next section. Applying this TP over our Worker relation requires matching the variables of the former with the attributes of the latter. To do so, Worker has to be rewritten as an FOS so that its semantics can match those of any TP. Here, this match is trivial and it is quickly deduced that ATT = {manager} and REST = {SSN, phone, city}. We call a transformation pattern, a database pattern or a mapping pattern instantiated once it has been matched with real values from a database schema.
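Because the instantiated mappings are just relational algebra, they can be executed directly. The following sketch is our own minimal reading of them (relations as lists of dictionaries is an implementation choice, not the paper's): it applies R1 = σ_ATT≠NULL(R) and R2 = Π_REST(σ_ATT=NULL(R)) to the Worker table of Table 1 and checks that the outer union R1 ⊍ R2 rebuilds the source instance, as the losslessness of the TP promises.

```python
from collections import Counter

# Worker table of Table 1 (a); tuples are plain dictionaries keyed by attribute.
worker = [
    {"SSN": "012-3155", "phone": "133039133", "manager": "024-5143", "city": "Southfield"},
    {"SSN": "012-3155", "phone": "125682994", "manager": "024-5143", "city": "Southfield"},
    {"SSN": "031-7471", "phone": "131297308", "manager": "040-5166", "city": "Pearland"},
    {"SSN": "024-5143", "phone": "181332948", "manager": None,       "city": "Pearland"},
    {"SSN": "040-5166", "phone": "156422537", "manager": None,       "city": "Southfield"},
    {"SSN": "040-5166", "phone": "187797355", "manager": None,       "city": "Southfield"},
]

ATT, REST = {"manager"}, {"SSN", "phone", "city"}

# R1 = σ_ATT≠NULL(R): keep the tuples where manager is non-null.
worker_w_manager = [t for t in worker if all(t[a] is not None for a in ATT)]

# R2 = Π_REST(σ_ATT=NULL(R)): project away manager where it is null.
worker_wt_manager = [{a: t[a] for a in REST}
                     for t in worker if all(t[a] is None for a in ATT)]

# R = R1 ⊍ R2: outer union, padding the missing ATT columns with NULL.
rebuilt = worker_w_manager + [dict(t, **{a: None for a in ATT})
                              for t in worker_wt_manager]

# The round trip recovers the source instance (compared as multisets of tuples).
as_multiset = lambda rel: Counter(frozenset(t.items()) for t in rel)
assert as_multiset(rebuilt) == as_multiset(worker)
print(len(worker_w_manager), "tuples with a manager,", len(worker_wt_manager), "without")
```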
Figure 2: Example Transformation Patterns. (a) TP for Horizontal Decomposition; (b) TP for Vertical Decomposition of an MVD
The decomposition resulting from a TP application can be expressed as a set of views. This is crucial when working with legacy databases, or in cases where it is neither practical nor desirable to actively modify the source database. And because the transformations are lossless and thus preserve information capacity, it is possible to update the source database from any of its equivalent views [8]. As they stand, TPs are useful DBRE tools for database engineers, serving as guidance for a normalization process as well as for an eventual translation into a conceptual model. TPs notably offer a way to resolve the issue of identity resolution in database schemas. The crux is that in a traditional relational database there is no way to identify an object with the same precision a conceptual model would allow. The closest notion is that of primary keys, but there are many cases in which they fall short. This is where the Abstract Relational Model (ARM) [9] arises, a model which explicitly gives an Object Identifier to relational schemas. An extensive catalogue of TPs turning databases into their ARM counterparts was presented in [2]. However, these required the database pattern to already be in a fully decomposed form, a process never explored before.

3. Transformation Pattern Application
One thing missing for the application of TPs is a proper understanding of the const constraint and, more generally, of the process of constraint propagation. If we added mvd[R](SSN, phone) and pk[R](SSN) to the relation Worker used in the last example, then they would be considered as residual constraints in the application of the TP of Fig. 2 (a) and assigned to both REST_k and REST_l as constraints over REST. Here, k and l separate the residual constraints applied over all variables satisfying cond from those that do not, respectively. Such basic propagation seems straightforward in our example, but as we apply TPs over databases containing several tables, all linked via foreign keys, we need a proper propagation methodology, one with two main objectives: matching the TP with the database schema, and merging the newly decomposed schema with the rest of the old one. In Fig. 3 we present the main components of a successful pattern application, this time starting with a vertical decomposition following SSN ↠ phone (see Fig. 2 (b)) and showing how this affects the attribute manager.

Figure 3: Showcase of a TP for multivalued dependency decomposition (see Fig. 2 (b)) applied over a schema example. S_REST is another FOS corresponding to S minus the constraint matched by Σ_S.
Here, Σ_S denotes the mapping function between the source database S and the desired TP, matching it with DP_S. The mapping is trivial when only one constraint needs to be matched. Once the TP is applied and we have a decomposition in DP_T, we need to merge this new database schema with the previous one, propagating all constraints from the latter in the process. We call the constraints left behind after the application of a TP leftover constraints. In Fig. 3, the leftover constraint is the guard constraint σ_manager=NULL Worker ≠ ∅. The propagation of those leftover constraints, and of their associated tables, is done via a set of rules defined in the merging function Σ_T.
Later on, we will introduce further rules when dealing with overlapping guard constraints, a notion defined below, but for now the three most important rules are:
• Basic Propagation: If a leftover constraint applies over a set of attributes found again in DP_T, then it is propagated fully.
• Inclusion Update: Following a vertical decomposition, any inclusion dependency over a set of attributes captured by the TP gets updated by changing the relations it applies over (i.e., to one of the newly generated relations).
• Inclusion Split: Following a horizontal decomposition, a previously total inclusion dependency (ind=) splits into two subset dependencies (ind⊆) with the corresponding new relation names.
Now that we have an intuitive base to process multiple TPs, it is time to move on to our proposed decomposition methodology.

4. Decomposition Process
Stemming out of DBRE, the theory of TPs takes its roots deeper inside the field of database normalization [10], centred around the question of data quality through the prism of structural redundancy. One of its solutions is the process of data decomposition, in which a sequence of operations is applied over a database schema to remove redundancies and null values, improving data consistency. Normal forms (NF) are the metric used to define the set of desirable properties a database satisfies, ranging from 1NF, which checks for the atomization of each cell, to 5NF, satisfied if all non-trivial join dependencies are applied only over superkeys. So far, most normalization approaches have taken the form of an algorithm [11, 12] focusing on properties of functional dependencies and consequently rarely going beyond Boyce-Codd NF. We believe that an approach based on lossless TPs can not only work just as well but also go beyond and reach the canonical 6NF, taking care of null values in the process. In this section, we present several types of TPs along with a short methodology on how to apply them. Focusing only on a single database schema under the Universal Relation Assumption (URA), we assume that it is at least in 1NF and that its set of constraints is acyclic. We naturally expect the whole decomposition process to be lossless by definition, but another property we are aiming for is to attain some kind of determinism. In the last section, we unfolded a twice-decomposed database schema, first vertically then horizontally. Of course, another sequence would have given another result. To say which final output is preferable might sound arbitrary, but we claim that the former, the one ending with fewer tables, is actually preferable and more in line with a restructuring of knowledge made for conceptual models.

4.1. Inclusion Dependency Extraction
We preemptively create a singleton table based on the right-hand side of an inclusion dependency, then change that constraint to target the newly generated table. In the database schema S made of Worker:(SSN, phone, manager) and ind⊆[Worker, Worker](manager, SSN), decomposing into S′ yields Worker:(SSN, phone, manager), R_SSN:(SSN) and ind⊆[Worker, R_SSN](manager, SSN). In the cases of transitive inclusion dependencies or sets of inclusion dependencies sharing the same right-hand side, additional patterns are necessary, some of which are illustrated in Fig. 4.
Figure 4: Instantiated Inclusion Dependency Extraction Pattern. Full line: regular case; dotted line: transitive case
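To make the step concrete, here is a small sketch of our own covering only the regular (full-line) case of Fig. 4; the transitive and shared-right-hand-side patterns would need additional handling, and the tuple encodings of the schema and of the inclusion dependency are assumptions made for the example.

```python
# Sketch of the regular Inclusion Dependency Extraction of Section 4.1: create
# a singleton relation from the right-hand side and retarget the dependency.
def extract_inclusion_dependency(relations, ind):
    """relations: {name: attributes}; ind: (lhs_rel, lhs_att, rhs_rel, rhs_att)."""
    lhs_rel, lhs_att, _rhs_rel, rhs_att = ind
    singleton = f"R_{rhs_att}"                               # e.g. R_SSN:(SSN)
    new_relations = dict(relations, **{singleton: (rhs_att,)})
    return new_relations, (lhs_rel, lhs_att, singleton, rhs_att)

schema = {"Worker": ("SSN", "phone", "manager")}
schema, ind = extract_inclusion_dependency(schema, ("Worker", "manager", "Worker", "SSN"))
print(schema)  # {'Worker': ('SSN', 'phone', 'manager'), 'R_SSN': ('SSN',)}
print(ind)     # ('Worker', 'manager', 'R_SSN', 'SSN')
```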
4.2. Vertical Decomposition
Vertical Decomposition (VD) is one of the core operations, along with horizontal decomposition. Its principle is to project into a new table all attributes present in the join constraint and to drop its right-hand-side attributes from the source relation. This applies to either a functional dependency or a multivalued dependency, and each has its own kind of TP; see Fig. 2 (b) for the latter. The TP defining the vertical decomposition of a relation R:(REST, LHS, RHS), with regard to a join dependency we write as ψ[R](LHS, RHS), always generates R1:(REST, LHS), R2:(LHS, RHS), a full inclusion between the tables ind=[R1, R2](LHS, LHS), and ends with the constraint pk[R2](LHS) or pk[R2](LHS, RHS) for a functional dependency or a multivalued dependency, respectively; see Fig. 5. This figure also depicts the transformation done to our ongoing example.
Figure 5: Instantiated Vertical Decomposition Pattern
In this scenario, no sequencing is required and both decompositions can be done independently from each other. Independent in this context means that a pair of constraints c1, c2 is in none of the three following combinations: shared RHS, transitive and critical.
Decomposing a set of overlapping constraints c1, c2, ..., cn of the form ci = ψ[R](LHS_i, RHS_i) such that RHS_1 ∩ RHS_2 ∩ ... ∩ RHS_n ≠ ∅ requires a TP which preserves the greatest common subset of attributes in the right-hand side at each step. To illustrate, the greatest common subset for the set c1 = X → T, c2 = Y → TUV, c3 = Z ↠ TV in the relation R:(X, Y, Z, T, U, V), where we choose to start by projecting c1, is T, whereas for c2 it would be TV. The decomposed schemas for the two scenarios are R1:(X, Y, Z, T, U, V), R2:(X, T) and R1′:(X, Y, Z, T, V), R2′:(Y, T, U, V), respectively.
Unlike the previous overlap, transitive overlaps do not require any new TPs. What they require, however, is an ordering of the decompositions. It is quite clear how, without any ordering, we lose information contained in R:(X, Y, Z), c1 = X → Y, c2 = Y → Z by decomposing c1 first, ending with R1:(X, Z) and R2:(X, Y) and thus losing c2. The ordering we advocate is a relatively simple one in which, borrowing some tree terminology, all decompositions should start from the leaves and move upward.
The last cases of overlap we have to detect are the critical ones. We borrow our definition of critical constraints from [13]: a functional dependency X → A is critical if and only if there exists, in the closure of the set of constraints, a dependency Y → B such that YB ⊆ XA and XA ⊈ Ȳ (the closure of an attribute set X is written X̄). A common example is the pair XY → Z, Z → X. This is where our approach clashes with the literature on database normalization, as we refuse to stop here and lose information, something usually required to reach BCNF [14]. What we propose to solve an overlap akin to XY → Z, Z → X is to simply separate the problematic constraints into their own table, thus generating for any R:(X, Y, Z, REST) the relations R1:(X, Y, Z) and R2:(X, Z). Because we do not pretend to present an exhaustive list of all possible types of constraint overlaps over join dependencies, we need a backup plan, that is, a solution which always guarantees both determinism and losslessness when used, at the expense of redundancy. The decomposition used to solve critical overlaps is that solution. We still consider this part of the methodology automatic under the assumption that we have a broader definition of overlaps. We now present a priority scale to follow whenever several types of overlaps themselves overlap: in order of priority, we start by dealing with critical constraints, then transitive ones, and then those that share an RHS.
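Before moving to horizontal decompositions, here is a schema-level sketch of our own of a single, independent vertical decomposition step; the naming of the generated relations and the placement of the key on the newly projected table follow our reading of the instantiated pattern in Fig. 5, and none of the overlap handling above is covered.

```python
# Sketch of one Vertical Decomposition step for a join dependency ψ[R](LHS, RHS)
# over R:(REST, LHS, RHS); only the independent (non-overlapping) case.
def vertical_decomposition(name, attrs, lhs, rhs, kind):
    """Return the generated relations and constraints for an fd or mvd decomposition."""
    rest = tuple(a for a in attrs if a not in lhs + rhs)
    r1, r2 = f"{name}1", f"{name}2"
    relations = {r1: rest + lhs, r2: lhs + rhs}
    key = lhs if kind == "fd" else lhs + rhs          # key of the projected table
    constraints = [("ind=", (r1, r2), (lhs, lhs)),    # full inclusion on LHS
                   ("pk", r2, key)]
    return relations, constraints

# Ongoing example: the multivalued dependency SSN ↠ phone over Person.
rels, cons = vertical_decomposition(
    "Person",
    ("SSN", "phone", "manager", "title", "city", "country", "mayor"),
    ("SSN",), ("phone",), "mvd")
print(rels)  # Person1 keeps REST plus SSN, Person2 holds (SSN, phone)
print(cons)
```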
4.3. Horizontal Decomposition
Going further than most database normalization methodologies, we propose the use of TPs, such as the one presented in Fig. 2, to express Horizontal Decomposition (HD), with the intent of removing null values from a database schema. We already introduced guard constraints in our example section and now expand on their condition cond. The latter is a conjunction of atomic conditions ⋀_i^n (ATT_i ≠ NULL). As long as all guard constraints are independent of each other, the decomposition methodology is trivial and deterministic. To verify whether this property also extends to overlapping guard constraints, we first need to define them.

Definition 6 (Guard Overlaps). In a relation R:(ATT_n) onto which are applied two constraints σ_ATT_k=NULL R ≠ ∅ and σ_ATT_l=NULL R ≠ ∅, where ATT_k, ATT_l ⊆ ATT_n, we distinguish two types of overlap:
• Containment: ATT_k ⊂ ATT_l
• Non-Distinct: ATT_k ∩ ATT_l ≠ ∅, ATT_k ⊄ ATT_l and ATT_l ⊄ ATT_k

We illustrate these two cases with an example: let there be R:(X, Y, Z, T) and three overlapping guard constraints c1 = σ_XY=NULL R ≠ ∅, c2 = σ_X=NULL R ≠ ∅ and c3 = σ_YT=NULL R ≠ ∅. Both scenarios are shown by the pairs {c1, c2} and {c1, c3}, demonstrating the containment and non-distinct cases respectively. Focusing on the containment case, starting with an HD on c1 will result in two relations, R1:(X, Y, Z, T) and R2:(Z, T), one where no tuples satisfy XY = NULL and one where all do. The constraint propagation rules we presented in Section 3 dictate that c2 and c3 would hold in both relations, which is impossible for c2. This is why we need two additional rules:
• Guard Loss: σ_cond1 R ≠ ∅ does not get propagated to R′ if R′ contains one of the following constraints: σ_cond2 R = ∅ with cond2 ⊂ cond1, or σ_cond2 R = R with cond1 ⊂ cond2.
• Guard Intuition: σ_cond1 R ≠ ∅ gets propagated to R′ even if σ_cond2 R = R holds, provided cond1 ⊄ cond2 and cond1 ∩ cond2 ≠ ∅.
The intuition behind the first rule is that a constraint can only get propagated if it is not directly contradicted by another. What the second rule allows is for nullable conditions to hold in a table where a subset of the attributes in said condition is non-existent. In our example, this permits c3 to be propagated to R2 even though this table is missing the Y column. Adding these rules to the ones previously described leads to the following result:

Lemma 1. All sequences of horizontal decompositions are lossless and their schema outputs are equivalent.

Proof 1. We want to show that the binary relation HD(TP1, TP2), representing the sequential application of HD TPs TP1 → TP2, is symmetric. Losslessness is already guaranteed by the TP formalism, so only the determinism aspect is concerned. This property is trivial when it comes to independent guard constraints, so we test it on both kinds of overlaps. Using the same notation as in the definition, in the containment case, starting with a decomposition on the subsumed constraint gives R1:(ATT_n) and R2:(ATT_n/ATT_k). The second decomposition here only affects R2 because of the guard loss property, giving a final decomposition of R1, R2 and R3:(ATT_n/ATT_l). Starting from TP2 would instead first give R1′:(ATT_n) and R2′:(ATT_n/ATT_l), and TP1 here only affects R1′ because of the guard loss property, giving as final decomposition R1′, R2′ and R3′:(ATT_n/ATT_k). Both decompositions are the same, thus demonstrating the symmetric property of HD in this scenario. A similar logic applies to non-distinct guard overlaps, and the same result, that is R1:(ATT_n), R2:(ATT_n/ATT_k), R3:(ATT_n/ATT_l), R4:(ATT_n/(ATT_k ∪ ATT_l)), is found for all sequences. ◻
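The proof can also be sanity-checked mechanically. The sketch below is a schema-level simulation of our own (it tracks attribute sets only, not instances, and encodes just the Guard Loss rule; propagation to a relation missing some condition attributes is simply allowed, per Guard Intuition): for the two overlap examples of Definition 6, both application orders yield the same set of relation schemas, in line with Lemma 1.

```python
from itertools import permutations

# Each relation carries its attributes plus the guards already decided on it:
# 'empty' holds conditions known to select no tuple (σ_cond R = ∅) and 'full'
# holds conditions known to select every tuple (σ_cond R = R).
def hd_sequence(attrs, guards):
    relations = [(frozenset(attrs), frozenset(), frozenset())]
    for g in guards:                     # apply the HD TPs in the given order
        nxt = []
        for atts, empty, full in relations:
            # Guard Loss: g is not propagated if contradicted by a decided guard.
            if any(e < g for e in empty) or any(g < f for f in full):
                nxt.append((atts, empty, full))
                continue
            # Otherwise split: one relation where cond never holds, one where it always does.
            nxt.append((atts, empty | {g}, full))
            nxt.append((atts - g, empty, full | {g}))
        relations = nxt
    return {atts for atts, _, _ in relations}

R = {"X", "Y", "Z", "T"}
g1, g2, g3 = frozenset("XY"), frozenset("X"), frozenset({"Y", "T"})

for pair in [(g1, g2), (g1, g3)]:        # containment and non-distinct overlaps
    outputs = {frozenset(hd_sequence(R, order)) for order in permutations(pair)}
    assert len(outputs) == 1             # both orders yield the same schemas
    print(sorted(sorted(a) for a in next(iter(outputs))))
```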
Because all possible scenarios of overlap over guard constraints can be represented, our methodology for their decomposition is complete. We can mention a specific case, where one or more overlapping guard constraints exist and the union of their sets of attributes ATT_1 ∪ ... ∪ ATT_m = ATT_n covers the whole attribute set ATT_n of R. In this case, the relation satisfying all guard constraints would just be an empty table R∅ with m constraints of the form σ_ATT_i=NULL R∅ = R∅. This is unnecessary and can be resolved by adding a last propagation rule:
• Guard Total: σ_cond1 R ≠ ∅, with cond1 over ATT_k, gets propagated as σ_cond1 R = ∅ in R′ if ATT_n ⊆ ATT_k.

4.4. Conditional Vertical Decomposition
A Conditional Functional Dependency, defined in [15], is, within our formalism, a dependency ψ[R](ATT_i)(σ_cond) of type ψ which only gets applied over tuples that satisfy σ_cond. We return to our ongoing example, shortened this time to Worker:(SSN, manager, title), onto which is applied a conditional functional dependency c1 = fd[Worker](manager, title)(manager, title ≠ NULL). The functional dependency manager → title holds only whenever manager and title are both non-null. If only one CVD is applied, we refer to Fig. 6, instantiated with our recurring example. Otherwise, for any set of conditional dependencies, overlapping or not, we always start with all HDs followed by all VDs, respecting both methodologies for overlaps along the way.
Figure 6: Conditional Vertical Decomposition Pattern
The final order is: IDE → VD → CVD → HD. What results is a pair of mapping patterns between an FOS schema and its equivalent 4NF decomposition, the latter shown in Fig. 7.
Figure 7: 4NF Decomposition of the Ongoing Example. Inclusion dependencies are drawn merged for clarity, but they are all distinct.
It is uncertain whether a potential decomposition into 5NF and higher can be deterministic, meaning that any further process is likely to be semi-automatic. The end goal is a fully decomposed schema in 6NF which can easily be translated into a conceptual modeling language. Such a process is prone to the arbitrary, as is the nature of conceptual models. To illustrate, we argue for a new table Person_City:(SSN, city) to be added to our schema, as it better reflects the relation between Person and City, and for the drop of City from the worker tables. Taking inspiration from conceptual modeling operations, such as attribute generalization in this case, to create new TPs is something we already encouraged in [16]. Another challenge is to properly draw out concepts from our database schema, an action fitting for the TPs turning database schemas into their ARM equivalents.
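Returning to the conditional dependency c1 above: since only the caption of Fig. 6 is reproduced here, the following schema-level sketch reflects one plausible reading of the CVD (ours, not necessarily the figure's exact pattern), namely the guard's horizontal split followed by the vertical split of manager → title inside the fragment where the condition always holds. The relation names are hypothetical.

```python
# One reading of the CVD applied to Worker:(SSN, manager, title) with the
# conditional FD manager -> title whenever manager, title != NULL.
worker_attrs = ("SSN", "manager", "title")
cond_attrs, lhs, rhs = ("manager", "title"), ("manager",), ("title",)

# Horizontal decomposition on the guard condition.
worker_w  = worker_attrs                                           # manager, title never NULL here
worker_wt = tuple(a for a in worker_attrs if a not in cond_attrs)  # both always NULL, hence dropped

# Vertical decomposition of manager -> title inside the non-null fragment,
# projecting (manager, title) into its own table (cf. Section 4.2).
worker_w1 = tuple(a for a in worker_w if a not in rhs)             # (SSN, manager)
worker_w2 = lhs + rhs                                              # (manager, title), key on manager

print({"Worker_w1": worker_w1, "Worker_w2": worker_w2, "Worker_wt": worker_wt})
# {'Worker_w1': ('SSN', 'manager'), 'Worker_w2': ('manager', 'title'), 'Worker_wt': ('SSN',)}
```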
5. Related Work
With regard to the rest of the literature on DBRE [17], ignoring the aggregation of knowledge from different sources and the type of output, the closest method to ours would be [18]. This approach also emphasizes the preservation of information, but we believe that shifting to an ontological formalism limits the kind of operations we can apply over the source database, notably direct updates from any point in the framework. Several papers presented their own interpretation of Hull's information preservation criterion [3]: [8] brings up SIG, a formalism which represents, via edges, the relations between attributes as functions; [19] proposes conflictfreeness as a restriction for the integration of two schemas; [20] puts the focus on the transformations themselves and on how to endow them with instance-preserving properties; and [21] encapsulates how information preservation and bijective functions are related.
Throughout the literature provided so far, one common scenario outside of DBRE in which information preservation is desirable is data integration [22]. Because most database transformations are written as mappings, ours included, it is quite intuitive to link them to the mappings used in data integration, following either the Global-as-View (GAV) or Local-as-View (LAV) paradigm. [23] notably proposed a framework based on sequences of simple operations to represent more complex transformations in a data integration context. Delving deeper into information preservation with regard to data integration can lead us to consider methodologies used in data exchange [24]. While the main focus of data exchange is the generation of a proper database instance from constraints, the problem of schema exchange introduced in [25] is closer to our ambitions, mostly with regard to the grouping of similar schemas into templates, making automatic schema transformation possible.

6. Conclusion and Discussions
We presented a case study for the application of the transformation pattern formalism, showing first how it can be applied to a database schema before asserting that it can serve as the base of a database decomposition process. Any database schema, once translated into FOS, can then be decomposed following a sequence of lossless transformations embodied by the TPs. What results is a pair of mapping patterns between the source database and its equivalent 4NF decomposition. The presented methodology places itself in the context of the SQL Semantic Transducer [26], in which the conceptual model of a database schema can act as its materialized view. This means that, once the jump from 6NF to any conceptual modeling language is done, queries and updates applied to the model would get propagated back to the source database, and so would the reverse, a property only guaranteed because losslessness was preserved through the entire process. Upcoming work will therefore focus on the transitions first into 6NF and then to a conceptual model, likely going through an ARM schema in the process. Going further, we consider frameworks ensuring losslessness appealing to other database-related fields such as data integration and data preparation. Both handle transformations where information loss is unavoidable, thus requiring both a definition of non-lossless, i.e. lossy, TPs and ways to still enforce losslessness, perhaps taking inspiration from the complementary view problem.

References
[1] M. Stonebraker, D. Deng, M. L. Brodie, Database decay and how to avoid it, in: 2016 IEEE International Conference on Big Data (Big Data), IEEE, 2016, pp. 7–16.
[2] N. Ndefo, E. Franconi, A Study on Information-Preserving Schema Transformations, International Journal of Semantic Computing 14 (2020) 27–53. URL: https://www.worldscientific.com/doi/abs/10.1142/S1793351X20400024. doi:10.1142/S1793351X20400024.
[3] R. Hull, Relative information capacity of simple relational database schemata, in: Proceedings of the 3rd ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, 1984, pp. 97–109.
[4] P. Atzeni, G. Ausiello, C. Batini, M. Moscarini, Inclusion and equivalence between relational database schemata, Theor. Comput. Sci. 19 (1982) 267–285.
[5] J. Albert, Y. Ioannidis, R. Ramakrishnan, Equivalence of Keyed Relational Schemas by Conjunctive Queries, Journal of Computer and System Sciences 58 (1999) 512–534. URL: https://linkinghub.elsevier.com/retrieve/pii/S0022000099916288. doi:10.1006/jcss.1999.1628.
[6] X. Qian, Correct schema transformations, in: G. Goos, J. Hartmanis, J. Van Leeuwen, P. Apers, M. Bouzeghoub, G. Gardarin (Eds.), Advances in Database Technology - EDBT '96, volume 1057, Springer Berlin Heidelberg, Berlin, Heidelberg, 1996, pp. 114–128. URL: http://link.springer.com/10.1007/BFb0014146. doi:10.1007/BFb0014146. Series Title: Lecture Notes in Computer Science.
[7] W. McCune, Prover9 and Mace4, 2005–2010. URL: http://www.cs.unm.edu/~mccune/prover9/.
[8] R. J. Miller, Y. E. Ioannidis, R. Ramakrishnan, The use of information capacity in schema integration and translation, in: VLDB, volume 93, 1993, pp. 120–133.
[9] A. Borgida, D. Toman, G. Weddell, On referring expressions in information systems derived from conceptual modelling, in: Conceptual Modeling: 35th International Conference, ER 2016, Gifu, Japan, November 14-17, 2016, Proceedings 35, Springer, 2016, pp. 183–197.
[10] C. Beeri, P. A. Bernstein, N. Goodman, A sophisticate's introduction to database normalization theory, in: Readings in Artificial Intelligence and Databases, Elsevier, 1989, pp. 468–479.
[11] A. H. Bahmani, M. Naghibzadeh, B. Bahmani, Automatic database normalization and primary key generation, in: 2008 Canadian Conference on Electrical and Computer Engineering, IEEE, Niagara Falls, ON, Canada, 2008, pp. 000011–000016. URL: http://ieeexplore.ieee.org/document/4564486/. doi:10.1109/CCECE.2008.4564486. ISSN: 0840-7789.
[12] D.-M. Tsou, P. C. Fischer, Decomposition of a relation scheme into Boyce-Codd normal form, ACM SIGACT News 14 (1982) 23–29.
[13] H. Koehler, Finding faithful Boyce-Codd normal form decompositions, in: Algorithmic Aspects in Information and Management: Second International Conference, AAIM 2006, Hong Kong, China, June 20-22, 2006, Proceedings 2, Springer, 2006, pp. 102–113.
[14] C. Beeri, P. A. Bernstein, Computational problems related to the design of normal form relational schemas, ACM Transactions on Database Systems 4 (1979) 30–59. URL: https://dl.acm.org/doi/10.1145/320064.320066. doi:10.1145/320064.320066.
[15] W. Fan, F. Geerts, X. Jia, A. Kementsietsidis, Conditional functional dependencies for capturing data inconsistencies, ACM Transactions on Database Systems 33 (2008) 1–48. URL: https://dl.acm.org/doi/10.1145/1366102.1366103. doi:10.1145/1366102.1366103.
[16] T. Abgrall, Formalization of Data Integration Transformations, in: S. Chiusano, T. Cerquitelli, R. Wrembel, K. Nørvåg, B. Catania, G. Vargas-Solar, E. Zumpano (Eds.), New Trends in Database and Information Systems, volume 1652, Springer International Publishing, Cham, 2022, pp. 615–622. URL: https://link.springer.com/10.1007/978-3-031-15743-1_56. doi:10.1007/978-3-031-15743-1_56. Series Title: Communications in Computer and Information Science.
[17] N. A. Mian, S. A. Khan, N. A. Zafar, Database Reverse Engineering Methods: What is Missing?, 2 (2013).
[18] L. Lubyte, S. Tessaris, Automatic Extraction of Ontologies Wrapping Relational Data Sources, in: S. S. Bhowmick, J. Küng, R. Wagner (Eds.), Database and Expert Systems Applications, volume 5690, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 128–142. URL: http://link.springer.com/10.1007/978-3-642-03573-9_10. doi:10.1007/978-3-642-03573-9_10. Series Title: Lecture Notes in Computer Science.
[19] L. Ekenberg, P. Johannesson, Conflictfreeness as a basis for schema integration, in: International Conference on Information Systems and Management of Data, Springer, 1995, pp. 1–13.
[20] P. McBrien, A. Poulovassilis, A formalisation of semantic schema integration, Information Systems 23 (1998) 307–334.
[21] I. Kobayashi, Losslessness and semantic correctness of database schema transformation: Another look of schema equivalence, Information Systems 11 (1986) 41–59. URL: https://linkinghub.elsevier.com/retrieve/pii/0306437986900220. doi:10.1016/0306-4379(86)90022-0.
[22] A. Doan, A. Halevy, Z. Ives, Principles of Data Integration, Elsevier, 2012.
[23] P. McBrien, A. Poulovassilis, Data integration by bi-directional schema transformation rules, in: Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405), IEEE, Bangalore, India, 2003, pp. 227–238. URL: http://ieeexplore.ieee.org/document/1260795/. doi:10.1109/ICDE.2003.1260795.
[24] R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa, Data exchange: semantics and query answering, Theoretical Computer Science 336 (2005) 89–124. URL: https://linkinghub.elsevier.com/retrieve/pii/S030439750400725X. doi:10.1016/j.tcs.2004.10.033.
[25] P. Papotti, R. Torlone, Schema exchange: Generic mappings for transforming data and metadata, Data & Knowledge Engineering 68 (2009) 665–682. URL: https://linkinghub.elsevier.com/retrieve/pii/S0169023X09000184. doi:10.1016/j.datak.2009.02.005.
[26] T. Abgrall, E. Franconi, Understanding the SQL Semantic Transducer, in: International Conference on Advanced Information Systems Engineering, Springer, 2024.