<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Changes to Code: An Approach Based on a Unified Data Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Hernández Chillón</string-name>
          <email>alberto.hernandez1@um.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesús García Molina</string-name>
          <email>jmolina@um.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Ramón Hoyos</string-name>
          <email>jose.hoyos@um.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>María José Ortín</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Computer Science, University of Murcia</institution>
          ,
          <addr-line>Murcia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Schema evolution is a crucial activity when managing databases in order to satisfy new requirements and improve data and code quality. Therefore, a great research efort has been devoted to automate this activity for relational databases, and agile database development is the most recent innovation. With the emergence of NoSQL systems, the attention has shifted to study schema evolution for these stores that are characterized by the absence of a declaration of schema, and the use of APIs instead of the usage of SQL-like languages. Moreover, the number of multi-model database systems and tools is continuously growing as polyglot persistence becomes the architecture of choice to support the requirements of modern applications. In this scenario, several works have presented approaches to support schema evolution for relational and NoSQL systems. In previous work, we presented the U-Schema unified metamodel to represent schemas for relational systems and the four most widely used NoSQL paradigms. For U-Schema, we defined a taxonomy of schema change operations, and the Orion language to specify scripts defining sets of schema changes based on operations of the taxonomy. The Orion engine is then able to automatically update schema and data from Orion scripts. In this paper, we present an ongoing work aimed at adding schema and code co-evolution to Orion. The novelty of our proposal is to define schema changes in a platform-independent way, and then automatically generate static code analysis queries to find the statements that need to be updated. These queries are obtained by using model to text transformations, and they return information about the location of the code fragments to be updated, and the modifications that need to be applied so the code conforms to the new schema.</p>
      </abstract>
      <kwd-group>
        <kwd>NoSQL databases</kwd>
        <kwd>Schema evolution</kwd>
        <kwd>Taxonomy of changes</kwd>
        <kwd>Code update</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In the industrial and academic database community, the
idea of “one size does not fit all” has been gaining
acceptance and polyglot persistence is the scenario foreseen for
the future [1, 2]: relational database systems will be still
predominant but they will coexist with other kinds of
systems, such as NoSQL and NewSQL. In fact, the first
eight databases in the DB-engines ranking 1 are
multimodel, and database tooling provide support for database
systems of diferent data model paradigms.</p>
      <p>When several data models are popular and widely
used, the definition of a unified or generic metamodel is a
well-known approach to reduce the efort of
implementing multi-model database utilities and tools: the generic
metamodel is able to represent schemas of all the involved
data models. Some examples of such generic metamodels
nEvelop-O
CEUR
htp:/ceur-ws.org
ISN1613-073
https://db-engines.com/en/ranking.</p>
      <p>Schema [5]. In our research group, we created U-Schema
to provide a unified representation for relational and the
four most used NoSQL data models: columnar, document,
key-value, and graph [2]. We have used U-Schema to
design and implement the Orion approach for evolving
databases [6, 7] and the SkiQL language aimed to query
schemas [8]. Here, will focus on the schema evolution
based on the Orion language.</p>
      <p>Schema evolution is a crucial task in database
applications: a co-evolution of data and code is required
when schema changes are applied, as illustrated in
Figure 1. Therefore, a great research efort has been
devoted to deal with this problem in the scope of relational
databases [9, 10]. Since the appearance of NoSQL
systems, a great attention has also been paid to the evolution
of these systems [11, 12], which have two significant
differences regarding the relational systems: most of them
are based on APIs instead of using a SQL-like language.</p>
      <sec id="sec-1-1">
        <title>Moreover, some approaches have also addressed the poly</title>
        <p>glot persistence scenario [13, 14].</p>
        <p>In [6, 7], a taxonomy of schema changes for U-Schema
is proposed, and the Orion language is defined to
express schema change scripts. Co-evolution of schema
and data is automatically applied by the Orion engine. In
this paper, we present an ongoing work intended to add
will be assisted in two ways: depending on the kind of
schema change, code will be automatically updated or
(J. R. Hoyos); 0000-0002-5507-5566 (M. J. Ortín)</p>
        <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License co-evolution of schema and code to Orion. Developers
either messages carrying information about the changes
to be applied manually will be shown to developers, in
a way similar to those described in [13]. Here, we will
present the current state of our work. The novelty of
our proposal is to define schema changes in a
platformindependent way, and then automatically generate static
code analysis queries to find the statements to be updated.</p>
        <p>These queries are obtained from a Orion script through
model to text transformations. We have validated
generated queries valid for MongoDB APIs and mappers for
the JavaScript language. Our work is described in
Section 3. Prior to this explanation we briefly introduce the
Orion approach, and we conclude by commenting the
related work and exposing some ideas about the tasks to
be done to complete the approach.
valued attributes and aggregation relationships, and the
latter refer to relations between entities, namely keys
and reference relationships. Entity types can be root or
non-root depending on if their objects are the root of an
aggregation hierarchy or are embedded in other objects.</p>
        <p>Figure 2 shows a U-Schema model example named
SoftwareDev, which will be used as a running example. We
will suppose that this schema corresponds to a document
database, in particular MongoDB, and therefore it has no
relationship types. The schema is shown by using the
graphical notation described in [16], whose semantics
will be easily understood from the explanation of the
schema. There are three root entity types in
Software2. U-Schema and the Orion Dev schema: Developer, Ticket, and Repository, and
Language there is one non-root entity type (DeveloperInfo) that
is aggregated by Developer.</p>
        <p>U-Schema is a unified metamodel that integrates the U-Schema allows structural variations of schema types
NoSQL and relational data models. It includes the to be represented. Thus, a schema type is formed by a set
elements traditionally used to create logical schemas, of variations. In the schema example, Developer has two
as those that are part of the Entity-Relationship (ER) structural variations that share a common set of features
model [15]. However, these elements are structured dif- (e.g., email, permissions, and dev_info) but difer in
ferently because their semantics are not exactly the same, two features: The first variation stores a feature named
and additional concepts are considered (e.g. structural is_active, and the second one has suspended instead.
variation of types), as briefly explained below. A detailed The remaining entity types have a single variation with
description of U-Schema and the bidirectional mappings a set of features.
between this unified metamodel and each of the individ- In the graphical notation, features are specified
ual data models can be found in [5]. by means of a name and a type, but aggregations</p>
        <p>A U-Schema model (i.e., a logical schema) consists of a and references are also shown by means of the
set of schema types that can be entity types, used to rep- typical arrows used in UML class diagrams. In
resent domain entities in any database, and relationship the example, Repository.title is an attribute,
types to represent relationships between nodes in graph Developer.dev_info aggregates an instance
databases. This illustrates that not all U-Schema concepts of DeveloperInfo, Ticket.repository_id and
are present in all the data models that it integrates. Ticket.developer_id hold references to the
corre</p>
        <p>An entity type is formed by a set of features that can sponding entity types, and Repository.developers
be structural or logical. The former are simple and multi- references the set of developers involved in a repository.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Detection of Code Afected by Schema Changes</title>
      <p>In this section, we describe the proposed approach in
detail. An overview of the designed and implemented
strategy is firstly presented. Secondly, the mappings
between each Orion operation and the updates to be
performed are discussed. Thirdly, some details about the
code analysis based on CodeQL queries are commented,
and finally the obtained output for the schema of the
running example is shown.</p>
      <sec id="sec-2-1">
        <title>3.1. Overview of the Code Analysis Strategy</title>
        <p>As noted in [13], depending on the semantics of the
schema change operations, four diferent situations
can be distinguished for the application code accessing
databases: (i) Code may need to be modified according
to the new schema, e.g., when features or entity types
are renamed; (ii) code may become invalid, e.g., when
entity types or features are removed; (iii) code is
syntactically correct but should be modified according to the
new schema to return correct values, e.g., when a casting
is applied on an attribute; and (iv) code does not need
to be changed since it is not directly afected, e.g., when
adding a new entity type.</p>
        <p>An analysis of the application code involves
finding the statements afected by each SCO in an
Orion script (e.g., renaming the Ticket entity type to
Active_Ticket). Once one or more afected statements
are identified, they could be modified to be adapted to
the new schema, or either a message could be generated
and shown to developers.</p>
        <p>In order to perform such code analysis and resulting
actions, a mapping should be established between each Figure 5: The Orion engine extension to handle code update.
SCO of the taxonomy and its impact on the code, based
on: (i) Identifying the statements that need to be updated,
and (ii) providing a suitable solution according to the
new schema. Table 1 shows the mappings for the cur- transformation consists of a set of rules that implement a
rently addressed Orion SCOs. These operations cover the mapping between each Orion SCO and CodeQL queries
four categories mentioned above. In the table, the first able to find the location of the application code to be
column indicates the Orion SCO, and the second column updated. Table 1 shows such mappings in the third and
specifies the actions to be applied on the code, which fourth columns. For each Orion operation up to two
are expressed in an abstract way and independent of any CodeQL queries are generated depending on whether
technology. The third and fourth columns are specific to the code has become invalid and needs to be updated
each programming language, in this case JavaScript, and according to the new schema (error queries), or the code
they indicate which code should be inspected to detect should be modified, but it continues to be valid ( warning
warnings and errors, as commented later in this section. queries).</p>
        <p>Similarly to [13], our aim is to assist developers to In the second step, the generated queries are applied
update code with two kinds of facilities: (i) Automating to the previously created CodeQL database. As a result of
all possible code changes, and (ii) generating log files this application, a log file in SARIF format 2 is obtained.
that carry information about the automatically applied This file contains information of the analysis process,
modification, and the changes that should be manually the applied queries, the examined files and every match
introduced by developers. Here, we will focus on the found in those files by the queries, along with useful
automatic generation of notification messages to provide information to locate the code statements to be changed
developers with such information. (e.g., the file where they are located, the line number, and</p>
        <p>We have started to build the Code Updater component a custom message).
of the Orion engine by implementing a 3-step strategy Because the log file is rather complex and dificult to
that uses the CodeQL code analysis engine [18] to dis- navigate, the third step performs a parsing of the log file
cover the code that is candidate to be changed. This and generates a digest useful for the developer where,
strategy is outlined in Figure 5 and commented below. for each Orion SCO in the input script, a set of
mes</p>
        <p>As a preliminary step, the code repository is converted sages show the statements that need to be changed along
into a CodeQL database that can be accessed by CodeQL with their location, and which changes should be applied,
queries. In the first step, a model-to-text ( m2t) transfor- e.g., renaming a variable to a new name or deleting an
mation is executed to automatically generate the CodeQL assignment.
queries that correspond to the input Orion model. This
Aggregate Operations</p>
        <p>On MongoDB methods that
retrieve all collections
On assignments with the literal
. .</p>
        <p>On assignments with the literal
. .
—
—
—
—
—
On  2 Mongoose schema.</p>
        <p>On  2 Mongoose schema.</p>
        <p>On  Mongoose schema.</p>
        <p>On  Mongoose schema.</p>
        <p>On  Mongoose schema.</p>
        <p>—
—
—
—
On access to  by property,
by method or by Mongoose
schema.</p>
        <p>On access to  by property,
by method or by Mongoose
schema.</p>
        <p>On MongoDB methods
involving . and on the  Mongoose
schema.</p>
        <p>On MongoDB methods
involving . and on the  Mongoose
schema.
—
On MongoDB methods
involving  1. and on the  1
Mongoose schema.</p>
        <p>On MongoDB methods
involving .  and on the 
Mongoose schema.</p>
        <p>On MongoDB methods
involving .  and on the 
Mongoose schema.</p>
        <p>On MongoDB methods
involving .  and on the 
Mongoose schema.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Analyzing the Application Code</title>
        <sec id="sec-2-2-1">
          <title>For a first validation of our approach, we have chosen</title>
          <p>MongoDB as our target database and JavaScript as
language of the database code. According to the DB-Engines
ranking, MongoDB is the fith most popular database
(the first of the NoSQL stores), and the driver for Node.js
JavaScript frameworks is one of the most used to
access MongoDB databases in Web applications. Finally,
we have also considered the Mongoose object-document
mapper (ODM) [19, 20], which is the most widely used
ODM for MongoDB in JavaScript.</p>
          <p>In MongoDB, data on entity types are stored in
collections of JSON-like semi-structured documents [21].</p>
          <p>Therefore, a document is formed by a set of name-value
pairs of features, and values can be primitives (e.g.,
Number and String), another nested document, or arrays.</p>
          <p>Figure 6 shows an excerpt of JavaScript code accessing 1 import mongoose from "mongoose";
a MongoDB database for the running example. First, the 2 const RepositorySchema = new mongoose.Schema({
Developer collection is retrieved by using a method call 3 _id: {type: String, required: true},
provided by the MongoDB API, and a query is performed 4 developers: {type: [String],
that involves retrieving a single developer and using the 5 ref: "Developer", required:true},
permissions field. Then a variable is created with the 6 num_forks: {type: Number, required: true},
"Ticket" value, and another method call retrieves the 7 num_stars: {type: Number, required: true},
TdieclkeetteMcaonllyecotpioernatbiyonuwsinitgh tahniostvhaerriaqbuleer,yaonnd iatp.plies a1890 ttuiartgl:ls:e: {{{tttyyypppeee::: SS[Stttrrriiinnngg,,g]rr,eerqqeuuqiiurrieerdd::edtt:rrtuuree}}u,e},
11 });
12 export default mongoose.model("Repository",</p>
          <p>RepositorySchema);
code statements are involved when applying a Orion SCO.
// Try to access Developer.permissions The third column where a warning should be thrown,
developers = database.collection("Developer"); pieces of code which are syntactically correct but might
result1 = await developers.findOne( be semantically incorrect. The fourth column, on the
{_id: "13"}, other hand, indicates errors, those updates that are
manda{projection: {"permissions": 1}}); tory to fix code that is now invalid.</p>
          <p>According to these mappings, when an SCO that
// Query the Ticket entity type modifies an entity type is applied (e.g., a DELETE or
collection_name = "Ticket"; RENAME operation), it is necessary to look for
statetqiucekreyt=s={dmaetsasbaagsee:.co{l$lreecgteiox:n(c"oelrlreocrt"i}on}_;name); ments that access and return the corresponding
colresult2 = await tickets.deleteMany(query); lection, since those statements are now invalid. In
console.log(result2.deletedCount); JavaScript, it is common to use a MongoClient object to
} finally { obtain a MongoDB database object by using the .db()
await client.close(); method. Then, obtaining a specific collection from such
database object can be achieved in several ways,
classified in two kinds: (i) By property (e.g., database.E
or database[E], being E the name of the collection),
and (ii) by method (e.g., database.collection("E") or
Figure 6: JavaScript excerpt querying a MongoDB database. database.getCollection("E")), as shown in Figure 6,
lines 11 and 18. These statements have in common that</p>
          <p>As evidenced in this example, the code analysis should the literal name of the entity is used at some point (in
not only consider just database operations, but also any the examples, E), so when an SCO that modifies an entity
other statements such as assignments and variables that type is applied, this kind of accesses need to be updated.
can carry values referring to schema elements (e.g., entity Moreover, instead of directly using the literal “E”
develtype names and feature names), and diferentiate between opers can use variables that hold such value, so we also
these literal values and function or variable names that need to look for variables that were assigned that value.
might have the same name. Moreover, the analysis needs Finally, the Mongoose schema corresponding to E must
to take into account that in the case of feature names, it also be updated.
is necessary to identify not only the feature, but also if it In the example, the RENAME Ticket TO
belongs to the entity indicated in the SCO. Active_Ticket operation will cause the statement in</p>
          <p>On the other hand, Figure 7 shows a Mongoose schema line 18 to not be valid, and an error message should be
for the Repository entity type of the running example. triggered by our Code Updater. Also, a warning should
A Mongoose schema is composed of a set of fields with a be issued for the assignment in line 17 because it will not
name and a specific type, such as the num_stars field of return the desired results, although such an assignment
type Number, or the developers field, which references is still valid.
the Developer entity type stored as a list of Strings. Detecting code errors due to an SCO that modifies a</p>
          <p>In Table 1, the third and fourth columns shows which feature of an entity type requires of a diferent strategy.
To analyze the impact of an operation such as DELETE between these operations comes on the solution to be
Developer::permissions, it is necessary to take into applied for the update: For DELETE, the involved
stateaccount that only references to the permissions fea- ments should be removed as the collection being deleted
ture belonging to Developer should be updated, and no longer exists, while for the RENAME operation the
inthat other entity types might hold a feature with the volved statements should be updated to reference the
same name. Therefore, queries for these operations new name. The same case happens for ADD Attribute,
are organized as follows. MongoDB API operations ADD Reference, and ADD Aggregate: In JavaScript and
are first searched, such as findOne(), insertOne(), or MongoDB these operations do not turn any code invalid.
deleteMany(). Then, the process checks if the operation Instead, queries for these operations look for Mongoose
is being applied over the specified entity type (e.g., an ob- schema declarations and suggest updating such schemas
ject storing the Developer collection). Finally, the code by adding the new fields.
is analyzed to check if the field is referenced in some These similarities between SCOs ease the process
of the arguments of the method call, which use to be of generating CodeQL queries. For this purpose,
expressions or JSON objects containing the feature name. we created a library of CodeQL functions which
In our example, the statement in line 12 should be up- allows simpler queries, as illustrated in the query of
dated to reflect that the permissions field referenced by Figure 8: A CodeQL error query generated from the
projection has been deleted. RENAME ENTITY Ticket TO Active_Ticket. Here,</p>
          <p>Operations that modify features also impact Mon- the query looks for expressions that meet one of the
goose schemas. For example, the Orion operation RENAME following conditions on the Ticket collection: (i) It is
*::num_forks TO rank_forks would require updating accessed as a property (e.g., database.Ticket), (ii) it
the Repository schema shown in Figure 7, where the is accessed by using a method that retrieves a
colnum_forks field should be modified to reflect the RENAME lection (e.g., getCollection), and (iii) there is a
operation. Mongoose export statement that generates a Ticket</p>
          <p>Finally, we have to consider SCOs adding new features schema. These conditions are implemented with
to existing entity types, such as COPY or ADD attr. These three functions of the library: checkPropAccess,
SCOs do not turn invalid code, and therefore do not issue checkExprIsCollectionMethod, and
any errors, but we provide information for developers by checkIsMongooseExport, and they can be used
issuing warnings on Mongoose schemas indicating that for other SCOs. Finally, a message stating how the code
the developer should add the new features. needs to be updated is shown: “Ticket: Entity renamed to
“Active_Ticket”.”.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>3.3. CodeQL Query Implementation</title>
        <p>IMPORT javascript
IMPORT utils</p>
        <sec id="sec-2-3-1">
          <title>Once we defined which code statements throw warnings</title>
          <p>and errors for each considered SCO, we used CodeQL [18]
to implement queries that return those precise statements FROM Expr expr
from a code repository. CodeQL is a code analysis engine WHERE
developed by GitHub. It generates a database represen- checkPropAccess(expr, "Ticket")
tation of a code base, so the code can be treated as data. OR
Code patterns are modeled as CodeQL queries that can checkExprIsCollectionMethod(expr, "Ticket")
be executed to generate a result set that includes those OR
lines of code that matches the pattern. We have chosen checkIsMongooseExport(expr, "Ticket")
CodeQL because it is a semantic code analysis engine, it SELECT expr,
supports a wide variety of programming languages, and "Ticket: Entity renamed to \"Active_Ticket\"."
it can be integrated with GitHub repositories to analyze
the code they contain. EFNigTuITreY 8T:icCkoedteQTOL Aecrrtoirveq_uTeircykgeetnoepreartaetdiofno.r the RENAME</p>
          <p>A CodeQL query includes three clauses, as illustrated
in the example of Figure 8: (i) FROM defines the kind of
element being queried (in this example, Expr, a class The same principle is followed for
generatused to get JavaScript expressions), (ii) WHERE is used to ing a CodeQL query for the operation DELETE
iflter only the elements considered in the first clause that Developer::permissions, shown in Figure 9. Here,
also meet certain criteria, and (iii) SELECT to return the instead of single generic expressions, we look for
elements that were filtered and a custom message. MethodCallExpr, a CodeQL class used to get
expres</p>
          <p>Some Orion SCOs are mapped to similar CodeQL sions in which a method is invoked. According to
queries. For example, DELETE and RENAME an entity type what was stated in the previous subsection, we need
require updating the same statements. The diference to update expressions that meet one of the following
conditions: (i) A MongoDB method is invoked over
a specific entity ( Developer) and the permissions
ifeld shows as one of its arguments (either directly
or embedded as a value in a JSON object), or (ii) a
Mongoose schema for Developer is detected, and it
defines the permissions field in it. In the query, the use
of other functions of the library can be observed, such as
(checkIsMDBDataMethod and checkFeatIsInObject.</p>
          <p>Please note that all CodeQL queries generated by the
described process are able to match expressions that show
specific string values (e.g., a feature name being modified)
and also solve variables that are found on candidate
expressions, recursively, until those variables are evaluated
to a string value.
{
}
}</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>Once the CodeQL queries have been generated, develop</title>
          <p>ers can use the CodeQL engine to analyze folders with
code files and apply the supplied queries, as shown in
Figure 5. The engine outputs a SARIF file containing
the results of the analysis. This file stores information
such as the location of the analyzed files and the queries
provided, and a section for each match found on those
ifles. As shown in Figure 10, for each found match the file
stores the rule that found it
(‘softwaredev-1/000-op-deletefeature-developer-permissions-error’), a custom message
informing of the event (‘Developer.permissions: This
feature has been deleted.’), and the location of the match
divided into two blocks: (i) the file in which it was found
(‘models/Developer.js’ in this example), and (ii) the line
and column inside the file content.
"ruleId":
"softwaredev-1/000-op-delete-feature</p>
          <p>developer-permissions-error",
"ruleIndex": 0,
"rule": { "id": "...", "index": 0 },
"message": {
"text": "Developer.permissions: This feature has</p>
          <p>been deleted."
},
"locations": [{
"physicalLocation": {
"artifactLocation": {
"uri": "models/Developer.js",
"uriBaseId": "%SRCROOT%",
"index": 0 },
"region": {
"startLine": 20,
"startColumn": 16,
"endColumn": 60
}
}
}],
"partialFingerprints": {
"primaryLocationLineHash": "f31ed949860b8f:1",
"primaryLocationStartColumnFingerprint": "15"</p>
          <p>This file is then processed to produce a more readable
output that helps developers to better understand which
statements have been afected by SCOs and how to fix
them. In our case, we opted for a format in which for each
match entry a line such as &lt;file_path &gt;(&lt;line_number&gt;):
&lt;message&gt; is generated.
rewriting algorithm is performed which consider all the
versions of the schema. Query rewriting applies forward
In this section, we will contrast our proposal with some and backward query rewriting (i.e., applies SCOs and
of the most relevant works to co-evolve schema and code. their reverse equivalents to queries) to generate diferent</p>
          <p>In [10], Carlo Curino et al. present the PRISM++ tool sub-queries (one for each schema version), execute these
to automate the relational schema evolution. A set of sub-queries and union the obtained results.
Schema Modification Operators (SMOs) is defined to ex- Our work difers from previous ones in the following
press schema changes in form of atomic operations, as ways. PRISM++ is probably the most influential work
well as a set of Integrity Constraint Modification Oper- related to the automation of schema evolution, but it
ators (ICMOs). For each SMO, a set of instructions are is focused on relational databases. DB-Main is based
generated to change the relational schema, update the on a generic metamodel and schema transformations,
stored data, and rewrite queries under the given database but NoSQL data models are not considered, and code
constraints. Existing SQL queries are adapted by apply- updating is mainly focused on SQL queries. Also, GER
ing rewritten techniques and creating views. and U-Schema are clearly diferent, and a taxonomy of</p>
          <p>While PRISM++ only tackles the schema evolution changes was not defined. Regarding Typhon, U-Schema
for relational systems, DB-Main is a database engineer- has a richer semantic than TyphonML which allows the
ing environment built for relational databases and other structural variation and graph data models to be
repredata models prior to the emergence of NoSQL stores [22]. sented, among other diferences, as discussed in [ 5]. Also,
In addition to support schema evolution, DB-Main inte- we are addressing diferent APIs instead of rewriting code
grates utilities for reverse engineering, re-engineering, of a query language specific to the Typhon platform, and
and maintenance. DB-Main is composed of three basic we are searching code patterns in the use of APIs in order
pillars: (i) The GER generic metamodel defined by extend- to reduce the efort. Finally, when compared to Darwin,
ing ER, (ii) a transformation-based engineering process, this tool focuses on query rewriting [23], and our
apand (iii) the maintenance of a history of schema changes. proach is based on a more complex unified metamodel</p>
          <p>Query rewriting for NoSQL databases, in particular than the data model underlying Darwin, which relies on
MongoDB, has been addressed in [12] and [13]. In [13], a common interface used to access diferent stores. This
Jerome Fink et al. describe a strategy to adapt queries to also entails a richer taxonomy of operations.
schema changes for polystores. This work is part of the
Typhon project intended to ofer a solution for
designing polystores that can be constituted by relational and 5. Conclusions and Future Work
NoSQL databases. In Typhon, a family of languages was
created for defining conceptual schemas (TyphonML), While the first version of the Orion engine supported
changing schemas, and querying databases (TyphonQL). the co-evolution of schema and data, the strategy here
The strategy proposed to adapt queries is the following. presented is a first step to assist developers to update
When TyphonQL queries are issued on a polystore whose code when schemas are updated. To this moment, the
schema has changed, they are rewritten according to the efort is being focused on the mapping of each SCO to
new schema. Query rewriting depends on the change the code patterns to be detected, the CodeQL queries to
operations applied, and four actionscan be applied on be generated, the changes to be applied, and the
inforqueries: (i) no modification is needed, (ii) query is mod- mation to be provided to developers. As each mapping
ified to be valid, (iii) query is modified but a warning is elicited for a SCO, the corresponding rules are
impleindicates that the result could be incorrect, (iv) and the mented and added to the model to text transformation
query cannot be adapted. The last two actions produce and the analysis of the query result is extended.
messages to assist developers to change the code. A pat- As of future work, we consider the following: (i)
Imtern matching scheme is used to map each triple &lt;SCO, plementing the remaining taxonomy operations for
Monschema, query&gt; to a handler function that produces the goDB and JavaScript, (ii) expanding this implementation
query adaptation. to additional databases to cover all the considered data</p>
          <p>Darwin is a schema evolution platform for NoSQL models, (iii) integrating our approach with CI systems,
databases, which supports schema history extraction, and (iv) extending our current output and adding the
posseveral data update strategies, and query rewriting [12]. sibility to apply changes directly to code by applying the
When lazy data migration is applied, several versions of generic rewrite approach shown in [24], where a generic
the same entity can coexist in the database, and existing metamodel was created to represent code.
queries must be rewritten to be able to access all the The running example models as well as the generated
schema versions. To achieve this, a schema history graph queries used to illustrate our approach are available in a
is calculated in form of a set of evolution operations that public GitHub repository.3
specifies the sequence of schema changes. Then, a query
24th Int. Conf. on Software Analysis, Evolution and
Reengineering, SANER 2017, Klagenfurt, Austria,
Work funded by the Spanish Ministry of Science and February 20-24, 2017, 2017, pp. 457–461.
Innovation (project grant: PID2020-117391GB-I00). [12] U. Störl, M. Klettke, Darwin: A Data Platform for
Schema Evolution Management and Data
MigraReferences tion, in: DataPlat 2022: 1st International Workshop
on Data Platform Design, Management and
Opti[1] M. Stonebraker, The case for polystores, ACM mization, 2022.</p>
          <p>Sigmod Blog (2015). URL: https://wp.sigmod.org/ [13] J. Fink, M. Gobert, A. Cleve, Adapting Queries to
?p=1629. Database Schema Changes in Hybrid Polystores,
[2] P. Sadalage, M. Fowler, NoSQL Distilled. A Brief in: 20th IEEE International Working Conference
Guide to the Emerging World of Polyglot Persis- on Source Code Analysis and Manipulation, SCAM
tence, Addison-Wesley, 2012. 2020, Adelaide, Australia, September 28 - October
[3] J. Hainaut, et al, Database Evolution: the DB- 2, 2020, IEEE, 2020, pp. 127–131. URL: https://doi.</p>
          <p>Main Approach, in: Entity-Relationship Approach org/10.1109/SCAM51674.2020.00019. doi:10.1109/
- ER’94, 13th Int. Conf. on the Entity-Relationship SCAM51674.2020.00019.</p>
          <p>Approach, volume 881, Springer, 1994, pp. 112–131. [14] I. Holubová, M. Vavrek, S. Scherzinger, Evolution
[4] Typhon Project, Hybrid Polystore Mod- management in multi-model databases, Data &amp;
elling Language (Final Version), Technical Knowledge Engineering 136 (2021) 101932. doi:10.
Report, University of L’Aquila, 2018. URL: 1016/j.datak.2021.101932.
https://4d97e142-6f1b-4bbd-9bbb-577958797a89. [15] P. P.-S. Chen, The Entity-Relationship Model:
Tofilesusr.com/ugd/d3bb5c_ ward a Unified View of Data, ACM Transactions
3394b40f9cb54bcbb873f2c4ea1f2298.pdf. on Database Systems 1 (1976) 9–36.
[5] C. J. F. Candel, D. S. Ruiz, J. J. G. Molina, A unified [16] A. Hernández, S. Feliciano, D. Sevilla Ruiz, J. García
metamodel for nosql and relational databases, In- Molina, Exploring the Visualization of Schemas for
formation Systems 104 (2022) 101898. doi:10.1016/ Aggregate-Oriented NoSQL Databases, in: ER
Foj.is.2021.101898. rum 2017, 36th Int. Conf. on Conceptual Modeling
[6] A. Hernández Chillón, D. Sevilla Ruiz, J. Garcia- (ER), 2017, pp. 72–85.</p>
          <p>Molina, Towards a Taxonomy of Schema Changes [17] J. Banerjee, W. Kim, H.-J. Kim, H. F. Korth,
Semanfor NoSQL Databases: The Orion Language, in: tics and Implementation of Schema Evolution in
Conceptual Modeling - ER 2021 40th Int. Conf. OO Databases, SIGMOD Rec. 16 (1987) 311–322.
on Conceptual Modeling, St.John’s, NL, Canada, [18] Codeql web page, 2023. URL: https://codeql.github.
volume 13011, 2021, pp. 176–185. doi:10.1007/ com/, accessed February 2023.</p>
          <p>978-3-030-89022-3_15. [19] Mongoose Web Page, 2017. URL: http://mongoosejs.
[7] A. Hernández Chillon, M. Klettke, D. Sevilla Ruiz, com, accessed February 2023.</p>
          <p>J. Garcia-Molina, A Taxonomy of Schema Changes [20] S. bin Uzayr, N. Cloud, T. Ambler, Mongoose. In
for NoSQL Databases, CoRR abs/2205.11660 (2022). JavaScript Frameworks for Modern Web
DevelopURL: https://arxiv.org/abs/2205.11660. ment: The Essential Frameworks, Libraries, and
[8] C. J. F. Candel, J. J. G. Molina, D. S. Ruiz, Skiql: A uni- Tools to Learn Right Now, Apress, 2019. doi:10.
ifed schema query language, CoRR abs/2204.06670 1007/978-1-4842-4995-6_9.
(2022). URL: https://doi.org/10.48550/arXiv.2204. [21] K. Chodorow, M. Dirolf, MongoDB - The
Defini06670. doi:10.48550/arXiv.2204.06670. tive Guide: Powerful and Scalable Data Storage.,
[9] J. Hainaut, The transformational approach to O’Reilly, 2010.</p>
          <p>database engineering, in: Generative and Trans- [22] J.-M. Hick, J.-L. Hainaut, Strategy for database
formational Techniques in Software Engineering, application evolution: The DB-MAIN approach, in:
International Summer School, GTTSE 2005, Braga, International Conference on Conceptual Modeling,
Portugal, July 4-8, 2005. Revised Papers, 2005, pp. Springer, 2003, pp. 291–306.
95–143. URL: https://doi.org/10.1007/11877028_4. [23] M. Klettke, U. Störl, M. Shenavai, S. Scherzinger,
doi:10.1007/11877028\_4. NoSQL Schema Evolution and Big Data Migration
[10] C. Curino, H. J. Moon, A. Deutsch, C. Zaniolo, Au- at Scale, in: IEEE International Conference on Big
tomating the database schema evolution process, Data, IEEE Computer Society, 2016.
in: The VLDB Journal, volume 22, 2013, pp. 73–98. [24] C. J. F. Candel, A Unified Data Metamodel for
Re[11] L. Meurice, A. Cleve, Supporting schema evolu- lational and NoSQL databases: Schema Extraction
tion in schema-less nosql data stores, in: IEEE and Query, Ph.D. thesis, Faculty of Informatics,
University of Murcia, Murcia, Spain, 2022.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>