1. Introduction

Changes to Code: An Approach Based on a Unified Data Model

Alberto Hernández Chillón

alberto.hernandez1@um.es 0

Jesús García Molina

jmolina@um.es 0

José Ramón Hoyos

jose.hoyos@um.es 0

María José Ortín

0 0 Faculty of Computer Science, University of Murcia , Murcia , Spain

Schema evolution is a crucial activity when managing databases in order to satisfy new requirements and improve data and code quality. Therefore, a great research efort has been devoted to automate this activity for relational databases, and agile database development is the most recent innovation. With the emergence of NoSQL systems, the attention has shifted to study schema evolution for these stores that are characterized by the absence of a declaration of schema, and the use of APIs instead of the usage of SQL-like languages. Moreover, the number of multi-model database systems and tools is continuously growing as polyglot persistence becomes the architecture of choice to support the requirements of modern applications. In this scenario, several works have presented approaches to support schema evolution for relational and NoSQL systems. In previous work, we presented the U-Schema unified metamodel to represent schemas for relational systems and the four most widely used NoSQL paradigms. For U-Schema, we defined a taxonomy of schema change operations, and the Orion language to specify scripts defining sets of schema changes based on operations of the taxonomy. The Orion engine is then able to automatically update schema and data from Orion scripts. In this paper, we present an ongoing work aimed at adding schema and code co-evolution to Orion. The novelty of our proposal is to define schema changes in a platform-independent way, and then automatically generate static code analysis queries to find the statements that need to be updated. These queries are obtained by using model to text transformations, and they return information about the location of the code fragments to be updated, and the modifications that need to be applied so the code conforms to the new schema.

NoSQL databases Schema evolution Taxonomy of changes Code update

1. Introduction

In the industrial and academic database community, the idea of “one size does not fit all” has been gaining acceptance and polyglot persistence is the scenario foreseen for the future [1, 2]: relational database systems will be still predominant but they will coexist with other kinds of systems, such as NoSQL and NewSQL. In fact, the first eight databases in the DB-engines ranking 1 are multimodel, and database tooling provide support for database systems of diferent data model paradigms.

When several data models are popular and widely used, the definition of a unified or generic metamodel is a well-known approach to reduce the efort of implementing multi-model database utilities and tools: the generic metamodel is able to represent schemas of all the involved data models. Some examples of such generic metamodels nEvelop-O CEUR htp:/ceur-ws.org ISN1613-073 https://db-engines.com/en/ranking.

Schema [5]. In our research group, we created U-Schema to provide a unified representation for relational and the four most used NoSQL data models: columnar, document, key-value, and graph [2]. We have used U-Schema to design and implement the Orion approach for evolving databases [6, 7] and the SkiQL language aimed to query schemas [8]. Here, will focus on the schema evolution based on the Orion language.

Schema evolution is a crucial task in database applications: a co-evolution of data and code is required when schema changes are applied, as illustrated in Figure 1. Therefore, a great research efort has been devoted to deal with this problem in the scope of relational databases [9, 10]. Since the appearance of NoSQL systems, a great attention has also been paid to the evolution of these systems [11, 12], which have two significant differences regarding the relational systems: most of them are based on APIs instead of using a SQL-like language.

Moreover, some approaches have also addressed the poly

glot persistence scenario [13, 14].

In [6, 7], a taxonomy of schema changes for U-Schema is proposed, and the Orion language is defined to express schema change scripts. Co-evolution of schema and data is automatically applied by the Orion engine. In this paper, we present an ongoing work intended to add will be assisted in two ways: depending on the kind of schema change, code will be automatically updated or (J. R. Hoyos); 0000-0002-5507-5566 (M. J. Ortín)

© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License co-evolution of schema and code to Orion. Developers either messages carrying information about the changes to be applied manually will be shown to developers, in a way similar to those described in [13]. Here, we will present the current state of our work. The novelty of our proposal is to define schema changes in a platformindependent way, and then automatically generate static code analysis queries to find the statements to be updated.

These queries are obtained from a Orion script through model to text transformations. We have validated generated queries valid for MongoDB APIs and mappers for the JavaScript language. Our work is described in Section 3. Prior to this explanation we briefly introduce the Orion approach, and we conclude by commenting the related work and exposing some ideas about the tasks to be done to complete the approach. valued attributes and aggregation relationships, and the latter refer to relations between entities, namely keys and reference relationships. Entity types can be root or non-root depending on if their objects are the root of an aggregation hierarchy or are embedded in other objects.

Figure 2 shows a U-Schema model example named SoftwareDev, which will be used as a running example. We will suppose that this schema corresponds to a document database, in particular MongoDB, and therefore it has no relationship types. The schema is shown by using the graphical notation described in [16], whose semantics will be easily understood from the explanation of the schema. There are three root entity types in Software2. U-Schema and the Orion Dev schema: Developer, Ticket, and Repository, and Language there is one non-root entity type (DeveloperInfo) that is aggregated by Developer.

U-Schema is a unified metamodel that integrates the U-Schema allows structural variations of schema types NoSQL and relational data models. It includes the to be represented. Thus, a schema type is formed by a set elements traditionally used to create logical schemas, of variations. In the schema example, Developer has two as those that are part of the Entity-Relationship (ER) structural variations that share a common set of features model [15]. However, these elements are structured dif- (e.g., email, permissions, and dev_info) but difer in ferently because their semantics are not exactly the same, two features: The first variation stores a feature named and additional concepts are considered (e.g. structural is_active, and the second one has suspended instead. variation of types), as briefly explained below. A detailed The remaining entity types have a single variation with description of U-Schema and the bidirectional mappings a set of features. between this unified metamodel and each of the individ- In the graphical notation, features are specified ual data models can be found in [5]. by means of a name and a type, but aggregations

A U-Schema model (i.e., a logical schema) consists of a and references are also shown by means of the set of schema types that can be entity types, used to rep- typical arrows used in UML class diagrams. In resent domain entities in any database, and relationship the example, Repository.title is an attribute, types to represent relationships between nodes in graph Developer.dev_info aggregates an instance databases. This illustrates that not all U-Schema concepts of DeveloperInfo, Ticket.repository_id and are present in all the data models that it integrates. Ticket.developer_id hold references to the corre

An entity type is formed by a set of features that can sponding entity types, and Repository.developers be structural or logical. The former are simple and multi- references the set of developers involved in a repository.

3. Detection of Code Afected by Schema Changes

In this section, we describe the proposed approach in detail. An overview of the designed and implemented strategy is firstly presented. Secondly, the mappings between each Orion operation and the updates to be performed are discussed. Thirdly, some details about the code analysis based on CodeQL queries are commented, and finally the obtained output for the schema of the running example is shown.

3.1. Overview of the Code Analysis Strategy

As noted in [13], depending on the semantics of the schema change operations, four diferent situations can be distinguished for the application code accessing databases: (i) Code may need to be modified according to the new schema, e.g., when features or entity types are renamed; (ii) code may become invalid, e.g., when entity types or features are removed; (iii) code is syntactically correct but should be modified according to the new schema to return correct values, e.g., when a casting is applied on an attribute; and (iv) code does not need to be changed since it is not directly afected, e.g., when adding a new entity type.

An analysis of the application code involves finding the statements afected by each SCO in an Orion script (e.g., renaming the Ticket entity type to Active_Ticket). Once one or more afected statements are identified, they could be modified to be adapted to the new schema, or either a message could be generated and shown to developers.

In order to perform such code analysis and resulting actions, a mapping should be established between each Figure 5: The Orion engine extension to handle code update. SCO of the taxonomy and its impact on the code, based on: (i) Identifying the statements that need to be updated, and (ii) providing a suitable solution according to the new schema. Table 1 shows the mappings for the cur- transformation consists of a set of rules that implement a rently addressed Orion SCOs. These operations cover the mapping between each Orion SCO and CodeQL queries four categories mentioned above. In the table, the first able to find the location of the application code to be column indicates the Orion SCO, and the second column updated. Table 1 shows such mappings in the third and specifies the actions to be applied on the code, which fourth columns. For each Orion operation up to two are expressed in an abstract way and independent of any CodeQL queries are generated depending on whether technology. The third and fourth columns are specific to the code has become invalid and needs to be updated each programming language, in this case JavaScript, and according to the new schema (error queries), or the code they indicate which code should be inspected to detect should be modified, but it continues to be valid ( warning warnings and errors, as commented later in this section. queries).

Similarly to [13], our aim is to assist developers to In the second step, the generated queries are applied update code with two kinds of facilities: (i) Automating to the previously created CodeQL database. As a result of all possible code changes, and (ii) generating log files this application, a log file in SARIF format 2 is obtained. that carry information about the automatically applied This file contains information of the analysis process, modification, and the changes that should be manually the applied queries, the examined files and every match introduced by developers. Here, we will focus on the found in those files by the queries, along with useful automatic generation of notification messages to provide information to locate the code statements to be changed developers with such information. (e.g., the file where they are located, the line number, and

We have started to build the Code Updater component a custom message). of the Orion engine by implementing a 3-step strategy Because the log file is rather complex and dificult to that uses the CodeQL code analysis engine [18] to dis- navigate, the third step performs a parsing of the log file cover the code that is candidate to be changed. This and generates a digest useful for the developer where, strategy is outlined in Figure 5 and commented below. for each Orion SCO in the input script, a set of mes

As a preliminary step, the code repository is converted sages show the statements that need to be changed along into a CodeQL database that can be accessed by CodeQL with their location, and which changes should be applied, queries. In the first step, a model-to-text ( m2t) transfor- e.g., renaming a variable to a new name or deleting an mation is executed to automatically generate the CodeQL assignment. queries that correspond to the input Orion model. This Aggregate Operations

On MongoDB methods that retrieve all collections On assignments with the literal . .

On assignments with the literal . . — — — — — On 2 Mongoose schema.

On 2 Mongoose schema.

On Mongoose schema.

— — — — On access to by property, by method or by Mongoose schema.

On access to by property, by method or by Mongoose schema.

On MongoDB methods involving . and on the Mongoose schema.

On MongoDB methods involving . and on the Mongoose schema. — On MongoDB methods involving 1. and on the 1 Mongoose schema.

On MongoDB methods involving . and on the Mongoose schema.

3.2. Analyzing the Application Code For a first validation of our approach, we have chosen

MongoDB as our target database and JavaScript as language of the database code. According to the DB-Engines ranking, MongoDB is the fith most popular database (the first of the NoSQL stores), and the driver for Node.js JavaScript frameworks is one of the most used to access MongoDB databases in Web applications. Finally, we have also considered the Mongoose object-document mapper (ODM) [19, 20], which is the most widely used ODM for MongoDB in JavaScript.

In MongoDB, data on entity types are stored in collections of JSON-like semi-structured documents [21].

Therefore, a document is formed by a set of name-value pairs of features, and values can be primitives (e.g., Number and String), another nested document, or arrays.

Figure 6 shows an excerpt of JavaScript code accessing 1 import mongoose from "mongoose"; a MongoDB database for the running example. First, the 2 const RepositorySchema = new mongoose.Schema({ Developer collection is retrieved by using a method call 3 _id: {type: String, required: true}, provided by the MongoDB API, and a query is performed 4 developers: {type: [String], that involves retrieving a single developer and using the 5 ref: "Developer", required:true}, permissions field. Then a variable is created with the 6 num_forks: {type: Number, required: true}, "Ticket" value, and another method call retrieves the 7 num_stars: {type: Number, required: true}, TdieclkeetteMcaonllyecotpioernatbiyonuwsinitgh tahniostvhaerriaqbuleer,yaonnd iatp.plies a1890 ttuiartgl:ls:e: {{{tttyyypppeee::: SS[Stttrrriiinnngg,,g]rr,eerqqeuuqiiurrieerdd::edtt:rrtuuree}}u,e}, 11 }); 12 export default mongoose.model("Repository",

RepositorySchema); code statements are involved when applying a Orion SCO. // Try to access Developer.permissions The third column where a warning should be thrown, developers = database.collection("Developer"); pieces of code which are syntactically correct but might result1 = await developers.findOne( be semantically incorrect. The fourth column, on the {_id: "13"}, other hand, indicates errors, those updates that are manda{projection: {"permissions": 1}}); tory to fix code that is now invalid.

According to these mappings, when an SCO that // Query the Ticket entity type modifies an entity type is applied (e.g., a DELETE or collection_name = "Ticket"; RENAME operation), it is necessary to look for statetqiucekreyt=s={dmaetsasbaagsee:.co{l$lreecgteiox:n(c"oelrlreocrt"i}on}_;name); ments that access and return the corresponding colresult2 = await tickets.deleteMany(query); lection, since those statements are now invalid. In console.log(result2.deletedCount); JavaScript, it is common to use a MongoClient object to } finally { obtain a MongoDB database object by using the .db() await client.close(); method. Then, obtaining a specific collection from such database object can be achieved in several ways, classified in two kinds: (i) By property (e.g., database.E or database[E], being E the name of the collection), and (ii) by method (e.g., database.collection("E") or Figure 6: JavaScript excerpt querying a MongoDB database. database.getCollection("E")), as shown in Figure 6, lines 11 and 18. These statements have in common that

As evidenced in this example, the code analysis should the literal name of the entity is used at some point (in not only consider just database operations, but also any the examples, E), so when an SCO that modifies an entity other statements such as assignments and variables that type is applied, this kind of accesses need to be updated. can carry values referring to schema elements (e.g., entity Moreover, instead of directly using the literal “E” develtype names and feature names), and diferentiate between opers can use variables that hold such value, so we also these literal values and function or variable names that need to look for variables that were assigned that value. might have the same name. Moreover, the analysis needs Finally, the Mongoose schema corresponding to E must to take into account that in the case of feature names, it also be updated. is necessary to identify not only the feature, but also if it In the example, the RENAME Ticket TO belongs to the entity indicated in the SCO. Active_Ticket operation will cause the statement in

On the other hand, Figure 7 shows a Mongoose schema line 18 to not be valid, and an error message should be for the Repository entity type of the running example. triggered by our Code Updater. Also, a warning should A Mongoose schema is composed of a set of fields with a be issued for the assignment in line 17 because it will not name and a specific type, such as the num_stars field of return the desired results, although such an assignment type Number, or the developers field, which references is still valid. the Developer entity type stored as a list of Strings. Detecting code errors due to an SCO that modifies a

In Table 1, the third and fourth columns shows which feature of an entity type requires of a diferent strategy. To analyze the impact of an operation such as DELETE between these operations comes on the solution to be Developer::permissions, it is necessary to take into applied for the update: For DELETE, the involved stateaccount that only references to the permissions fea- ments should be removed as the collection being deleted ture belonging to Developer should be updated, and no longer exists, while for the RENAME operation the inthat other entity types might hold a feature with the volved statements should be updated to reference the same name. Therefore, queries for these operations new name. The same case happens for ADD Attribute, are organized as follows. MongoDB API operations ADD Reference, and ADD Aggregate: In JavaScript and are first searched, such as findOne(), insertOne(), or MongoDB these operations do not turn any code invalid. deleteMany(). Then, the process checks if the operation Instead, queries for these operations look for Mongoose is being applied over the specified entity type (e.g., an ob- schema declarations and suggest updating such schemas ject storing the Developer collection). Finally, the code by adding the new fields. is analyzed to check if the field is referenced in some These similarities between SCOs ease the process of the arguments of the method call, which use to be of generating CodeQL queries. For this purpose, expressions or JSON objects containing the feature name. we created a library of CodeQL functions which In our example, the statement in line 12 should be up- allows simpler queries, as illustrated in the query of dated to reflect that the permissions field referenced by Figure 8: A CodeQL error query generated from the projection has been deleted. RENAME ENTITY Ticket TO Active_Ticket. Here,

Operations that modify features also impact Mon- the query looks for expressions that meet one of the goose schemas. For example, the Orion operation RENAME following conditions on the Ticket collection: (i) It is *::num_forks TO rank_forks would require updating accessed as a property (e.g., database.Ticket), (ii) it the Repository schema shown in Figure 7, where the is accessed by using a method that retrieves a colnum_forks field should be modified to reflect the RENAME lection (e.g., getCollection), and (iii) there is a operation. Mongoose export statement that generates a Ticket

Finally, we have to consider SCOs adding new features schema. These conditions are implemented with to existing entity types, such as COPY or ADD attr. These three functions of the library: checkPropAccess, SCOs do not turn invalid code, and therefore do not issue checkExprIsCollectionMethod, and any errors, but we provide information for developers by checkIsMongooseExport, and they can be used issuing warnings on Mongoose schemas indicating that for other SCOs. Finally, a message stating how the code the developer should add the new features. needs to be updated is shown: “Ticket: Entity renamed to “Active_Ticket”.”.

3.3. CodeQL Query Implementation

IMPORT javascript IMPORT utils

Once we defined which code statements throw warnings

and errors for each considered SCO, we used CodeQL [18] to implement queries that return those precise statements FROM Expr expr from a code repository. CodeQL is a code analysis engine WHERE developed by GitHub. It generates a database represen- checkPropAccess(expr, "Ticket") tation of a code base, so the code can be treated as data. OR Code patterns are modeled as CodeQL queries that can checkExprIsCollectionMethod(expr, "Ticket") be executed to generate a result set that includes those OR lines of code that matches the pattern. We have chosen checkIsMongooseExport(expr, "Ticket") CodeQL because it is a semantic code analysis engine, it SELECT expr, supports a wide variety of programming languages, and "Ticket: Entity renamed to \"Active_Ticket\"." it can be integrated with GitHub repositories to analyze the code they contain. EFNigTuITreY 8T:icCkoedteQTOL Aecrrtoirveq_uTeircykgeetnoepreartaetdiofno.r the RENAME

A CodeQL query includes three clauses, as illustrated in the example of Figure 8: (i) FROM defines the kind of element being queried (in this example, Expr, a class The same principle is followed for generatused to get JavaScript expressions), (ii) WHERE is used to ing a CodeQL query for the operation DELETE iflter only the elements considered in the first clause that Developer::permissions, shown in Figure 9. Here, also meet certain criteria, and (iii) SELECT to return the instead of single generic expressions, we look for elements that were filtered and a custom message. MethodCallExpr, a CodeQL class used to get expres

Some Orion SCOs are mapped to similar CodeQL sions in which a method is invoked. According to queries. For example, DELETE and RENAME an entity type what was stated in the previous subsection, we need require updating the same statements. The diference to update expressions that meet one of the following conditions: (i) A MongoDB method is invoked over a specific entity ( Developer) and the permissions ifeld shows as one of its arguments (either directly or embedded as a value in a JSON object), or (ii) a Mongoose schema for Developer is detected, and it defines the permissions field in it. In the query, the use of other functions of the library can be observed, such as (checkIsMDBDataMethod and checkFeatIsInObject.

Please note that all CodeQL queries generated by the described process are able to match expressions that show specific string values (e.g., a feature name being modified) and also solve variables that are found on candidate expressions, recursively, until those variables are evaluated to a string value. { } }

Once the CodeQL queries have been generated, develop

ers can use the CodeQL engine to analyze folders with code files and apply the supplied queries, as shown in Figure 5. The engine outputs a SARIF file containing the results of the analysis. This file stores information such as the location of the analyzed files and the queries provided, and a section for each match found on those ifles. As shown in Figure 10, for each found match the file stores the rule that found it (‘softwaredev-1/000-op-deletefeature-developer-permissions-error’), a custom message informing of the event (‘Developer.permissions: This feature has been deleted.’), and the location of the match divided into two blocks: (i) the file in which it was found (‘models/Developer.js’ in this example), and (ii) the line and column inside the file content. "ruleId": "softwaredev-1/000-op-delete-feature

developer-permissions-error", "ruleIndex": 0, "rule": { "id": "...", "index": 0 }, "message": { "text": "Developer.permissions: This feature has

been deleted." }, "locations": [{ "physicalLocation": { "artifactLocation": { "uri": "models/Developer.js", "uriBaseId": "%SRCROOT%", "index": 0 }, "region": { "startLine": 20, "startColumn": 16, "endColumn": 60 } } }], "partialFingerprints": { "primaryLocationLineHash": "f31ed949860b8f:1", "primaryLocationStartColumnFingerprint": "15"

This file is then processed to produce a more readable output that helps developers to better understand which statements have been afected by SCOs and how to fix them. In our case, we opted for a format in which for each match entry a line such as <file_path >(<line_number>): <message> is generated. rewriting algorithm is performed which consider all the versions of the schema. Query rewriting applies forward In this section, we will contrast our proposal with some and backward query rewriting (i.e., applies SCOs and of the most relevant works to co-evolve schema and code. their reverse equivalents to queries) to generate diferent

In [10], Carlo Curino et al. present the PRISM++ tool sub-queries (one for each schema version), execute these to automate the relational schema evolution. A set of sub-queries and union the obtained results. Schema Modification Operators (SMOs) is defined to ex- Our work difers from previous ones in the following press schema changes in form of atomic operations, as ways. PRISM++ is probably the most influential work well as a set of Integrity Constraint Modification Oper- related to the automation of schema evolution, but it ators (ICMOs). For each SMO, a set of instructions are is focused on relational databases. DB-Main is based generated to change the relational schema, update the on a generic metamodel and schema transformations, stored data, and rewrite queries under the given database but NoSQL data models are not considered, and code constraints. Existing SQL queries are adapted by apply- updating is mainly focused on SQL queries. Also, GER ing rewritten techniques and creating views. and U-Schema are clearly diferent, and a taxonomy of

While PRISM++ only tackles the schema evolution changes was not defined. Regarding Typhon, U-Schema for relational systems, DB-Main is a database engineer- has a richer semantic than TyphonML which allows the ing environment built for relational databases and other structural variation and graph data models to be repredata models prior to the emergence of NoSQL stores [22]. sented, among other diferences, as discussed in [ 5]. Also, In addition to support schema evolution, DB-Main inte- we are addressing diferent APIs instead of rewriting code grates utilities for reverse engineering, re-engineering, of a query language specific to the Typhon platform, and and maintenance. DB-Main is composed of three basic we are searching code patterns in the use of APIs in order pillars: (i) The GER generic metamodel defined by extend- to reduce the efort. Finally, when compared to Darwin, ing ER, (ii) a transformation-based engineering process, this tool focuses on query rewriting [23], and our apand (iii) the maintenance of a history of schema changes. proach is based on a more complex unified metamodel

Query rewriting for NoSQL databases, in particular than the data model underlying Darwin, which relies on MongoDB, has been addressed in [12] and [13]. In [13], a common interface used to access diferent stores. This Jerome Fink et al. describe a strategy to adapt queries to also entails a richer taxonomy of operations. schema changes for polystores. This work is part of the Typhon project intended to ofer a solution for designing polystores that can be constituted by relational and 5. Conclusions and Future Work NoSQL databases. In Typhon, a family of languages was created for defining conceptual schemas (TyphonML), While the first version of the Orion engine supported changing schemas, and querying databases (TyphonQL). the co-evolution of schema and data, the strategy here The strategy proposed to adapt queries is the following. presented is a first step to assist developers to update When TyphonQL queries are issued on a polystore whose code when schemas are updated. To this moment, the schema has changed, they are rewritten according to the efort is being focused on the mapping of each SCO to new schema. Query rewriting depends on the change the code patterns to be detected, the CodeQL queries to operations applied, and four actionscan be applied on be generated, the changes to be applied, and the inforqueries: (i) no modification is needed, (ii) query is mod- mation to be provided to developers. As each mapping ified to be valid, (iii) query is modified but a warning is elicited for a SCO, the corresponding rules are impleindicates that the result could be incorrect, (iv) and the mented and added to the model to text transformation query cannot be adapted. The last two actions produce and the analysis of the query result is extended. messages to assist developers to change the code. A pat- As of future work, we consider the following: (i) Imtern matching scheme is used to map each triple <SCO, plementing the remaining taxonomy operations for Monschema, query> to a handler function that produces the goDB and JavaScript, (ii) expanding this implementation query adaptation. to additional databases to cover all the considered data

Darwin is a schema evolution platform for NoSQL models, (iii) integrating our approach with CI systems, databases, which supports schema history extraction, and (iv) extending our current output and adding the posseveral data update strategies, and query rewriting [12]. sibility to apply changes directly to code by applying the When lazy data migration is applied, several versions of generic rewrite approach shown in [24], where a generic the same entity can coexist in the database, and existing metamodel was created to represent code. queries must be rewritten to be able to access all the The running example models as well as the generated schema versions. To achieve this, a schema history graph queries used to illustrate our approach are available in a is calculated in form of a set of evolution operations that public GitHub repository.3 specifies the sequence of schema changes. Then, a query 24th Int. Conf. on Software Analysis, Evolution and Reengineering, SANER 2017, Klagenfurt, Austria, Work funded by the Spanish Ministry of Science and February 20-24, 2017, 2017, pp. 457–461. Innovation (project grant: PID2020-117391GB-I00). [12] U. Störl, M. Klettke, Darwin: A Data Platform for Schema Evolution Management and Data MigraReferences tion, in: DataPlat 2022: 1st International Workshop on Data Platform Design, Management and Opti[1] M. Stonebraker, The case for polystores, ACM mization, 2022.

Sigmod Blog (2015). URL: https://wp.sigmod.org/ [13] J. Fink, M. Gobert, A. Cleve, Adapting Queries to ?p=1629. Database Schema Changes in Hybrid Polystores, [2] P. Sadalage, M. Fowler, NoSQL Distilled. A Brief in: 20th IEEE International Working Conference Guide to the Emerging World of Polyglot Persis- on Source Code Analysis and Manipulation, SCAM tence, Addison-Wesley, 2012. 2020, Adelaide, Australia, September 28 - October [3] J. Hainaut, et al, Database Evolution: the DB- 2, 2020, IEEE, 2020, pp. 127–131. URL: https://doi.

Main Approach, in: Entity-Relationship Approach org/10.1109/SCAM51674.2020.00019. doi:10.1109/ - ER’94, 13th Int. Conf. on the Entity-Relationship SCAM51674.2020.00019.

Approach, volume 881, Springer, 1994, pp. 112–131. [14] I. Holubová, M. Vavrek, S. Scherzinger, Evolution [4] Typhon Project, Hybrid Polystore Mod- management in multi-model databases, Data & elling Language (Final Version), Technical Knowledge Engineering 136 (2021) 101932. doi:10. Report, University of L’Aquila, 2018. URL: 1016/j.datak.2021.101932. https://4d97e142-6f1b-4bbd-9bbb-577958797a89. [15] P. P.-S. Chen, The Entity-Relationship Model: Tofilesusr.com/ugd/d3bb5c_ ward a Unified View of Data, ACM Transactions 3394b40f9cb54bcbb873f2c4ea1f2298.pdf. on Database Systems 1 (1976) 9–36. [5] C. J. F. Candel, D. S. Ruiz, J. J. G. Molina, A unified [16] A. Hernández, S. Feliciano, D. Sevilla Ruiz, J. García metamodel for nosql and relational databases, In- Molina, Exploring the Visualization of Schemas for formation Systems 104 (2022) 101898. doi:10.1016/ Aggregate-Oriented NoSQL Databases, in: ER Foj.is.2021.101898. rum 2017, 36th Int. Conf. on Conceptual Modeling [6] A. Hernández Chillón, D. Sevilla Ruiz, J. Garcia- (ER), 2017, pp. 72–85.

Molina, Towards a Taxonomy of Schema Changes [17] J. Banerjee, W. Kim, H.-J. Kim, H. F. Korth, Semanfor NoSQL Databases: The Orion Language, in: tics and Implementation of Schema Evolution in Conceptual Modeling - ER 2021 40th Int. Conf. OO Databases, SIGMOD Rec. 16 (1987) 311–322. on Conceptual Modeling, St.John’s, NL, Canada, [18] Codeql web page, 2023. URL: https://codeql.github. volume 13011, 2021, pp. 176–185. doi:10.1007/ com/, accessed February 2023.

978-3-030-89022-3_15. [19] Mongoose Web Page, 2017. URL: http://mongoosejs. [7] A. Hernández Chillon, M. Klettke, D. Sevilla Ruiz, com, accessed February 2023.

J. Garcia-Molina, A Taxonomy of Schema Changes [20] S. bin Uzayr, N. Cloud, T. Ambler, Mongoose. In for NoSQL Databases, CoRR abs/2205.11660 (2022). JavaScript Frameworks for Modern Web DevelopURL: https://arxiv.org/abs/2205.11660. ment: The Essential Frameworks, Libraries, and [8] C. J. F. Candel, J. J. G. Molina, D. S. Ruiz, Skiql: A uni- Tools to Learn Right Now, Apress, 2019. doi:10. ifed schema query language, CoRR abs/2204.06670 1007/978-1-4842-4995-6_9. (2022). URL: https://doi.org/10.48550/arXiv.2204. [21] K. Chodorow, M. Dirolf, MongoDB - The Defini06670. doi:10.48550/arXiv.2204.06670. tive Guide: Powerful and Scalable Data Storage., [9] J. Hainaut, The transformational approach to O’Reilly, 2010.

database engineering, in: Generative and Trans- [22] J.-M. Hick, J.-L. Hainaut, Strategy for database formational Techniques in Software Engineering, application evolution: The DB-MAIN approach, in: International Summer School, GTTSE 2005, Braga, International Conference on Conceptual Modeling, Portugal, July 4-8, 2005. Revised Papers, 2005, pp. Springer, 2003, pp. 291–306. 95–143. URL: https://doi.org/10.1007/11877028_4. [23] M. Klettke, U. Störl, M. Shenavai, S. Scherzinger, doi:10.1007/11877028\_4. NoSQL Schema Evolution and Big Data Migration [10] C. Curino, H. J. Moon, A. Deutsch, C. Zaniolo, Au- at Scale, in: IEEE International Conference on Big tomating the database schema evolution process, Data, IEEE Computer Society, 2016. in: The VLDB Journal, volume 22, 2013, pp. 73–98. [24] C. J. F. Candel, A Unified Data Metamodel for Re[11] L. Meurice, A. Cleve, Supporting schema evolu- lational and NoSQL databases: Schema Extraction tion in schema-less nosql data stores, in: IEEE and Query, Ph.D. thesis, Faculty of Informatics, University of Murcia, Murcia, Spain, 2022.