=Paper=
{{Paper
|id=Vol-3135/dataplat_short3
|storemode=property
|title=Darwin: A Data Platform for Schema Evolution Management and Data Migration
|pdfUrl=https://ceur-ws.org/Vol-3135/dataplat_short3.pdf
|volume=Vol-3135
|authors=Uta Störl,Meike Klettke
|dblpUrl=https://dblp.org/rec/conf/edbt/StorlK22
}}
==Darwin: A Data Platform for Schema Evolution Management and Data Migration==
Darwin: A Data Platform for NoSQL Schema Evolution Management and Data Migration

Uta Störl (University of Hagen, Germany), Meike Klettke (University of Rostock, Germany)
Abstract
During the development of NoSQL-backed software, the database schema evolves alongside the application code. Especially in agile development, new application releases are deployed frequently. This leads to heterogeneous data in the database and thus to new challenges for application development. To handle such heterogeneous data, we have developed various algorithms and have implemented and evaluated them in Darwin, a data platform for schema evolution management and data migration. In this paper, we give an overview of Darwin: its concepts, algorithms, their implementation, and its possible usage.
Keywords
NoSQL databases, schema evolution, data migration
1. Introduction

Schema evolution management is one of the most challenging problems in data management today [1]. The popularity of NoSQL databases makes this issue even more complex. Schema-flexible NoSQL databases are especially popular backends in agile development, since new software releases can be deployed without migration-related application downtime. An empirical study of NoSQL database schema development shows that more schema-relevant changes occur when NoSQL databases are used than with relational databases [2]. In addition, schemas become more complex over time and take longer to stabilize.

Managing schema evolution involves two main tasks: discovering (extracting) structural changes to data, and dealing with these changes from an application development perspective (data migration). In the past, we have published several papers on specific research results and theoretical achievements in schema evolution management [3, 4] and data migration [5, 6]. We also presented demo papers on implementations of some sub-aspects [7, 8]. This paper is intended to provide an overall view of the Darwin system and to show the interaction of the different algorithms.

Darwin supports the whole schema evolution management and data migration lifecycle. The system is implemented for different types of NoSQL database systems. This paper also presents specific architectural and implementation aspects of this data platform. Furthermore, we have now published Darwin publicly (https://github.com/dbishagen/darwin) in a fully operational Docker container, so that the system can also be used by interested researchers.

The rest of the paper is organized as follows: In Section 2 we discuss related work. Section 3 gives an overview of the Darwin data platform. In Section 4 we discuss the main functionalities of Darwin and their interplay. Afterwards, extensions of Darwin are presented in Section 5. We conclude with a summary and an outlook on further work.

Published in the Workshop Proceedings of the EDBT/ICDT 2022 Joint Conference (March 29-April 1, 2022), Edinburgh, UK. Contact: uta.stoerl@fernuni-hagen.de (U. Störl), meike.klettke@uni-rostock.de (M. Klettke). ORCID: 0000-0003-2771-142X (U. Störl), 0000-0003-0551-8389 (M. Klettke). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

2. Related Work

The implementation of Darwin is based on several research results and theoretical achievements, partly developed by our own group.

Schema Extraction. There are several suggestions for schema extraction for NoSQL databases [9, 10, 11]. In [3], we developed an approach to schema extraction which generates a graph structure representing all structural variants of a given dataset. In a next step, this internal graph structure is summarized into a JSON Schema description. Meanwhile, this basic functionality of static schema extraction for NoSQL databases is also available in commercial tools like Hackolade (https://hackolade.com/) and Studio 3T (https://studio3t.com/), as well as in research prototypes like Josch [12].
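To make the idea of static schema extraction concrete, the following Java sketch derives, for each field of a document collection, the set of JSON types observed across all documents; fields that occur only in some documents or with mixed types expose the structural variants. This is a deliberately simplified stand-in for the graph-based extraction of [3]; all class and method names here are our own illustration, not Darwin's API.

```java
import java.util.*;

// Minimal schema extraction sketch: map each field name to the set of
// JSON type names observed for it across a collection of documents.
public class SchemaExtractor {

    public static Map<String, Set<String>> extract(List<Map<String, Object>> documents) {
        Map<String, Set<String>> schema = new TreeMap<>();
        for (Map<String, Object> doc : documents) {
            for (Map.Entry<String, Object> field : doc.entrySet()) {
                schema.computeIfAbsent(field.getKey(), k -> new TreeSet<>())
                      .add(jsonType(field.getValue()));
            }
        }
        return schema;
    }

    private static String jsonType(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return "string";
        if (value instanceof Number) return "number";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Map) return "object";
        if (value instanceof List) return "array";
        return value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("name", "Alice", "age", 31),
                Map.of("name", "Bob", "age", "unknown", "city", "Rostock"));
        // "age" has mixed types and "city" is optional: structural variants.
        System.out.println(extract(docs));
        // prints {age=[number, string], city=[string], name=[string]}
    }
}
```

In a second step, such a field-to-type summary could be serialized into a JSON Schema description; Darwin additionally records the structural variants in an internal graph structure.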
Version History Extraction. A shortcoming of all NoSQL schema extraction approaches (cf. [13] for a survey) is that they do not consider and cannot detect schema changes over time. This observation led us to the development of a schema version history extraction approach. The algorithm can be applied if a partial order of the datasets is available (e.g. a timestamp or a creation date). It in turn extracts an internal graph structure and adds timestamp information. Each structural change that the algorithm detects triggers the generation of a new schema version. In addition, the change operations are extracted from the differences between two consecutive structural versions and are represented as evolution operations [4]. Thus, this algorithm is able to uncover the complete evolution history and the genesis of existing databases. We have presented a demonstration of this function, which is essential for schema evolution management, in [7].

Query Rewriting. To read NoSQL datasets in different versions, query rewriting is necessary. Query rewriting is a core database technology which was introduced to optimize query execution by using materialized views [14]. Query rewriting can also be used to handle irregular structures. A query rewriting approach which considers data heterogeneity and generates different subqueries for the different varieties is developed in [15]. In [16], we suggest how query rewriting can use schema evolution operations to unify different consecutive structural versions in queries.

Data Migration. While there are some approaches to managing schema evolution and data migration for relational systems [17, 18], there is very little work in the field of NoSQL databases. We proposed the concepts of eager and lazy migration in NoSQL databases in [19]. KVolve, an extension for the Redis NoSQL database that supports lazy migration, was introduced in [20]. The IDE-integrated tool ControVol supports eager and lazy migration based on static type checks of object mapper class declarations as recorded in the code repository [21]. We presented initial ideas on hybrid data migration strategies (incremental and predictive migration) in [5] and described them in detail in [6]. We intensively studied the impact of different data migration strategies on migration cost and latency and presented and discussed the results in [22].

3. System Architecture

In this section, we introduce the system architecture of Darwin and the interaction of its individual modules. The entire Darwin system is implemented in Java. Figure 1 shows the system architecture of Darwin. In the agile application development use case, Darwin is a middleware between a Java application and a database storing variational data:

Figure 1: Darwin System Architecture

• At the top of the application stack is the Java application. It stores its data in a NoSQL database, interacting with the system-independent Darwin Persistence API (DPA).
• Via the Darwin WebApp or the Darwin CLI, application developers may trigger schema evolution management and data migration tasks directly. We explain these tasks in detail in Section 4.
• All user interfaces use the Darwin Core REST API. This architecture allows the flexible use of Darwin, as we will see when we present the extensions in Section 5.
• The Darwin Core REST API interfaces with the core modules necessary for the schema evolution management and data migration lifecycle, which we present in detail in Section 4. These modules are implemented independently of a concrete database system.
• A Data Access Manager and a Schema and Command Storage Manager were implemented as a uniform interface for the interaction of the core modules with the respective database systems. The Schema and Command Storage Manager stores the schema versions and the schema evolution operations. This information can be stored either in the same database as the data or in a separate database.
• The Drivers are responsible for the connection to the specific database system. Since the languages of all NoSQL DBMS differ, the mapping to the respective system is done in these modules.

Currently, Darwin supports the most popular document stores MongoDB and Couchbase, the wide column store Cassandra, and the multi-model database system ArangoDB. The architecture is designed for easy extensibility: adding a new DBMS requires only the implementation of the appropriate driver.

4. Main Functionalities

Darwin supports the whole schema evolution management and data migration lifecycle. In the following, we explain this lifecycle, the corresponding functionalities of Darwin, and their interaction.

4.1. Schema and Version History Extraction

Schema extraction and version history extraction belong to the data preprocessing steps implemented in the Darwin tool. Both steps are necessary for the analysis of existing NoSQL datasets (which have been created outside of Darwin) and for understanding the implicit structures and their changes over time. The algorithms detect the variabilities in the NoSQL data, structural outliers, and the different versions over time.

Figure 2 shows an example of a version history extraction performed in Darwin. The screenshot shows two versions side by side in JSON Schema notation. The schema evolution operations are stated above; changes w.r.t. the previous schema version are highlighted [7].

Figure 2: Example of a Schema Version History Extraction

The schema and version history extraction algorithm reads all datasets of the NoSQL database. The implemented algorithm can run incrementally, which means that when new datasets arrive, only the new data are analyzed and the results merged. To the best of our knowledge, Darwin is the only schema management tool supporting the extraction of the schema version history.

Notes on Performance and Scalability. Extracting and analyzing the entire data instance is a one-time effort. After the initial schema and version history extraction, newly added entities can be analyzed incrementally and on the fly. Since Darwin does not load the entire data instance into main memory, but only incremental batches [4], Darwin can safely handle large volumes of data.

4.2. Schema Evolution Management

Schema evolution is an ongoing process during the development of an application. Schema evolution operations (SMOs) can be
• manually entered in Darwin using the Darwin CLI or Darwin WebApp, or
• automatically observed by incremental version history extraction.
Which datasets are migrated after a schema evolution operation depends on the chosen data migration strategy. Darwin implements eager, lazy, and various hybrid data migration strategies. We present these strategies in Section 4.4.

4.3. Query Rewriting

In NoSQL databases, different schema versions can be stored in the same database. Schema evolution in combination with lazy or hybrid data migration has to face the situation that datasets stored in the same NoSQL database have different structural versions. In this scenario, the evolution operations that transform datasets from structural version n to the successor version n+1 are stored in the Schema and Command Storage Manager (see Figure 1).

If such versioned datasets are to be accessed and queried by an application, query rewriting is necessary to distribute the query to the different structural versions. Forward query rewriting is applied if a query which assumes structural version n is translated into structural version n+i. In this case, the list of evolution operations (for the translation from version n to n+1, n+2, ..., n+i) is applied to the query. Backward query rewriting is used to access preceding structural versions. To achieve this, reverse evolution operations are used for the translation of the query. In both cases, query rewriting generates different subqueries (one for each schema version), executes the subqueries, and unions the results.
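The subquery-generation step described above can be sketched as follows: a query field written against one schema version is translated along the recorded evolution operations, yielding one subquery per stored version, whose results are then unioned. The representation of queries and evolution operations here is deliberately minimal and is an illustrative assumption, not Darwin's actual interface.

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Sketch of forward query rewriting over versioned data: a "query" is just
// the field name it filters on, and an evolution operation is modeled as a
// function rewriting that field name (e.g. a rename SMO maps old to new).
public class QueryRewriter {

    // Returns one query field per schema version: the original for version n,
    // plus the translated field for each successor version n+1, n+2, ...
    public static List<String> rewrite(String field,
                                       List<UnaryOperator<String>> evolutionOps) {
        List<String> subqueryFields = new ArrayList<>();
        subqueryFields.add(field);              // subquery for version n
        String current = field;
        for (UnaryOperator<String> op : evolutionOps) {
            current = op.apply(current);        // translate to the next version
            subqueryFields.add(current);
        }
        return subqueryFields;                  // one subquery per version
    }

    public static void main(String[] args) {
        // Assumed version history: "zip" was renamed to "zipcode", then "postcode".
        List<UnaryOperator<String>> renames = List.of(
                f -> f.equals("zip") ? "zipcode" : f,
                f -> f.equals("zipcode") ? "postcode" : f);
        // A query on version 1's "zip" becomes three subqueries, one per version;
        // their results would then be unioned.
        System.out.println(rewrite("zip", renames));
        // prints [zip, zipcode, postcode]
    }
}
```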
Query rewriting enables transparent access to datasets in different schema versions and is the prerequisite for lazy and hybrid data migration.

4.4. Data Migration

Migration Strategies. In Darwin, structural changes can be defined by schema evolution operations (SMOs). The system supports both single-type operations (add, rename, delete) and multi-type operations (move and copy) [19]. The evolution operations define the changes to the schema. For each evolution operation, a corresponding data migration operation is generated that executes the same structural changes on the datasets. Darwin implements several different data migration strategies:

Eager Migration. An eager data migration migrates all datasets immediately after the introduction of a new schema version. Eager data migration has the advantage that all datasets always reflect the latest structural version. This reduces the latency when datasets are accessed. A disadvantage is that migration costs are high, since all datasets are updated, even those not in use. Migration costs are especially concerning when the database is hosted in the cloud.

Lazy Migration. In order to avoid unnecessary migration processes and thus reduce migration costs, Darwin provides another migration strategy, the so-called lazy migration. The basic idea is that after the introduction of a new schema version, no dataset is updated. The new schema version and the corresponding schema evolution operation are stored in the Schema and Command Storage Manager (see Figure 1). All datasets are kept in their original version. As a result, lazy migration can lead to NoSQL databases containing datasets in different structural versions.

If datasets are accessed by a query, the query is rewritten onto the different versions (see Section 4.3). The resulting datasets are migrated at runtime and stored in the database in the new version. In the case of a single-type operation, runtime migration is relatively simple and efficient. In the case of a multi-type operation, the migration is much more complex: it is possible that during a copy or move operation the corresponding objects are not yet in the latest version and must be migrated as well. This in turn can require that further objects be migrated (cascading migration). Currently, we limit the depth of cascading migrations to two levels in Darwin. An analysis of the best-fitting strategies for different cases could be the subject of further research.

A lazy approach has the advantage that only those datasets currently in use are migrated. Cold data are not accessed and consequently not migrated. This strategy automatically minimizes data migration costs. The disadvantage of the lazy approach, though, is that data migration takes place at runtime and can negatively impact data access latency.

Hybrid Data Migration Strategies. Besides the two basic data migration strategies, which either migrate all datasets immediately after the introduction of a new schema version (eager) or no dataset at all (lazy), Darwin also offers several hybrid migration strategies that provide intelligent control of the data migration. The hybrid migration strategies optimize two different targets: low total migration costs and low latency at runtime when a dataset is accessed.

Incremental Migration. A simple hybrid strategy is incremental migration. The data is migrated completely only at certain points in time (for example, after a certain number of schema evolution operations have been executed). Between two incremental migrations, the data is migrated lazily. This approach has lower migration costs than eager migration; however, all datasets are updated, even those not in use.

Predictive Migration. A more sophisticated approach is predictive migration. The so-called hot data, i.e. the data that is frequently accessed, should be kept up-to-date. The prediction of the hot data is implemented in Darwin by keeping track of past data accesses and ordering the accessed entities accordingly by means of exponential smoothing. We use a prediction set whose size is configurable. Data in this prediction set is migrated proactively after each schema change. Data not contained in the prediction set is migrated lazily when it is accessed. This reduces both runtime overhead and migration costs. In [6] we presented initial approaches to adjust the size of this prediction set self-adaptively, depending on given bounds on migration cost and latency.

Selection of the Appropriate Data Migration Strategy. We have extensively studied the impact of different data migration strategies on migration cost and latency. A probabilistic Monte Carlo method of repeated sampling was used for the analysis. Figure 3 shows an example: the impact of the different migration strategies on migration costs (assuming a cloud-hosted database) and on data access latency is obvious. We presented and discussed the results in detail in [22]. Nevertheless, selecting the appropriate data migration strategy is a significant challenge. We have developed the data migration advisor MigCast for this purpose, which we present in Section 5.1.
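The exponential-smoothing bookkeeping behind the predictive strategy can be sketched like this: every access decays all scores and boosts the accessed entity, and the k highest-scoring entities form the configurable prediction set. The smoothing factor and all class and method names are illustrative assumptions, not Darwin's implementation.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hot-data prediction sketch: per-entity exponentially smoothed access
// scores; the top-k entities form the prediction set that would be
// migrated proactively after a schema change.
public class HotDataTracker {

    private final double alpha;                 // smoothing factor in (0, 1]
    private final Map<String, Double> scores = new HashMap<>();

    public HotDataTracker(double alpha) { this.alpha = alpha; }

    // Called on every read/write: decay all scores, boost the accessed entity.
    public void recordAccess(String entityId) {
        scores.replaceAll((id, s) -> (1 - alpha) * s);
        scores.merge(entityId, alpha, Double::sum);
    }

    // The prediction set: the k entities with the highest smoothed scores.
    public List<String> predictionSet(int k) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        HotDataTracker tracker = new HotDataTracker(0.3);
        for (String id : List.of("a", "b", "a", "c", "a")) tracker.recordAccess(id);
        System.out.println(tracker.predictionSet(2));
        // prints [a, c]: "a" is hot, and "c" was accessed more recently than "b"
    }
}
```

Exponential smoothing balances frequency and recency, so an entity that was hot long ago gradually drops out of the prediction set.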
Figure 3: Impact of different Migration Strategies (cf. [22])

4.5. Migration Optimization

There are different opportunities to optimize the execution of the migration operations. We briefly outline three aspects:

Composition of Schema Evolution Operations. When a legacy dataset needs to be migrated from several versions back, the pending schema changes may either be applied stepwise or by composite migration. As a simple example, an add and a subsequent rename operation can be combined into a single add operation on the same data object. In [5] we introduced the composition rules for schema evolution operations (both single-type and multi-type). Measurement results and aspects of the implementation were presented in [23].

Caching. An obvious optimization is the extensive use of caching. Darwin contains a Schema Cache and a Command Cache to avoid repeated reading of this information during data migration. Furthermore, a Composer Cache was introduced for the described composition of migration operations; its effects were discussed in detail in [23].

Location. Simple single-type migration operations like add and delete can be executed natively, directly in the NoSQL DBMS. For more complex multi-type operations like copy and move, this is not always possible; it depends on the functionality offered by the NoSQL DBMS. For example, many NoSQL DBMS do not support joins. In this case, copy and move operations have to be orchestrated within Darwin instead of using database-provided joins within the Drivers (cf. Figure 1). At the beginning of the implementation of Darwin in 2014, MongoDB was one of those NoSQL database systems which did not support joins; in MongoDB, this functionality has been available since version 3.2.

Performing migration operations within Darwin offers the opportunity to support NoSQL DBMS that do not natively support all migration operations. To evaluate this aspect, we have used the EvoBench benchmark which we have developed. Figure 5 shows the results of this evaluation, which will be explained in Section 5.2.

5. Darwin Ecosystem

In addition to the core functionality of Darwin presented in Section 4, two other important aspects were investigated and corresponding tools were developed. In Section 5.1 we introduce the data migration advisor MigCast. Then, in Section 5.2, the schema evolution benchmark EvoBench is presented.

5.1. MigCast

As explained in Section 4.4, selecting the appropriate migration strategy is a huge challenge. We have developed the data migration advisor MigCast for this purpose. MigCast is implemented on top of Darwin. As input parameters, MigCast takes into account the characteristics of the data instance and the data access pattern (e.g., a Pareto distribution of future reads and writes), the data model changes (schema evolution), and particulars of the cloud pricing model. With these inputs, MigCast predicts the migration costs and the data access latency. This estimation is based on three core modules: a Workload Simulator, a Cost Calculator, and a Latency Profiler (see Figure 4).

Figure 4: MigCast Architecture
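The kind of estimate such an advisor produces can be illustrated by a toy comparison of eager and lazy migration costs under a simple cloud pricing model: eager migration pays for writing every entity, while lazy migration pays only for the fraction of the data that is actually accessed. All constants and formulas below are illustrative assumptions; MigCast's Workload Simulator and Cost Calculator are considerably more detailed.

```java
// Toy cost comparison of eager vs. lazy migration for one schema change,
// under an assumed per-write cloud price and a skewed access pattern.
public class MigrationCostSketch {

    // Assumed cloud pricing: cost per migrated (written) entity.
    static final double PRICE_PER_WRITE = 0.00001;

    // Eager: every entity is migrated immediately after the schema change.
    public static double eagerCost(long entities) {
        return entities * PRICE_PER_WRITE;
    }

    // Lazy: only entities actually accessed before the next schema change
    // are migrated. With an 80/20-style skew, accessedFraction of the data
    // absorbs most accesses and is migrated on demand.
    public static double lazyCost(long entities, double accessedFraction) {
        return entities * accessedFraction * PRICE_PER_WRITE;
    }

    public static void main(String[] args) {
        long entities = 1_000_000;
        System.out.println("eager cost: " + eagerCost(entities));
        System.out.println("lazy cost (20% of data accessed): " + lazyCost(entities, 0.2));
    }
}
```

The trade-off the paper describes is visible even in this toy model: lazy migration is cheaper whenever only part of the instance is accessed, but it shifts the migration work (and hence latency) to access time.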
In Figure 3 in Section 4.4 we have shown an example of such an estimation performed by MigCast. To support the reproducibility of the experiments, the configuration of the performed measurements and the results are stored in a separate database (MigCastDB in Figure 4). MigCast is publicly available as part of the Darwin distribution (https://github.com/dbishagen/darwin).

5.2. EvoBench

Darwin belongs to the first systems tailored to an ongoing evolution of database backends. For testing and evaluating Darwin and other approaches to NoSQL schema evolution and data migration, we defined and implemented the benchmark EvoBench. EvoBench is the first available benchmark to validate the abilities of a system to evolve NoSQL databases and to determine and compare the performance of the dedicated evolution operations [24, 25]. EvoBench is based on a Customer-Product-Order-Invoice dataset originally introduced in [26] and defines 20 schema evolutions on this application, ranging from simple extensions of the data model up to more complex refactorings.

The EvoBench tool is implemented in Python. EvoBench treats the respective schema evolution platform as a black box and uses the provided API for the schema evolution operations. In the case of Darwin, we use the Darwin Core REST API (see Figure 1). In addition to the data model and schema evolution operations predefined in the benchmark, the EvoBench tool also supports the use of your own data models and schema evolution operations for experiments.

As an example, we return to the impacts discussed in Section 4.5 when performing migration operations natively in the database or in Darwin. Figure 5 shows the execution time as well as the migration costs (in terms of executed operations) of executing these operations on 123,200 data objects [25] using MongoDB.

Figure 5: Impact of different Locations of Migration Operation Execution (cf. [25])

As explained earlier, EvoBench is designed to be independent of Darwin and can also be used to evaluate other schema evolution management platforms. We have deployed EvoBench and the associated measurements in fully operational Docker containers (https://doi.org/10.5281/zenodo.4993636).

6. Conclusion and Outlook

In this article, we have introduced the main algorithms provided by Darwin, the tool for a continuous evolution of NoSQL backends. Darwin can be applied to databases that are starting from scratch as well as to already existing NoSQL databases, even those containing different versions of legacy data in the same database. In all cases, the schema evolution and data migration component of Darwin keeps dataset structures up-to-date.

The tool can be applied to different NoSQL databases (e.g. MongoDB, Couchbase, Cassandra, and ArangoDB). A side effect is that Darwin can also be used for the migration of data between different database systems, e.g. from MongoDB into ArangoDB, and thus enables interoperability between different NoSQL backends.

In the data migration component of Darwin, different optimization aims (migration costs, latency) can be pursued. In Section 4.4, we have introduced different data migration strategies and have shown their impact on the different cost metrics. One task for future work is to develop a self-adaptive data migration which recommends a data migration strategy and optimizes the parameter settings in the dedicated algorithm [6].

Another direction of future development in Darwin is a polystore data migration method including schema optimization [27].

With Darwin we offer a complete solution that is required for every long-running NoSQL database to keep the structures permanently up-to-date and to ensure that NoSQL data is operational over long periods of time.

Acknowledgments

This work has been funded by Deutsche Forschungsgemeinschaft (German Research Foundation), project 385808805. We would like to thank all project members whose work contributed to the success of the project, especially André Conrad, Andrea Hillenbrand, Mark Lukas Möller and Stefanie Scherzinger. Special thanks go to all students of Darmstadt University of Applied Sciences who have contributed to the implementation of Darwin.

References

[1] M. Stonebraker, My Top Ten Fears about the DBMS Field, in: Proc. ICDE, IEEE, 2018, pp. 24–28. doi:10.1109/ICDE.2018.00012.
[2] S. Scherzinger, S. Sidortschuck, An Empirical Study on the Design and Evolution of NoSQL Database Schemas, in: Proc. ER, Springer, 2020, pp. 441–455. doi:10.1007/978-3-030-62522-1_33.
[3] M. Klettke, U. Störl, S. Scherzinger, Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores, in: Proc. BTW, GI, 2015, pp. 425–444. URL: https://dl.gi.de/20.500.12116/2420.
[4] M. Klettke, H. Awolin, U. Störl, D. Müller, S. Scherzinger, Uncovering the Evolution History of Data Lakes, in: Proc. IEEE Big Data, IEEE, 2017, pp. 2462–2471. doi:10.1109/BigData.2017.8258204.
[5] M. Klettke, U. Störl, M. Shenavai, S. Scherzinger, NoSQL schema evolution and big data migration at scale, in: Proc. IEEE Big Data, IEEE, 2016, pp. 2764–2774. doi:10.1109/BigData.2016.7840924.
[6] A. Hillenbrand, U. Störl, S. Nabiyev, M. Klettke, Self-adapting data migration in the context of schema evolution in NoSQL databases, Distributed and Parallel Databases abs/2104.14828 (2021) 1–21. doi:10.1007/s10619-021-07334-1.
[7] U. Störl, D. Müller, A. Tekleab, S. Tolale, J. Stenzel, M. Klettke, S. Scherzinger, Curating Variational Data in Application Development, in: Proc. ICDE, IEEE, 2018, pp. 1605–1608. doi:10.1109/ICDE.2018.00187.
[8] A. Hillenbrand, M. Levchenko, U. Störl, S. Scherzinger, M. Klettke, MigCast: Putting a Price Tag on Data Model Evolution in NoSQL Data Stores, in: Proc. SIGMOD, ACM, 2019, pp. 1925–1928. doi:10.1145/3299869.3320223.
[9] D. S. Ruiz, S. F. Morales, J. G. Molina, Inferring Versioned Schemas from NoSQL Databases and Its Applications, in: Proc. ER, Springer, 2015, pp. 467–480. doi:10.1007/978-3-319-25264-3_35.
[10] L. Meurice, A. Cleve, Supporting schema evolution in schema-less NoSQL data stores, in: Proc. IEEE SANER, IEEE, 2017, pp. 457–461. doi:10.1109/SANER.2017.7884653.
[11] M. A. Baazizi, D. Colazzo, G. Ghelli, C. Sartiani, Parametric schema inference for massive JSON datasets, VLDB J. 28 (2019) 497–521. doi:10.1007/s00778-018-0532-7.
[12] M. Fruth, K. Dauberschmidt, S. Scherzinger, Josch: Managing Schemas for NoSQL Document Stores, in: Proc. ICDE, IEEE, 2021, pp. 2693–2696. doi:10.1109/ICDE51399.2021.00306.
[13] P. Contos, M. Svoboda, JSON Schema Inference Approaches, in: Proc. ER Workshops, Springer, 2020, pp. 173–183. doi:10.1007/978-3-030-65847-2_16.
[14] A. Y. Levy, A. O. Mendelzon, Y. Sagiv, D. Srivastava, Answering Queries Using Views, in: Proc. PODS, ACM Press, 1995, pp. 95–104. doi:10.1145/212433.220198.
[15] Y. Papakonstantinou, Polystore Query Rewriting: The Challenges of Variety, in: EDBT/ICDT Workshops, CEUR-WS.org, 2016. URL: http://ceur-ws.org/Vol-1558/paper46.pdf.
[16] M. L. Möller, M. Klettke, A. Hillenbrand, U. Störl, Query Rewriting for Continuously Evolving NoSQL Databases, in: Proc. ER, Springer, 2019, pp. 213–221. doi:10.1007/978-3-030-33223-5_18.
[17] C. Curino, H. J. Moon, C. Zaniolo, Graceful database schema evolution: the PRISM workbench, Proc. VLDB Endow. 1 (2008). doi:10.14778/1453856.1453939.
[18] S. Bhattacherjee, G. Liao, M. Hicks, D. J. Abadi, BullFrog: Online Schema Evolution via Lazy Evaluation, in: Proc. SIGMOD, ACM, 2021, pp. 194–206. doi:10.1145/3448016.3452842.
[19] S. Scherzinger, M. Klettke, U. Störl, Managing Schema Evolution in NoSQL Data Stores, in: Proc. DBPL@VLDB, 2013. URL: http://arxiv.org/abs/1308.0514.
[20] K. Saur, T. Dumitras, M. W. Hicks, Evolving NoSQL Databases without Downtime, in: Proc. IEEE ICSME, IEEE, 2016, pp. 166–176. doi:10.1109/ICSME.2016.47.
[21] S. Scherzinger, T. Cerqueus, E. C. de Almeida, ControVol: A framework for controlled schema evolution in NoSQL application development, in: Proc. ICDE, IEEE, 2015, pp. 1464–1467. doi:10.1109/ICDE.2015.7113402.
[22] A. Hillenbrand, S. Scherzinger, U. Störl, Remaining in Control of the Impact of Schema Evolution in NoSQL Databases, in: Proc. ER, Springer, 2021, pp. 149–159. doi:10.1007/978-3-030-89022-3_13.
[23] U. Störl, A. Tekleab, M. Klettke, S. Scherzinger, In for a Surprise When Migrating NoSQL Data, in: Proc. ICDE, IEEE, 2018, p. 1662. doi:10.1109/ICDE.2018.00202.
[24] M. L. Möller, M. Klettke, U. Störl, EvoBench – A Framework for Benchmarking Schema Evolution in NoSQL, in: Proc. IEEE Big Data, IEEE, 2020, pp. 1974–1984. doi:10.1109/BigData50022.2020.9378278.
[25] A. Conrad, M. L. Möller, T. Kreiter, J.-C. Mair, M. Klettke, U. Störl, EvoBench: Benchmarking Schema Evolution in NoSQL, in: Proc. TPCTC@VLDB, Springer, 2021, pp. 33–49. doi:10.1007/978-3-030-94437-7_3.
[26] C. Zhang, J. Lu, P. Xu, Y. Chen, UniBench: A Benchmark for Multi-model Database Management Systems, in: Proc. TPCTC@VLDB, Springer, 2018, pp. 7–23. doi:10.1007/978-3-030-11404-6_2.
[27] A. Conrad, S. Gärtner, U. Störl, Towards Automated Schema Optimization, in: Proc. ER Demos and Posters, CEUR-WS.org, 2021, pp. 37–42. URL: http://ceur-ws.org/Vol-2958/paper7.pdf.