<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Darwin: A Data Platform for NoSQL Schema Evolution Management and Data Migration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Uta Störl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meike Klettke</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Hagen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Rostock</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>During the development of NoSQL-backed software, the database schema evolves alongside the application code. Especially in agile development, new application releases are frequently deployed. This leads to heterogeneous data in the database and thus to new challenges for application development. To handle such heterogeneous data, we have developed various algorithms, implemented and evaluated them in a data platform for schema evolution management and data migration called Darwin. We provide an overview of Darwin, the concepts, algorithms, their implementations and the possible usage of Darwin in this paper.</p>
      </abstract>
      <kwd-group>
        <kwd>NoSQL databases</kwd>
        <kwd>schema evolution</kwd>
        <kwd>data migration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Schema evolution management is one of the most
challenging problems in data management today [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The
popularity of NoSQL databases makes this issue even
more complex. Schema-flexible NoSQL databases are
especially popular backends in agile development. New
software releases can be deployed without
migration-related application downtime. An empirical study of
NoSQL database schema development shows that more
schema-relevant changes are included with the use of
NoSQL databases in comparison to relational databases
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In addition, schemas become more complex over
time and take longer to stabilize.
      </p>
      <p>
        Managing schema evolution involves two main tasks:
discovering (extracting) structural changes to data and
dealing with these changes from an application development
perspective (data migration). In the past, we have
published several papers on specific research results and
theoretical achievements of schema evolution
management [
        <xref ref-type="bibr" rid="ref3">3, 4</xref>
        ] and data migration [5, 6]. We also presented
demo papers of implementations of some sub-aspects
[7, 8]. This paper is intended to provide an overall view
of the Darwin system and to show the interaction of the
different algorithms.
      </p>
      <p>Darwin supports the whole schema evolution management
and data migration lifecycle. The system is
implemented for different types of NoSQL database systems.
This paper also presents specific architectural and
implementation aspects of this data platform.
Furthermore, we have now published Darwin publicly<sup>1</sup> in a fully
operational Docker container so that the system can also
be used by interested researchers.</p>
      <p>The rest of the paper is organized as follows: In
Section 2 we discuss the related work. Section 3 gives an
overview of the Darwin data platform. In Section 4 we
discuss the main functionalities of Darwin and their
interplay. Afterwards, extensions of Darwin are presented in
Section 5. We conclude with a summary and an outlook
on further work.</p>
      <sec>
        <title>2. Related Work</title>
        <p>The implementation of Darwin is based on several research
results and theoretical achievements, partly developed
by our own group.</p>
        <p>
          <bold>Schema Extraction</bold> There are several suggestions for
schema extraction for NoSQL databases [9, 10, 11]. In [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ],
we developed an approach to schema extraction which
generates a graph structure representing all structural
variants of a given dataset. In the next step, this
internal graph structure is summarized in a JSON schema
description. Meanwhile, this basic functionality of static
schema extraction for NoSQL databases is available in
some commercial tools like Hackolade<sup>2</sup> and Studio 3T<sup>3</sup>, or in
research prototypes like Josch [12].
        </p>
        <p><bold>Version History Extraction</bold> A shortcoming of all NoSQL
schema extraction approaches (cf. [13] for a survey) is still
that they do not consider and cannot detect schema
changes over time. This observation led us to the
development of a schema version history extraction
approach. This algorithm can be applied if a partial order of
the datasets is available (e.g. a timestamp or a
creation date). It in turn extracts an internal graph structure and
adds timestamp information. Each structural change that
the algorithm detects triggers the generation of a new
schema version. In addition, the change operations are
extracted from the differences between two consecutive structural
versions and are represented as evolution operations [4].</p>
        <p>Thus this algorithm is able to uncover the complete
evolution history and the genesis of available databases. We
have presented a demonstration of this function, which
is essential for schema evolution management, in [7].</p>
        <p><bold>Query Rewriting</bold> To read NoSQL datasets in different
versions, query rewriting is necessary. Query rewriting
is a core database technology which has been introduced
to optimize query execution by using materialized views
[14]. Query rewriting can also be used to handle irregular
structures. A query rewriting approach which considers
the data heterogeneity and generates different subqueries
for different varieties is developed in [15]. In [16], we
suggest how query rewriting can use schema evolution
operations to unify different consecutive structural
versions in queries.</p>
      </sec>
      <sec id="sec-1-1">
        <p><bold>Data Migration</bold> While there are some approaches in
the area of managing schema evolution and data
migration for relational systems [17, 18], there is very little in
the field of NoSQL databases. We proposed the concepts
of eager and lazy migration in NoSQL databases in [19].
KVolve, an extension for the Redis NoSQL database that
supports lazy migration, was introduced in [20]. The
IDE-integrated tool ControVol supports eager and lazy
migration based on static type checks of object mapper class
declarations as recorded by the code repository [21].</p>
        <p>We presented initial ideas of hybrid data migration
strategies (incremental and predictive migration) in [5]
and described them in detail in [6]. We intensively
studied the impact of different data migration strategies on
migration cost and latency and presented and discussed
the results in [22].</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. System Architecture</title>
      <sec id="sec-2-1">
        <p>In this section, we introduce the system architecture
of Darwin and the interaction of the individual
modules. The entire Darwin system is implemented in Java.
Figure 1 shows the system architecture of Darwin. In
the agile application development use case, Darwin is a
middleware between a Java application and a database
storing variational data:</p>
        <list list-type="bullet">
          <list-item><p>At the top of the application stack is the Java
application. It stores its data in a NoSQL database,
interacting with the system-independent Darwin
Persistence API (DPA).</p></list-item>
          <list-item><p>Via the Darwin WebApp or the Darwin CLI,
application developers may trigger schema evolution
management and data migration tasks directly.
We will explain these tasks in detail in
Section 4.</p></list-item>
          <list-item><p>All user interfaces use the Darwin Core REST API.
This architecture allows the flexible use of Darwin,
as we will see when we present the extensions in
Section 5.</p></list-item>
          <list-item><p>The Darwin Core REST API interfaces with the
core modules necessary for the schema evolution
management and data migration lifecycle, which
we will present in detail in Section 4. These
modules are implemented independently of a concrete
database system.</p></list-item>
          <list-item><p>A Data Access Manager and a Schema and
Command Storage Manager were implemented as
a uniform interface for the interaction of the
core modules with the respective database
systems. The Schema and Command Storage
Manager stores the schema versions and the schema
evolution operations. This information can either
be stored in the same database as the data or in a
separate database.</p></list-item>
          <list-item><p>The Drivers are responsible for the connection to
the specific database system. Since the languages
of all NoSQL DBMS differ, the mapping to the
respective system is done in these modules.</p></list-item>
        </list>
      </sec>
      <sec id="sec-2-2">
        <p>Currently, Darwin supports the most popular document
stores MongoDB and Couchbase, the wide-column
store Cassandra, and the multi-model database system
ArangoDB. The architecture is designed for easy
extensibility. Adding a new DBMS requires only the
implementation of the appropriate driver.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Main Functionalities</title>
      <sec id="sec-3-1">
        <p>Darwin supports the whole schema evolution management
and data migration lifecycle. In the following, we
explain this lifecycle, the corresponding
functionalities of Darwin, and their interaction.</p>
        <sec id="sec-3-1-1">
          <title>4.1. Schema and Version History Extraction</title>
        </sec>
        <sec id="sec-3-1-2">
          <p>Schema extraction and version history extraction belong
to the data preprocessing steps which are implemented
in the Darwin tool. Both steps are necessary for the
analysis of available NoSQL datasets (which have been
created outside of Darwin) and for understanding the
implicit structures and their changes over time. The
algorithms detect the variability in the NoSQL
data, structural outliers, and the different versions over
time.</p>
          <p>Figure 2 shows an example of a version history
extraction performed in Darwin. The screenshot shows
two versions side by side in JSON schema notation. The
schema evolution operations are stated above. Changes
w.r.t. the previous schema version are highlighted [7].</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <p>The schema and version history extraction algorithm
reads all datasets of the NoSQL database. The
implemented algorithm can also run incrementally: when
new datasets arrive, only the new data are analyzed
and the results are merged with the existing ones.</p>
      </sec>
      <sec id="sec-3-3">
        <p>To the best of our knowledge, Darwin is the only
schema management tool supporting the extraction of
the schema version history.</p>
        <p><bold>Notes on Performance and Scalability</bold> Extracting
and analyzing the entire data instance is a one-time effort.
After the initial schema and version history extraction,
newly added entities can be analyzed incrementally and
on-the-fly. Since Darwin does not load the entire data
instance into the main memory, but only incremental
batches [4], Darwin can safely handle large volumes of
data.</p>
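        <p>The incremental, batch-wise extraction described above can be illustrated with a minimal Python sketch. It is not Darwin's Java implementation: the function names (<monospace>structure_of</monospace>, <monospace>batches</monospace>, <monospace>extract_versions</monospace>), the timestamp field <monospace>ts</monospace>, and the representation of a schema as a flat set of property names are assumptions made for this example only.</p>
        <preformat><![CDATA[
```python
from itertools import islice


def structure_of(entity, ignore=("ts",)):
    """Return the structural signature of a flat JSON-like entity."""
    return frozenset(key for key in entity if key not in ignore)


def batches(entities, size):
    """Yield lists of at most `size` entities so memory use stays bounded."""
    iterator = iter(entities)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def extract_versions(entities, history=None, batch_size=2):
    """Append a schema version whenever the structure changes over time.

    Assumption: `entities` are ordered by their timestamp field `ts`.
    Passing the `history` of an earlier run makes the extraction
    incremental, because only the new entities are read.
    """
    history = list(history or [])
    for batch in batches(entities, batch_size):
        for entity in batch:
            current = structure_of(entity)
            if history and history[-1]["properties"] == current:
                continue  # same structure as the latest known version
            previous = history[-1]["properties"] if history else frozenset()
            history.append({
                "version": len(history) + 1,
                "since": entity["ts"],
                "properties": current,
                # Differences between consecutive versions become operations.
                "operations": sorted(
                    [("add", p) for p in current - previous]
                    + [("delete", p) for p in previous - current]
                ),
            })
    return history


old_data = [
    {"ts": 1, "name": "Ann"},
    {"ts": 2, "name": "Bob", "email": "bob@example.org"},
]
new_data = [{"ts": 3, "name": "Cy", "mail": "cy@example.org"}]

history = extract_versions(old_data)
history = extract_versions(new_data, history=history)  # incremental run
for version in history:
    print(version["version"], version["since"], version["operations"])
```
]]></preformat>
        <p>This sketch reports a changed property name as a delete followed by an add and treats every structural outlier as a new version. Recognizing renames and separating outliers from genuine versions is beyond what it attempts.</p>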
        <sec id="sec-3-3-1">
          <title>4.2. Schema Evolution Management</title>
          <p>Schema evolution is an ongoing process during the
development of an application. Schema evolution operations
(SMOs) can be</p>
          <list list-type="bullet">
            <list-item><p>manually entered in Darwin using the Darwin CLI
or Darwin WebApp, or</p></list-item>
            <list-item><p>automatically observed by incremental version
history extraction.</p></list-item>
          </list>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <p>Which datasets are migrated after a schema evolution
operation depends on the chosen data migration
strategy. Darwin implements eager, lazy, and
various hybrid data migration strategies. We present
these strategies in Section 4.4.</p>
        <sec id="sec-3-4-1">
          <title>4.3. Query Rewriting</title>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <p>In NoSQL databases, datasets of different schema versions
can be stored in the same database. Schema evolution
in combination with lazy or hybrid data migration has
to face the situation that datasets stored in the same
NoSQL database have different structural versions. In
this scenario, the evolution operations that transform
datasets from the structural version <italic>i</italic> to the successor
version <italic>i</italic> + 1 are stored in the Schema and Command
Storage Manager (see Figure 1).</p>
        <p>If such versioned datasets are to be accessed and
queried by an application, query rewriting is necessary
to distribute the query to the different structural
versions. Forward query rewriting is applied if a query which
assumes the structural version <italic>i</italic> is translated into the
structural version <italic>i</italic> + <italic>k</italic>. In this case the list of evolution
operations (for translation from version <italic>i</italic> to <italic>i</italic> + 1, <italic>i</italic> + 2,
…, <italic>i</italic> + <italic>k</italic>) is applied to the query. Backward query
rewriting is used to access preceding structural versions.
To achieve this, reverse evolution operations are used for
the translation of the query. In both cases, query
rewriting generates different subqueries (one for each schema
version), executes the subqueries, and unions the results.</p>
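        <p>Forward and backward rewriting can be illustrated with a minimal Python sketch. It makes several simplifying assumptions that are not taken from Darwin: a query is a plain equality filter (a dictionary from property name to value), only rename operations are rewritten, every entity carries its schema version in a hypothetical field <monospace>_v</monospace>, and <monospace>evolution[v]</monospace> holds the operations leading from version <monospace>v</monospace> to its successor.</p>
        <preformat><![CDATA[
```python
def rewrite(query, operations, reverse=False):
    """Translate an equality filter across a list of rename operations."""
    ops = reversed(operations) if reverse else operations
    rewritten = dict(query)
    for op, old, new in ops:
        if op != "rename":
            continue  # only renames are handled in this sketch
        source, target = (new, old) if reverse else (old, new)
        if source in rewritten:
            rewritten[target] = rewritten.pop(source)
    return rewritten


def subqueries(query, query_version, evolution, versions):
    """Return one rewritten query per stored schema version."""
    result = {}
    for target in versions:
        if target >= query_version:  # forward rewriting
            ops = [op for v in range(query_version, target) for op in evolution[v]]
            result[target] = rewrite(query, ops)
        else:  # backward rewriting with reversed operations
            ops = [op for v in range(target, query_version) for op in evolution[v]]
            result[target] = rewrite(query, ops, reverse=True)
    return result


def execute(query, query_version, evolution, database):
    """Run each subquery on the entities of its version and union the hits."""
    versions = sorted({entity["_v"] for entity in database})
    per_version = subqueries(query, query_version, evolution, versions)
    hits = []
    for entity in database:
        subquery = per_version[entity["_v"]]
        if all(entity.get(k) == v for k, v in subquery.items()):
            hits.append(entity)
    return hits


evolution = {
    1: [("rename", "mail", "email")],
    2: [("rename", "email", "contact")],
}
database = [
    {"id": 1, "_v": 1, "mail": "a@example.org"},
    {"id": 2, "_v": 2, "email": "a@example.org"},
    {"id": 3, "_v": 3, "contact": "a@example.org"},
    {"id": 4, "_v": 3, "contact": "b@example.org"},
]
query = {"email": "a@example.org"}  # written against schema version 2

print(subqueries(query, 2, evolution, [1, 2, 3]))
print([entity["id"] for entity in execute(query, 2, evolution, database)])
```
]]></preformat>
        <p>The sketch returns the matching entities in their stored versions. Migrating the results to a common version, as done by the migration strategies, is left out here.</p>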
      </sec>
      <sec id="sec-3-6">
        <p>Query rewriting enables transparent access to datasets in different schema versions and is the prerequisite for lazy and hybrid data migration.</p>
        <sec id="sec-3-6-1">
          <title>4.4. Data Migration</title>
        </sec>
      </sec>
      <sec id="sec-3-7">
        <p><bold>Migration Strategies</bold> In Darwin, structural changes
can be defined by schema evolution operations (SMOs).
The system supports both single-type operations (add, rename,
delete) and multi-type operations (move and copy) [19].
The evolution operations define the
changes of the schema. For each evolution operation, a
corresponding data migration operation is generated that
executes the same structural changes in the datasets.
Darwin implements several different data migration
strategies:</p>
        <p><bold>Eager Migration</bold> An eager data migration migrates
all datasets immediately after the introduction of a new
schema version. Eager data migration has the
advantage that all datasets always reflect the latest structural
version. This reduces the latency when datasets are
accessed. A disadvantage is that migration costs are high,
since all datasets are always updated, even those not
in use. Migration costs are especially concerning when
the database is hosted in the cloud.</p>
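        <p>How the five operations translate into data migration operations, and what eager migration then does, can be sketched in Python as follows. The in-memory database (a dictionary from entity type to a list of flat entities), the tuple encoding of operations, and the key-based matching used for copy and move are assumptions for illustration; they do not reproduce Darwin's operation syntax or its drivers.</p>
        <preformat><![CDATA[
```python
def add(db, kind, prop, default=None):
    """Single-type: add a property with a default value."""
    for entity in db[kind]:
        entity.setdefault(prop, default)


def rename(db, kind, old, new):
    """Single-type: rename a property."""
    for entity in db[kind]:
        if old in entity:
            entity[new] = entity.pop(old)


def delete(db, kind, prop):
    """Single-type: remove a property."""
    for entity in db[kind]:
        entity.pop(prop, None)


def copy(db, source, prop, target, source_key, target_key):
    """Multi-type: copy `prop` to target entities with a matching key."""
    values = {e[source_key]: e[prop] for e in db[source] if prop in e}
    for entity in db[target]:
        if entity.get(target_key) in values:
            entity[prop] = values[entity[target_key]]


def move(db, source, prop, target, source_key, target_key):
    """Multi-type: copy the property, then delete it at the source."""
    copy(db, source, prop, target, source_key, target_key)
    delete(db, source, prop)


OPERATIONS = {"add": add, "rename": rename, "delete": delete,
              "copy": copy, "move": move}


def migrate_eagerly(db, operation):
    """Eager migration: apply the operation to all entities right away."""
    name, *args = operation
    OPERATIONS[name](db, *args)
    return db


db = {
    "Customer": [{"id": 1, "name": "Ann", "city": "Hagen"}],
    "Order": [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 2}],
}
migrate_eagerly(db, ("add", "Order", "status", "open"))
migrate_eagerly(db, ("rename", "Customer", "city", "town"))
migrate_eagerly(db, ("copy", "Customer", "name", "Order", "id", "cid"))
print(db)
```
]]></preformat>
        <p>The copy operation needs data from two entity types, which is why multi-type operations are harder to execute than single-type ones.</p>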
        <p><bold>Lazy Migration</bold> In order to avoid unnecessary
migration processes and thus reduce migration costs, Darwin
provides another migration strategy – the so-called lazy
migration. The basic idea is that after the introduction of
a new schema version, no dataset will be updated. The
new schema version and the corresponding schema
evolution operation are stored in the Schema and Command
Storage Manager (see Figure 1). All datasets are kept in
their original version. As a result, lazy migration can
lead to NoSQL databases containing datasets in different
structural versions.</p>
        <p>When datasets are accessed by a query, the query
is rewritten for the different versions (see Section 4.3).
The resulting datasets are migrated at runtime and stored
in the database in the new version. In the case of a
single-type operation, runtime migration is relatively simple
and efficient. In the case of a multi-type operation, the
migration is much more complex. It is possible that
during a copy or move operation the corresponding objects
are not yet in the latest version and must be migrated as
well. This in turn can require that further objects
be migrated (cascading migration). Currently, we limit
the depth of cascading migrations to two levels in
Darwin. An analysis of the best-fitting strategies for different
cases could be the subject of further research.</p>
        <p>A lazy approach has the advantage that only those
datasets currently in use are being migrated. Cold data
are not accessed and subsequently not migrated. This
strategy automatically minimizes data migration costs.
The disadvantage of the lazy approach though is that data
migration takes place during runtime and can negatively
impact data access latency.</p>
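        <p>The lazy strategy for single-type operations can be illustrated with the following Python sketch. The class <monospace>LazyStore</monospace>, its methods, and the version field <monospace>_v</monospace> are hypothetical names chosen for this example; multi-type operations and cascading migrations are deliberately left out.</p>
        <preformat><![CDATA[
```python
def apply(entity, operation):
    """Apply one single-type operation to a flat entity in place."""
    name, *args = operation
    if name == "add":
        prop, default = args
        entity.setdefault(prop, default)
    elif name == "rename":
        old, new = args
        if old in entity:
            entity[new] = entity.pop(old)
    elif name == "delete":
        entity.pop(args[0], None)
    else:
        raise ValueError(f"unsupported operation: {name}")


class LazyStore:
    """In-memory stand-in for a database with lazily migrated entities."""

    def __init__(self, entities):
        self.entities = {entity["id"]: entity for entity in entities}
        self.operations = {}  # version v -> operation leading to v + 1
        self.latest = 1
        self.migrated = 0     # number of operations executed at read time

    def evolve(self, operation):
        """Register a new schema version without touching any entity."""
        self.operations[self.latest] = operation
        self.latest += 1

    def read(self, entity_id):
        """Migrate the entity to the latest version on access and keep it."""
        entity = self.entities[entity_id]
        while entity["_v"] < self.latest:
            apply(entity, self.operations[entity["_v"]])
            entity["_v"] += 1
            self.migrated += 1
        return entity


store = LazyStore([
    {"id": 1, "_v": 1, "name": "Ann"},
    {"id": 2, "_v": 1, "name": "Bob"},
])
store.evolve(("add", "email", None))
store.evolve(("rename", "name", "fullname"))

hot = store.read(1)      # migrated on access
cold = store.entities[2]  # never read, stays in version 1
print(hot, cold, store.migrated)
```
]]></preformat>
        <p>Only the entity that is read pays for the two pending operations, which reflects both the cost advantage and the latency penalty of the lazy approach.</p>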
        <p><bold>Hybrid Data Migration Strategies</bold> Besides the two
basic data migration strategies that migrate either all
datasets immediately after the introduction of a new
schema version (eager) or no dataset at all (lazy),
Darwin also offers several hybrid migration strategies that
provide an intelligent control of the data migration. The
hybrid migration strategies balance two competing
targets: low total migration costs and low latency at runtime
when a dataset is accessed.</p>
        <p><bold>Incremental Migration</bold> A simple hybrid strategy is
incremental migration. The data is only migrated
completely at certain points in time (for example, after a
certain number of schema evolution operations have been
executed). Between two incremental migrations, the data
is migrated lazily. This approach has lower migration
costs than eager migration. However, at each of these points
all datasets are updated, even those not in use.</p>
        <p><bold>Predictive Migration</bold> A more sophisticated approach
is predictive migration. The so-called hot data, i.e.
the data that is frequently accessed, should be kept
up-to-date. The prediction of the hot data is implemented
in Darwin by keeping track of past data accesses and
ordering the accessed entities accordingly by means of
exponential smoothing. We use a prediction set whose
size is configurable. Data in this prediction set is migrated
proactively after each schema change. Data not contained
in the prediction set is migrated lazily when it is accessed.
This reduces both runtime overhead and migration costs.</p>
        <p>In [6] we presented initial approaches to adjust
the size of this prediction set self-adaptively, depending on
given bounds on migration cost and latency.</p>
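        <p>The idea of ranking entities by exponentially smoothed access frequency can be sketched in Python. The class <monospace>AccessPredictor</monospace>, the smoothing factor of 0.5, the per-period update, and the restriction to a rename operation are assumptions of this example and are not taken from Darwin.</p>
        <preformat><![CDATA[
```python
class AccessPredictor:
    """Rank entities by exponentially smoothed access frequency."""

    def __init__(self, alpha=0.5, size=2):
        self.alpha = alpha  # smoothing factor (assumed value)
        self.size = size    # configurable size of the prediction set
        self.scores = {}

    def record(self, accessed_ids, all_ids):
        """Update the score of every entity after one observation period."""
        accessed = set(accessed_ids)
        for entity_id in all_ids:
            hit = 1.0 if entity_id in accessed else 0.0
            old = self.scores.get(entity_id, 0.0)
            self.scores[entity_id] = self.alpha * hit + (1 - self.alpha) * old

    def prediction_set(self):
        """Return the ids with the highest scores (ties broken by id)."""
        ranked = sorted(self.scores, key=lambda i: (-self.scores[i], i))
        return ranked[: self.size]


def migrate_predictively(entities, rename, predictor):
    """Proactively apply a rename to the predicted hot entities only."""
    old, new = rename
    hot = set(predictor.prediction_set())
    for entity in entities:
        if entity["id"] in hot and old in entity:
            entity[new] = entity.pop(old)
            entity["_v"] += 1
    return entities


ids = [1, 2, 3, 4]
predictor = AccessPredictor(alpha=0.5, size=2)
for accessed in ([1, 2], [1, 3], [1, 2]):
    predictor.record(accessed, ids)

entities = [{"id": i, "_v": 1, "name": f"user{i}"} for i in ids]
migrate_predictively(entities, ("name", "fullname"), predictor)
print(predictor.prediction_set(), [entity["_v"] for entity in entities])
```
]]></preformat>
        <p>Entities outside the prediction set stay in their old version and would be migrated lazily on their next access.</p>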
        <p><bold>Selection of the Appropriate Data Migration
Strategy</bold> We have extensively studied the impact of different
data migration strategies on migration cost and latency.
Probabilistic Monte Carlo methods of repeated sampling
were used for the analysis. Figure 3 shows an
example, in which the impact of the different migration strategies
on migration costs (assuming a cloud-hosted database)
and on data access latency is clearly visible. We presented
and discussed the results in detail in [22]. Nevertheless,
selecting the appropriate data migration strategy is a
significant challenge. We have developed the data migration
advisor MigCast for this purpose, which we present in
Section 5.1.</p>
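        <p>A toy version of such a repeated-sampling experiment can be written in a few lines of Python. Everything in it is an assumption made for illustration: the workload sizes, the Pareto shape parameter, the cost model (one unit per operation executed on an entity), and the use of operations executed at read time as a stand-in for latency. It does not reproduce the experiments reported in [22].</p>
        <preformat><![CDATA[
```python
import random


def simulate(strategy, entities=1000, releases=5, reads_per_release=200, seed=0):
    """Simulate one workload and count migration work (all values assumed)."""
    rng = random.Random(seed)
    version = [0] * entities
    cost = 0     # operations executed on entities (migration cost)
    on_read = 0  # operations executed during reads (latency proxy)
    for release in range(1, releases + 1):
        if strategy == "eager":
            for i in range(entities):
                cost += release - version[i]
                version[i] = release
        for _ in range(reads_per_release):
            # Pareto-distributed access: low indexes are read most often.
            i = min(int(rng.paretovariate(1.0)) - 1, entities - 1)
            pending = release - version[i]
            cost += pending
            on_read += pending
            version[i] = release
    return cost, on_read


def estimate(strategy, runs=20):
    """Average both metrics over repeated random samples."""
    samples = [simulate(strategy, seed=seed) for seed in range(runs)]
    return {
        "migration_cost": sum(s[0] for s in samples) / runs,
        "on_read_operations": sum(s[1] for s in samples) / runs,
    }


eager = estimate("eager")
lazy = estimate("lazy")
print("eager:", eager)
print("lazy: ", lazy)
```
]]></preformat>
        <p>In this model, eager migration always executes every operation on every entity and never migrates during a read, while lazy migration executes fewer operations in total but executes all of them during reads.</p>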
        <sec id="sec-3-7-1">
          <title>4.5. Migration Optimization</title>
        </sec>
      </sec>
      <sec id="sec-3-8">
        <p>There are different opportunities to optimize the execution of the migration operations. We would like to briefly outline three aspects:</p>
        <p><bold>Composition of Schema Evolution Operations</bold>
When a legacy dataset that is several versions behind
needs to be migrated, the pending schema changes may either
be applied stepwise or by composite migration. As a
simple example, an add and a rename operation can be
combined into one add operation on the same data object.
In [5] we introduced the composition rules for schema
evolution operations (both single-type and multi-type).
Measurement results and aspects of the implementation
were presented in [23].</p>
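        <p>The add-and-rename example can be made concrete with a small Python sketch. Only the first rule below (add followed by rename becomes a single add) is taken from the text; the second rule (two chained renames become one) is an additional assumption for illustration, and both presuppose that entities conform to their schema version. The complete rule set of [5] is not reproduced here.</p>
        <preformat><![CDATA[
```python
def compose(operations):
    """Merge pending single-type operations where a simple rule applies."""
    composed = []
    for op in operations:
        if composed:
            last = composed[-1]
            # add p; rename p -> q  ==>  add q
            if last[0] == "add" and op[0] == "rename" and op[1] == last[1]:
                composed[-1] = ("add", op[2], last[2])
                continue
            # Assumed extra rule: rename p -> q; rename q -> r  ==>  rename p -> r
            if last[0] == "rename" and op[0] == "rename" and op[1] == last[2]:
                composed[-1] = ("rename", last[1], op[2])
                continue
        composed.append(op)
    return composed


def apply_all(entity, operations):
    """Apply single-type operations to a copy of a flat entity."""
    result = dict(entity)
    for op in operations:
        if op[0] == "add":
            result.setdefault(op[1], op[2])
        elif op[0] == "rename" and op[1] in result:
            result[op[2]] = result.pop(op[1])
        elif op[0] == "delete":
            result.pop(op[1], None)
    return result


pending = [
    ("add", "mail", None),
    ("rename", "mail", "email"),
    ("rename", "email", "contact"),
    ("delete", "fax"),
]
composite = compose(pending)
legacy = {"id": 1, "fax": "123"}

print(composite)
print(apply_all(legacy, pending) == apply_all(legacy, composite))
```
]]></preformat>
        <p>The composite list touches the legacy entity with two operations instead of four while producing the same result.</p>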
        <p><bold>Caching</bold> An obvious optimization is the extensive use
of caching. Darwin contains a Schema Cache and a
Command Cache to avoid repeated reading of this information
during data migration. Furthermore, a Composer Cache
was introduced for the described composition of
migration operations, the effects of which were mentioned
above and discussed in detail in [23].</p>
      </sec>
      <sec id="sec-3-9">
        <p><bold>Location</bold> Simple single-type migration operations like
add and delete can be executed natively, directly in the
NoSQL DBMS. For more complex multi-type operations
like copy and move this is not always possible. This
depends on the functionality offered by the NoSQL DBMS.</p>
        <p>For example, many NoSQL DBMS do not support joins.
In this case, copy and move operations have to be
orchestrated within Darwin instead of using database-provided
joins within the Drivers (cf. Figure 1). At the beginning
of the implementation of Darwin in 2014, MongoDB was
one of those NoSQL database systems which did not
support joins. In MongoDB, this functionality has been available
since version 3.2.</p>
        <p>Performing migration operations within Darwin offers
the opportunity to support NoSQL DBMS that do not
natively support all migration operations. To evaluate
this aspect, we have used the EvoBench benchmark which
we have developed. Figure 5 shows the results of the
evaluation which will be explained in Section 5.2.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Darwin Ecosystem</title>
      <sec id="sec-4-1">
        <p>In addition to the core functionality of Darwin presented
in Section 4, two other important aspects were
investigated and corresponding tools were developed. In
Section 5.1 we introduce the data migration advisor MigCast.
Then, in Section 5.2 the schema evolution benchmark
EvoBench is presented.</p>
        <sec id="sec-4-1-1">
          <title>5.1. MigCast</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <p>As explained in Section 4.4, selecting the appropriate
migration strategy is a major challenge. We have developed
the data migration advisor MigCast for this purpose.</p>
        <p>MigCast is implemented on top of Darwin. As input
parameters MigCast takes into account the
characteristics of the data instance and the data access pattern, e.g.,
a Pareto distribution of future reads and writes, the data
model changes (schema evolution), and particulars about
the cloud pricing model. With these inputs, MigCast
predicts the migration costs and the data access latency. This
estimation is based on three core modules: a Workload
Simulator, a Cost Calculator, and a Latency Profiler (see
Figure 4).</p>
        <p>In Figure 3 in Section 4.4 we have shown an example
of such an estimation performed by MigCast. To support
the reproducibility of the experiments, the configuration
of the performed measurements and the results are stored
in a separate database (MigCastDB in Figure 4). MigCast
is publicly available as part of the Darwin distribution<sup>4</sup>.</p>
      </sec>
      <sec id="sec-4-3">
        <title>5.2. EvoBench</title>
        <p>Darwin is among the first systems tailored for an ongoing
evolution of database backends. For testing and evaluating
Darwin and other approaches for NoSQL schema
evolution and data migration, we defined and implemented
the benchmark EvoBench. EvoBench is the first
available benchmark to validate the abilities of a system
to evolve NoSQL databases and to determine and compare
the performance of the dedicated evolution operations
[24, 25]. EvoBench is based on a Customer-Product-Order-Invoice
dataset originally introduced in [26] and defines
20 schema evolutions on this application, ranging from
simple extensions of the data model up to more complex
refactoring.</p>
        <p>The EvoBench tool is implemented in Python. EvoBench
treats the respective schema evolution platform as a black
box and uses the provided API for the schema evolution
operations. In the case of Darwin, we use the Darwin Core
REST API (see Figure 1). In addition to the data model and
schema evolution operations predefined in the benchmark,
the EvoBench tool also supports the use of custom
data models and schema evolution operations for
experiments.</p>
        <p>As an example, we return to the impacts discussed
in Section 4.5 when performing migration operations
natively in the database or in Darwin. Figure 5 shows the
execution time as well as the migration costs (in terms
of executed operations) when executing these operations on
123,200 data objects [25] using MongoDB.</p>
        <p>[Figure 5: Execution time and migration costs (in executed operations) of the add, rename, delete, copy, and move operations, executed natively in the database (DB) and in Darwin.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion and Outlook</title>
      <sec id="sec-5-1">
        <p>In this article, we have introduced the main algorithms
provided by Darwin – the tool for a continuous evolution
of NoSQL backends. Darwin can be applied to databases
that are starting from scratch as well as to already
existing NoSQL databases, even those containing different
versions of legacy data in the same database. In all cases,
the schema evolution and data migration component of Darwin
keeps dataset structures up-to-date.</p>
        <p>The tool can be applied to different NoSQL databases
(e.g. MongoDB, Couchbase, Cassandra, and ArangoDB).
A side effect is that Darwin can also be used for the
migration of data between different database systems, e.g.
from MongoDB into ArangoDB, and thus enables
interoperability between different NoSQL backends.</p>
        <p>In the data migration component of Darwin, different
optimization aims (migration costs, latency) can be
pursued. In Section 4.4, we have introduced different data
migration strategies and have shown their impact on the
different cost metrics. One task of future work is to
develop a self-adaptive data migration which recommends a
data migration strategy and optimizes parameter settings
in the dedicated algorithm [6].</p>
        <p>Another direction of future development in Darwin is
the development of a polystore data migration method
including schema optimization [27].</p>
        <p>With Darwin we offer a complete solution that is
required for every long-running NoSQL database to keep
the structures permanently up-to-date and to ensure that
NoSQL data is operational over long periods of time.</p>
      </sec>
      <sec id="sec-5-2">
        <p><sup>4</sup> https://github.com/dbishagen/darwin</p>
        <p><sup>5</sup> https://doi.org/10.5281/zenodo.4993636</p>
      </sec>
      <sec id="sec-5-3">
        <title>Acknowledgments</title>
        <p>This work has been funded by Deutsche Forschungsgemeinschaft (German Research Foundation) – 385808805.</p>
        <p>We would like to thank all project members whose work
contributed to the success of the project, especially
André Conrad, Andrea Hillenbrand, Mark Lukas Möller and
Stefanie Scherzinger. Special thanks go to all students
of Darmstadt University of Applied Sciences who have
contributed to the implementation of Darwin.</p>
        <p>[3] M. Klettke, U. Störl, S. Scherzinger, Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores, in: Proc. BTW, GI, 2015, pp. 425–444. URL: https://dl.gi.de/20.500.12116/2420.</p>
        <p>[4] M. Klettke, H. Awolin, U. Störl, D. Müller, S. Scherzinger, Uncovering the Evolution History of Data Lakes, in: Proc. IEEE Big Data, IEEE, 2017, pp. 2462–2471. doi:10.1109/BigData.2017.8258204.</p>
        <p>[5] M. Klettke, U. Störl, M. Shenavai, S. Scherzinger, NoSQL schema evolution and big data migration at scale, in: Proc. IEEE Big Data, IEEE, 2016, pp. 2764–2774. doi:10.1109/BigData.2016.7840924.</p>
        <p>[6] A. Hillenbrand, U. Störl, S. Nabiyev, M. Klettke, Self-adapting data migration in the context of schema evolution in NoSQL databases, Distributed and Parallel Databases abs/2104.14828 (2021) 1–21. doi:10.1007/s10619-021-07334-1.</p>
        <p>[7] U. Störl, D. Müller, A. Tekleab, S. Tolale, J. Stenzel, M. Klettke, S. Scherzinger, Curating Variational Data in Application Development, in: Proc. ICDE, IEEE, 2018, pp. 1605–1608. doi:10.1109/ICDE.2018.00187.</p>
        <p>[8] A. Hillenbrand, M. Levchenko, U. Störl, S. Scherzinger, M. Klettke, MigCast: Putting a Price Tag on Data Model Evolution in NoSQL Data Stores, in: Proc. SIGMOD, ACM, 2019, pp. 1925–1928. doi:10.1145/3299869.3320223.</p>
        <p>[9] D. S. Ruiz, S. F. Morales, J. G. Molina, Inferring Versioned Schemas from NoSQL Databases and Its Applications, in: Proc. ER, Springer, 2015, pp. 467–480. doi:10.1007/978-3-319-25264-3_35.</p>
        <p>[10] L. Meurice, A. Cleve, Supporting schema evolution in schema-less NoSQL data stores, in: Proc. IEEE SANER, IEEE, 2017, pp. 457–461. doi:10.1109/SANER.2017.7884653.</p>
        <p>[11] M. A. Baazizi, D. Colazzo, G. Ghelli, C. Sartiani, Parametric schema inference for massive JSON datasets, VLDB J. 28 (2019) 497–521. doi:10.1007/s00778-018-0532-7.</p>
        <p>[12] M. Fruth, K. Dauberschmidt, S. Scherzinger, Josch: Managing Schemas for NoSQL Document Stores, in: Proc. ICDE, IEEE, 2021, pp. 2693–2696. doi:10.1109/ICDE51399.2021.00306.</p>
        <p>[13] P. Contos, M. Svoboda, JSON Schema Inference Approaches, in: Proc. ER Workshops, Springer, 2020, pp. 173–183. doi:10.1007/978-3-030-65847-2_16.</p>
        <p>[14] A. Y. Levy, A. O. Mendelzon, Y. Sagiv, D. Srivastava, Answering Queries Using Views, in: Proc. PODS, ACM Press, 1995, pp. 95–104. doi:10.1145/212433.220198.</p>
        <p>[15] Y. Papakonstantinou, Polystore Query Rewriting: The Challenges of Variety, in: EDBT/ICDT Workshops, CEUR-WS.org, 2016. URL: http://ceur-ws.org/Vol-1558/paper46.pdf.</p>
        <p>[16] M. L. Möller, M. Klettke, A. Hillenbrand, U. Störl, Query Rewriting for Continuously Evolving NoSQL Databases, in: Proc. ER, Springer, 2019, pp. 213–221. doi:10.1007/978-3-030-33223-5_18.</p>
        <p>[17] C. Curino, H. J. Moon, C. Zaniolo, Graceful database schema evolution: the PRISM workbench, Proc. VLDB Endow. 1 (2008). doi:10.14778/1453856.1453939.</p>
        <p>[18] S. Bhattacherjee, G. Liao, M. Hicks, D. J. Abadi, BullFrog: Online Schema Evolution via Lazy Evaluation, in: Proc. SIGMOD, ACM, 2021, pp. 194–206. doi:10.1145/3448016.3452842.</p>
        <p>[19] S. Scherzinger, M. Klettke, U. Störl, Managing Schema Evolution in NoSQL Data Stores, in: Proc. DBPL@VLDB, 2013. URL: http://arxiv.org/abs/1308.0514.</p>
        <p>[20] K. Saur, T. Dumitras, M. W. Hicks, Evolving NoSQL Databases without Downtime, in: 2016 IEEE International Conference on Software Maintenance and Evolution, IEEE, 2016, pp. 166–176. doi:10.1109/ICSME.2016.47.</p>
        <p>[21] S. Scherzinger, T. Cerqueus, E. C. de Almeida, ControVol: A framework for controlled schema evolution in NoSQL application development, in: Proc. ICDE, IEEE, 2015, pp. 1464–1467. doi:10.1109/ICDE.2015.7113402.</p>
        <p>[22] A. Hillenbrand, S. Scherzinger, U. Störl, Remaining in Control of the Impact of Schema Evolution in NoSQL Databases, in: Proc. ER, Springer, 2021, pp. 149–159. doi:10.1007/978-3-030-89022-3_13.</p>
        <p>[23] U. Störl, A. Tekleab, M. Klettke, S. Scherzinger, In for a Surprise When Migrating NoSQL Data, in: Proc. ICDE, IEEE, 2018, p. 1662. doi:10.1109/ICDE.2018.00202.</p>
        <p>[24] M. L. Möller, M. Klettke, U. Störl, EvoBench – A Framework for Benchmarking Schema Evolution in NoSQL, in: Proc. IEEE Big Data, IEEE, 2020, pp. 1974–1984. doi:10.1109/BigData50022.2020.9378278.</p>
        <p>[25] A. Conrad, M. L. Möller, T. Kreiter, J.-C. Mair, M. Klettke, U. Störl, EvoBench: Benchmarking Schema Evolution in NoSQL, in: Proc. TPCTC@VLDB, Springer, 2021, pp. 33–49. doi:10.1007/978-3-030-94437-7_3.</p>
        <p>[26] C. Zhang, J. Lu, P. Xu, Y. Chen, UniBench: A Benchmark for Multi-model Database Management Systems, in: Proc. TPCTC@VLDB, Springer, 2018, pp. 7–23. doi:10.1007/978-3-030-11404-6_2.</p>
        <p>[27] A. Conrad, S. Gärtner, U. Störl, Towards Automated Schema Optimization, in: Proc. ER Demos and Posters, CEUR-WS.org, 2021, pp. 37–42. URL: http://ceur-ws.org/Vol-2958/paper7.pdf.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          ,
          <article-title>My Top Ten Fears about the DBMS Field</article-title>
          ,
          <source>in: Proc. ICDE</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>24</fpage>
          -
          <lpage>28</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/ICDE.2018.00012</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Scherzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sidortschuck</surname>
          </string-name>
          ,
          <article-title>An Empirical Study on the Design and Evolution of NoSQL Database Schemas</article-title>
          ,
          <source>in: Proc. ER</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>441</fpage>
          -
          <lpage>455</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-3-030-62522-1_33</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Klettke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Störl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Scherzinger</surname>
          </string-name>
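          ,
          <article-title>Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores</article-title>
          ,
          <source>in: Proc. BTW</source>
          , GI,
          <year>2015</year>
          , pp.
          <fpage>425</fpage>
          -
          <lpage>444</lpage>
          . URL: https://dl.gi.de/20.500.12116/2420.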
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>