=Paper=
{{Paper
|id=Vol-3135/dataplat_short3
|storemode=property
|title=Darwin: A Data Platform for Schema Evolution Management and Data Migration
|pdfUrl=https://ceur-ws.org/Vol-3135/dataplat_short3.pdf
|volume=Vol-3135
|authors=Uta Störl,Meike Klettke
|dblpUrl=https://dblp.org/rec/conf/edbt/StorlK22
}}
==Darwin: A Data Platform for Schema Evolution Management and Data Migration==
Darwin: A Data Platform for NoSQL Schema Evolution Management and Data Migration

Uta Störl (University of Hagen, Germany), Meike Klettke (University of Rostock, Germany)
Abstract
During the development of NoSQL-backed software, the database schema evolves alongside the application code. Especially in agile development, new application releases are deployed frequently. This leads to heterogeneous data in the database and thus to new challenges for application development. To handle such heterogeneous data, we have developed various algorithms and have implemented and evaluated them in Darwin, a data platform for schema evolution management and data migration. In this paper, we give an overview of Darwin: its concepts, algorithms, their implementation, and its possible usage.
Keywords
NoSQL databases, schema evolution, data migration
1. Introduction

Schema evolution management is one of the most challenging problems in data management today [1]. The popularity of NoSQL databases makes this issue even more complex. Schema-flexible NoSQL databases are especially popular backends in agile development, since new software releases can be deployed without migration-related application downtime. An empirical study of NoSQL database schema development shows that more schema-relevant changes occur when NoSQL databases are used than with relational databases [2]. In addition, schemas become more complex over time and take longer to stabilize.

Managing schema evolution involves two main tasks: discovering (extracting) structural changes to data, and dealing with these changes from an application development perspective (data migration). In the past, we have published several papers on specific research results and theoretical achievements in schema evolution management [3, 4] and data migration [5, 6]. We also presented demo papers on implementations of some sub-aspects [7, 8]. This paper is intended to provide an overall view of the Darwin system and to show the interaction of the different algorithms.

Darwin supports the whole schema evolution management and data migration lifecycle. The system is implemented for different types of NoSQL database systems. This paper also presents specific architectural and implementation aspects of this data platform. Furthermore, we have now published Darwin publicly (https://github.com/dbishagen/darwin) in a fully operational Docker container, so that the system can also be used by interested researchers.

The rest of the paper is organized as follows: In Section 2 we discuss related work. Section 3 gives an overview of the Darwin data platform. In Section 4 we discuss the main functionalities of Darwin and their interplay. Afterwards, extensions of Darwin are presented in Section 5. We conclude with a summary and an outlook on further work.

Published in the Workshop Proceedings of the EDBT/ICDT 2022 Joint Conference (March 29-April 1, 2022), Edinburgh, UK. Contact: uta.stoerl@fernuni-hagen.de (U. Störl), meike.klettke@uni-rostock.de (M. Klettke). ORCID: 0000-0003-2771-142X (U. Störl), 0000-0003-0551-8389 (M. Klettke). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

2. Related Work

The implementation of Darwin is based on several research results and theoretical achievements, partly developed by our own group.

Schema Extraction. There are several suggestions for schema extraction for NoSQL databases [9, 10, 11]. In [3], we developed an approach to schema extraction which generates a graph structure representing all structural variants of a given dataset. In a next step, this internal graph structure is summarized into a JSON Schema description. Meanwhile, this basic functionality of static schema extraction for NoSQL databases is also available in commercial tools like Hackolade (https://hackolade.com/) and Studio 3T (https://studio3t.com/), as well as in research prototypes like Josch [12].
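To make the idea of static schema extraction concrete, the following Java sketch derives, for each field of a document collection, the set of JSON types observed across all documents; fields that occur only in some documents or with mixed types expose the structural variants. This is a deliberately simplified stand-in for the graph-based extraction of [3]; all class and method names here are our own illustration, not Darwin's API.

```java
import java.util.*;

// Minimal schema extraction sketch: map each field name to the set of
// JSON type names observed for it across a collection of documents.
public class SchemaExtractor {

    public static Map<String, Set<String>> extract(List<Map<String, Object>> documents) {
        Map<String, Set<String>> schema = new TreeMap<>();
        for (Map<String, Object> doc : documents) {
            for (Map.Entry<String, Object> field : doc.entrySet()) {
                schema.computeIfAbsent(field.getKey(), k -> new TreeSet<>())
                      .add(jsonType(field.getValue()));
            }
        }
        return schema;
    }

    private static String jsonType(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return "string";
        if (value instanceof Number) return "number";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Map) return "object";
        if (value instanceof List) return "array";
        return value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("name", "Alice", "age", 31),
                Map.of("name", "Bob", "age", "unknown", "city", "Rostock"));
        // "age" has mixed types and "city" is optional: structural variants.
        System.out.println(extract(docs));
        // prints {age=[number, string], city=[string], name=[string]}
    }
}
```

In a second step, such a field-to-type summary could be serialized into a JSON Schema description; Darwin additionally records the structural variants in an internal graph structure.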
Version History Extraction. A shortcoming of all NoSQL schema extraction approaches (cf. [13] for a survey) is that they do not consider and cannot detect schema changes over time. This observation led us to the development of a schema version history extraction approach. The algorithm can be applied if a partial order of the datasets is available (e.g. a timestamp or a creation date). It in turn extracts an internal graph structure and adds timestamp information. Each structural change that the algorithm detects triggers the generation of a new schema version. In addition, the change operations are extracted from the differences between two consecutive structural versions and are represented as evolution operations [4]. Thus, this algorithm is able to uncover the complete evolution history and the genesis of existing databases. We have presented a demonstration of this function, which is essential for schema evolution management, in [7].

Query Rewriting. To read NoSQL datasets in different versions, query rewriting is necessary. Query rewriting is a core database technology which was introduced to optimize query execution by using materialized views [14]. Query rewriting can also be used to handle irregular structures. A query rewriting approach which considers data heterogeneity and generates different subqueries for the different varieties is developed in [15]. In [16], we suggest how query rewriting can use schema evolution operations to unify different consecutive structural versions in queries.

Data Migration. While there are some approaches to managing schema evolution and data migration for relational systems [17, 18], there is very little work in the field of NoSQL databases. We proposed the concepts of eager and lazy migration in NoSQL databases in [19]. KVolve, an extension for the Redis NoSQL database that supports lazy migration, was introduced in [20]. The IDE-integrated tool ControVol supports eager and lazy migration based on static type checks of object mapper class declarations as recorded in the code repository [21]. We presented initial ideas on hybrid data migration strategies (incremental and predictive migration) in [5] and described them in detail in [6]. We intensively studied the impact of different data migration strategies on migration cost and latency and presented and discussed the results in [22].

3. System Architecture

In this section, we introduce the system architecture of Darwin and the interaction of its individual modules. The entire Darwin system is implemented in Java. Figure 1 shows the system architecture of Darwin. In the agile application development use case, Darwin is a middleware between a Java application and a database storing variational data:

Figure 1: Darwin System Architecture

• At the top of the application stack is the Java application. It stores its data in a NoSQL database, interacting with the system-independent Darwin Persistence API (DPA).
• Via the Darwin WebApp or the Darwin CLI, application developers may trigger schema evolution management and data migration tasks directly. We explain these tasks in detail in Section 4.
• All user interfaces use the Darwin Core REST API. This architecture allows the flexible use of Darwin, as we will see when we present the extensions in Section 5.
• The Darwin Core REST API interfaces with the core modules necessary for the schema evolution management and data migration lifecycle, which we present in detail in Section 4. These modules are implemented independently of a concrete database system.
• A Data Access Manager and a Schema and Command Storage Manager were implemented as a uniform interface for the interaction of the core modules with the respective database systems. The Schema and Command Storage Manager stores the schema versions and the schema evolution operations. This information can be stored either in the same database as the data or in a separate database.
• The Drivers are responsible for the connection to the specific database system. Since the languages of all NoSQL DBMS differ, the mapping to the respective system is done in these modules.

Currently, Darwin supports the most popular document stores MongoDB and Couchbase, the wide column store Cassandra, and the multi-model database system ArangoDB. The architecture is designed for easy extensibility: adding a new DBMS requires only the implementation of the appropriate driver.

4. Main Functionalities

Darwin supports the whole schema evolution management and data migration lifecycle. In the following, we explain this lifecycle, the corresponding functionalities of Darwin, and their interaction.

4.1. Schema and Version History Extraction

Schema extraction and version history extraction belong to the data preprocessing steps implemented in the Darwin tool. Both steps are necessary for the analysis of existing NoSQL datasets (which have been created outside of Darwin) and for understanding the implicit structures and their changes over time. The algorithms detect the variabilities in the NoSQL data, structural outliers, and the different versions over time.

Figure 2 shows an example of a version history extraction performed in Darwin. The screenshot shows two versions side by side in JSON Schema notation. The schema evolution operations are stated above; changes w.r.t. the previous schema version are highlighted [7].

Figure 2: Example of a Schema Version History Extraction

The schema and version history extraction algorithm reads all datasets of the NoSQL database. The implemented algorithm can run incrementally, which means that when new datasets arrive, only the new data are analyzed and the results merged. To the best of our knowledge, Darwin is the only schema management tool supporting the extraction of the schema version history.

Notes on Performance and Scalability. Extracting and analyzing the entire data instance is a one-time effort. After the initial schema and version history extraction, newly added entities can be analyzed incrementally and on the fly. Since Darwin does not load the entire data instance into main memory, but only incremental batches [4], Darwin can safely handle large volumes of data.

4.2. Schema Evolution Management

Schema evolution is an ongoing process during the development of an application. Schema evolution operations (SMOs) can be
• manually entered in Darwin using the Darwin CLI or Darwin WebApp, or
• automatically observed by incremental version history extraction.
Which datasets are migrated after a schema evolution operation depends on the chosen data migration strategy. Darwin implements eager, lazy, and various hybrid data migration strategies. We present these strategies in Section 4.4.

4.3. Query Rewriting

In NoSQL databases, different schema versions can be stored in the same database. Schema evolution in combination with lazy or hybrid data migration has to face the situation that datasets stored in the same NoSQL database have different structural versions. In this scenario, the evolution operations that transform datasets from structural version n to the successor version n+1 are stored in the Schema and Command Storage Manager (see Figure 1).

If such versioned datasets are to be accessed and queried by an application, query rewriting is necessary to distribute the query to the different structural versions. Forward query rewriting is applied if a query which assumes structural version n is translated into structural version n+i. In this case, the list of evolution operations (for the translation from version n to n+1, n+2, ..., n+i) is applied to the query. Backward query rewriting is used to access preceding structural versions. To achieve this, reverse evolution operations are used for the translation of the query. In both cases, query rewriting generates different subqueries (one for each schema version), executes the subqueries, and unions the results.
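The subquery-generation step described above can be sketched as follows: a query field written against one schema version is translated along the recorded evolution operations, yielding one subquery per stored version, whose results are then unioned. The representation of queries and evolution operations here is deliberately minimal and is an illustrative assumption, not Darwin's actual interface.

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Sketch of forward query rewriting over versioned data: a "query" is just
// the field name it filters on, and an evolution operation is modeled as a
// function rewriting that field name (e.g. a rename SMO maps old to new).
public class QueryRewriter {

    // Returns one query field per schema version: the original for version n,
    // plus the translated field for each successor version n+1, n+2, ...
    public static List<String> rewrite(String field,
                                       List<UnaryOperator<String>> evolutionOps) {
        List<String> subqueryFields = new ArrayList<>();
        subqueryFields.add(field);              // subquery for version n
        String current = field;
        for (UnaryOperator<String> op : evolutionOps) {
            current = op.apply(current);        // translate to the next version
            subqueryFields.add(current);
        }
        return subqueryFields;                  // one subquery per version
    }

    public static void main(String[] args) {
        // Assumed version history: "zip" was renamed to "zipcode", then "postcode".
        List<UnaryOperator<String>> renames = List.of(
                f -> f.equals("zip") ? "zipcode" : f,
                f -> f.equals("zipcode") ? "postcode" : f);
        // A query on version 1's "zip" becomes three subqueries, one per version;
        // their results would then be unioned.
        System.out.println(rewrite("zip", renames));
        // prints [zip, zipcode, postcode]
    }
}
```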
Query rewriting enables transparent access to datasets in different schema versions and is the prerequisite for lazy and hybrid data migration.

4.4. Data Migration

Migration Strategies. In Darwin, structural changes can be defined by schema evolution operations (SMOs). The system supports both single-type operations (add, rename, delete) and multi-type operations (move and copy) [19]. The evolution operations define the changes to the schema. For each evolution operation, a corresponding data migration operation is generated that executes the same structural changes on the datasets. Darwin implements several different data migration strategies:

Eager Migration. An eager data migration migrates all datasets immediately after the introduction of a new schema version. Eager data migration has the advantage that all datasets always reflect the latest structural version. This reduces the latency when datasets are accessed. A disadvantage is that migration costs are high, since all datasets are updated, even those not in use. Migration costs are especially concerning when the database is hosted in the cloud.

Lazy Migration. In order to avoid unnecessary migration processes and thus reduce migration costs, Darwin provides another migration strategy, the so-called lazy migration. The basic idea is that after the introduction of a new schema version, no dataset is updated. The new schema version and the corresponding schema evolution operation are stored in the Schema and Command Storage Manager (see Figure 1). All datasets are kept in their original version. As a result, lazy migration can lead to NoSQL databases containing datasets in different structural versions.

If datasets are accessed by a query, the query is rewritten onto the different versions (see Section 4.3). The resulting datasets are migrated at runtime and stored in the database in the new version. In the case of a single-type operation, runtime migration is relatively simple and efficient. In the case of a multi-type operation, the migration is much more complex: it is possible that during a copy or move operation the corresponding objects are not yet in the latest version and must be migrated as well. This in turn can require that further objects be migrated (cascading migration). Currently, we limit the depth of cascading migrations to two levels in Darwin. An analysis of the best-fitting strategies for different cases could be the subject of further research.

A lazy approach has the advantage that only those datasets currently in use are migrated. Cold data are not accessed and consequently not migrated. This strategy automatically minimizes data migration costs. The disadvantage of the lazy approach, though, is that data migration takes place at runtime and can negatively impact data access latency.

Hybrid Data Migration Strategies. Besides the two basic data migration strategies, which either migrate all datasets immediately after the introduction of a new schema version (eager) or no dataset at all (lazy), Darwin also offers several hybrid migration strategies that provide intelligent control of the data migration. The hybrid migration strategies optimize two different targets: low total migration costs and low latency at runtime when a dataset is accessed.

Incremental Migration. A simple hybrid strategy is incremental migration. The data is migrated completely only at certain points in time (for example, after a certain number of schema evolution operations have been executed). Between two incremental migrations, the data is migrated lazily. This approach has lower migration costs than eager migration; however, all datasets are updated, even those not in use.

Predictive Migration. A more sophisticated approach is predictive migration. The so-called hot data, i.e. the data that is frequently accessed, should be kept up-to-date. The prediction of the hot data is implemented in Darwin by keeping track of past data accesses and ordering the accessed entities accordingly by means of exponential smoothing. We use a prediction set whose size is configurable. Data in this prediction set is migrated proactively after each schema change. Data not contained in the prediction set is migrated lazily when it is accessed. This reduces both runtime overhead and migration costs. In [6] we presented initial approaches to adjust the size of this prediction set self-adaptively, depending on given bounds on migration cost and latency.

Selection of the Appropriate Data Migration Strategy. We have extensively studied the impact of different data migration strategies on migration cost and latency. A probabilistic Monte Carlo method of repeated sampling was used for the analysis. Figure 3 shows an example: the impact of the different migration strategies on migration costs (assuming a cloud-hosted database) and on data access latency is obvious. We presented and discussed the results in detail in [22]. Nevertheless, selecting the appropriate data migration strategy is a significant challenge. We have developed the data migration advisor MigCast for this purpose, which we present in Section 5.1.
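The exponential-smoothing bookkeeping behind the predictive strategy can be sketched like this: every access decays all scores and boosts the accessed entity, and the k highest-scoring entities form the configurable prediction set. The smoothing factor and all class and method names are illustrative assumptions, not Darwin's implementation.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hot-data prediction sketch: per-entity exponentially smoothed access
// scores; the top-k entities form the prediction set that would be
// migrated proactively after a schema change.
public class HotDataTracker {

    private final double alpha;                 // smoothing factor in (0, 1]
    private final Map<String, Double> scores = new HashMap<>();

    public HotDataTracker(double alpha) { this.alpha = alpha; }

    // Called on every read/write: decay all scores, boost the accessed entity.
    public void recordAccess(String entityId) {
        scores.replaceAll((id, s) -> (1 - alpha) * s);
        scores.merge(entityId, alpha, Double::sum);
    }

    // The prediction set: the k entities with the highest smoothed scores.
    public List<String> predictionSet(int k) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        HotDataTracker tracker = new HotDataTracker(0.3);
        for (String id : List.of("a", "b", "a", "c", "a")) tracker.recordAccess(id);
        System.out.println(tracker.predictionSet(2));
        // prints [a, c]: "a" is hot, and "c" was accessed more recently than "b"
    }
}
```

Exponential smoothing balances frequency and recency, so an entity that was hot long ago gradually drops out of the prediction set.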
Figure 3: Impact of different Migration Strategies (cf. [22])

4.5. Migration Optimization

There are different opportunities to optimize the execution of the migration operations. We briefly outline three aspects:

Composition of Schema Evolution Operations. When a legacy dataset needs to be migrated from several versions back, the pending schema changes may either be applied stepwise or by composite migration. As a simple example, an add and a subsequent rename operation can be combined into a single add operation on the same data object. In [5] we introduced the composition rules for schema evolution operations (both single-type and multi-type). Measurement results and aspects of the implementation were presented in [23].

Caching. An obvious optimization is the extensive use of caching. Darwin contains a Schema Cache and a Command Cache to avoid repeated reading of this information during data migration. Furthermore, a Composer Cache was introduced for the described composition of migration operations; its effects were discussed in detail in [23].

Location. Simple single-type migration operations like add and delete can be executed natively, directly in the NoSQL DBMS. For more complex multi-type operations like copy and move, this is not always possible; it depends on the functionality offered by the NoSQL DBMS. For example, many NoSQL DBMS do not support joins. In this case, copy and move operations have to be orchestrated within Darwin instead of using database-provided joins within the Drivers (cf. Figure 1). At the beginning of the implementation of Darwin in 2014, MongoDB was one of those NoSQL database systems which did not support joins; in MongoDB, this functionality has been available since version 3.2.

Performing migration operations within Darwin offers the opportunity to support NoSQL DBMS that do not natively support all migration operations. To evaluate this aspect, we have used the EvoBench benchmark which we have developed. Figure 5 shows the results of this evaluation, which will be explained in Section 5.2.

5. Darwin Ecosystem

In addition to the core functionality of Darwin presented in Section 4, two other important aspects were investigated and corresponding tools were developed. In Section 5.1 we introduce the data migration advisor MigCast. Then, in Section 5.2, the schema evolution benchmark EvoBench is presented.

5.1. MigCast

As explained in Section 4.4, selecting the appropriate migration strategy is a huge challenge. We have developed the data migration advisor MigCast for this purpose. MigCast is implemented on top of Darwin. As input parameters, MigCast takes into account the characteristics of the data instance and the data access pattern (e.g., a Pareto distribution of future reads and writes), the data model changes (schema evolution), and particulars of the cloud pricing model. With these inputs, MigCast predicts the migration costs and the data access latency. This estimation is based on three core modules: a Workload Simulator, a Cost Calculator, and a Latency Profiler (see Figure 4).

Figure 4: MigCast Architecture
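The kind of estimate such an advisor produces can be illustrated by a toy comparison of eager and lazy migration costs under a simple cloud pricing model: eager migration pays for writing every entity, while lazy migration pays only for the fraction of the data that is actually accessed. All constants and formulas below are illustrative assumptions; MigCast's Workload Simulator and Cost Calculator are considerably more detailed.

```java
// Toy cost comparison of eager vs. lazy migration for one schema change,
// under an assumed per-write cloud price and a skewed access pattern.
public class MigrationCostSketch {

    // Assumed cloud pricing: cost per migrated (written) entity.
    static final double PRICE_PER_WRITE = 0.00001;

    // Eager: every entity is migrated immediately after the schema change.
    public static double eagerCost(long entities) {
        return entities * PRICE_PER_WRITE;
    }

    // Lazy: only entities actually accessed before the next schema change
    // are migrated. With an 80/20-style skew, accessedFraction of the data
    // absorbs most accesses and is migrated on demand.
    public static double lazyCost(long entities, double accessedFraction) {
        return entities * accessedFraction * PRICE_PER_WRITE;
    }

    public static void main(String[] args) {
        long entities = 1_000_000;
        System.out.println("eager cost: " + eagerCost(entities));
        System.out.println("lazy cost (20% of data accessed): " + lazyCost(entities, 0.2));
    }
}
```

The trade-off the paper describes is visible even in this toy model: lazy migration is cheaper whenever only part of the instance is accessed, but it shifts the migration work (and hence latency) to access time.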
In Figure 3 in Section 4.4 we have shown an example of such an estimation performed by MigCast. To support the reproducibility of the experiments, the configuration of the performed measurements and the results are stored in a separate database (MigCastDB in Figure 4). MigCast is publicly available as part of the Darwin distribution (https://github.com/dbishagen/darwin).

5.2. EvoBench

Darwin belongs to the first systems tailored to an ongoing evolution of database backends. For testing and evaluating Darwin and other approaches to NoSQL schema evolution and data migration, we defined and implemented the benchmark EvoBench. EvoBench is the first available benchmark to validate the abilities of a system to evolve NoSQL databases and to determine and compare the performance of the dedicated evolution operations [24, 25]. EvoBench is based on a Customer-Product-Order-Invoice dataset originally introduced in [26] and defines 20 schema evolutions on this application, ranging from simple extensions of the data model up to more complex refactorings.

The EvoBench tool is implemented in Python. EvoBench treats the respective schema evolution platform as a black box and uses the provided API for the schema evolution operations. In the case of Darwin, we use the Darwin Core REST API (see Figure 1). In addition to the data model and schema evolution operations predefined in the benchmark, the EvoBench tool also supports the use of your own data models and schema evolution operations for experiments.

As an example, we return to the impacts discussed in Section 4.5 when performing migration operations natively in the database or in Darwin. Figure 5 shows the execution time as well as the migration costs (in terms of executed operations) of executing these operations on 123,200 data objects [25] using MongoDB.

Figure 5: Impact of different Locations of Migration Operation Execution (cf. [25])

As explained earlier, EvoBench is designed to be independent of Darwin and can also be used to evaluate other schema evolution management platforms. We have deployed EvoBench and the associated measurements in fully operational Docker containers (https://doi.org/10.5281/zenodo.4993636).

6. Conclusion and Outlook

In this article, we have introduced the main algorithms provided by Darwin, the tool for a continuous evolution of NoSQL backends. Darwin can be applied to databases that are starting from scratch as well as to already existing NoSQL databases, even those containing different versions of legacy data in the same database. In all cases, the schema evolution and data migration component of Darwin keeps dataset structures up-to-date.

The tool can be applied to different NoSQL databases (e.g. MongoDB, Couchbase, Cassandra, and ArangoDB). A side effect is that Darwin can also be used for the migration of data between different database systems, e.g. from MongoDB into ArangoDB, and thus enables interoperability between different NoSQL backends.

In the data migration component of Darwin, different optimization aims (migration costs, latency) can be pursued. In Section 4.4, we have introduced different data migration strategies and have shown their impact on the different cost metrics. One task for future work is to develop a self-adaptive data migration which recommends a data migration strategy and optimizes the parameter settings in the dedicated algorithm [6].

Another direction of future development in Darwin is a polystore data migration method including schema optimization [27].

With Darwin we offer a complete solution that is required for every long-running NoSQL database to keep the structures permanently up-to-date and to ensure that NoSQL data is operational over long periods of time.

Acknowledgments

This work has been funded by Deutsche Forschungsgemeinschaft (German Research Foundation), project 385808805. We would like to thank all project members whose work contributed to the success of the project, especially André Conrad, Andrea Hillenbrand, Mark Lukas Möller and Stefanie Scherzinger. Special thanks go to all students of Darmstadt University of Applied Sciences who have contributed to the implementation of Darwin.

References

[1] M. Stonebraker, My Top Ten Fears about the DBMS Field, in: Proc. ICDE, IEEE, 2018, pp. 24–28. doi:10.1109/ICDE.2018.00012.
[2] S. Scherzinger, S. Sidortschuck, An Empirical Study on the Design and Evolution of NoSQL Database Schemas, in: Proc. ER, Springer, 2020, pp. 441–455. doi:10.1007/978-3-030-62522-1_33.
[3] M. Klettke, U. Störl, S. Scherzinger, Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores, in: Proc. BTW, GI, 2015, pp. 425–444. URL: https://dl.gi.de/20.500.12116/2420.
[4] M. Klettke, H. Awolin, U. Störl, D. Müller, S. Scherzinger, Uncovering the Evolution History of Data Lakes, in: Proc. IEEE Big Data, IEEE, 2017, pp. 2462–2471. doi:10.1109/BigData.2017.8258204.
[5] M. Klettke, U. Störl, M. Shenavai, S. Scherzinger, NoSQL schema evolution and big data migration at scale, in: Proc. IEEE Big Data, IEEE, 2016, pp. 2764–2774. doi:10.1109/BigData.2016.7840924.
[6] A. Hillenbrand, U. Störl, S. Nabiyev, M. Klettke, Self-adapting data migration in the context of schema evolution in NoSQL databases, Distributed and Parallel Databases abs/2104.14828 (2021) 1–21. doi:10.1007/s10619-021-07334-1.
[7] U. Störl, D. Müller, A. Tekleab, S. Tolale, J. Stenzel, M. Klettke, S. Scherzinger, Curating Variational Data in Application Development, in: Proc. ICDE, IEEE, 2018, pp. 1605–1608. doi:10.1109/ICDE.2018.00187.
[8] A. Hillenbrand, M. Levchenko, U. Störl, S. Scherzinger, M. Klettke, MigCast: Putting a Price Tag on Data Model Evolution in NoSQL Data Stores, in: Proc. SIGMOD, ACM, 2019, pp. 1925–1928. doi:10.1145/3299869.3320223.
[9] D. S. Ruiz, S. F. Morales, J. G. Molina, Inferring Versioned Schemas from NoSQL Databases and Its Applications, in: Proc. ER, Springer, 2015, pp. 467–480. doi:10.1007/978-3-319-25264-3_35.
[10] L. Meurice, A. Cleve, Supporting schema evolution in schema-less NoSQL data stores, in: Proc. IEEE SANER, IEEE, 2017, pp. 457–461. doi:10.1109/SANER.2017.7884653.
[11] M. A. Baazizi, D. Colazzo, G. Ghelli, C. Sartiani, Parametric schema inference for massive JSON datasets, VLDB J. 28 (2019) 497–521. doi:10.1007/s00778-018-0532-7.
[12] M. Fruth, K. Dauberschmidt, S. Scherzinger, Josch: Managing Schemas for NoSQL Document Stores, in: Proc. ICDE, IEEE, 2021, pp. 2693–2696. doi:10.1109/ICDE51399.2021.00306.
[13] P. Contos, M. Svoboda, JSON Schema Inference Approaches, in: Proc. ER Workshops, Springer, 2020, pp. 173–183. doi:10.1007/978-3-030-65847-2_16.
[14] A. Y. Levy, A. O. Mendelzon, Y. Sagiv, D. Srivastava, Answering Queries Using Views, in: Proc. PODS, ACM Press, 1995, pp. 95–104. doi:10.1145/212433.220198.
[15] Y. Papakonstantinou, Polystore Query Rewriting: The Challenges of Variety, in: EDBT/ICDT Workshops, CEUR-WS.org, 2016. URL: http://ceur-ws.org/Vol-1558/paper46.pdf.
[16] M. L. Möller, M. Klettke, A. Hillenbrand, U. Störl, Query Rewriting for Continuously Evolving NoSQL Databases, in: Proc. ER, Springer, 2019, pp. 213–221. doi:10.1007/978-3-030-33223-5_18.
[17] C. Curino, H. J. Moon, C. Zaniolo, Graceful database schema evolution: the PRISM workbench, Proc. VLDB Endow. 1 (2008). doi:10.14778/1453856.1453939.
[18] S. Bhattacherjee, G. Liao, M. Hicks, D. J. Abadi, BullFrog: Online Schema Evolution via Lazy Evaluation, in: Proc. SIGMOD, ACM, 2021, pp. 194–206. doi:10.1145/3448016.3452842.
[19] S. Scherzinger, M. Klettke, U. Störl, Managing Schema Evolution in NoSQL Data Stores, in: Proc. DBPL@VLDB, 2013. URL: http://arxiv.org/abs/1308.0514.
[20] K. Saur, T. Dumitras, M. W. Hicks, Evolving NoSQL Databases without Downtime, in: Proc. IEEE ICSME, IEEE, 2016, pp. 166–176. doi:10.1109/ICSME.2016.47.
[21] S. Scherzinger, T. Cerqueus, E. C. de Almeida, ControVol: A framework for controlled schema evolution in NoSQL application development, in: Proc. ICDE, IEEE, 2015, pp. 1464–1467. doi:10.1109/ICDE.2015.7113402.
[22] A. Hillenbrand, S. Scherzinger, U. Störl, Remaining in Control of the Impact of Schema Evolution in NoSQL Databases, in: Proc. ER, Springer, 2021, pp. 149–159. doi:10.1007/978-3-030-89022-3_13.
[23] U. Störl, A. Tekleab, M. Klettke, S. Scherzinger, In for a Surprise When Migrating NoSQL Data, in: Proc. ICDE, IEEE, 2018, p. 1662. doi:10.1109/ICDE.2018.00202.
[24] M. L. Möller, M. Klettke, U. Störl, EvoBench – A Framework for Benchmarking Schema Evolution in NoSQL, in: Proc. IEEE Big Data, IEEE, 2020, pp. 1974–1984. doi:10.1109/BigData50022.2020.9378278.
[25] A. Conrad, M. L. Möller, T. Kreiter, J.-C. Mair, M. Klettke, U. Störl, EvoBench: Benchmarking Schema Evolution in NoSQL, in: Proc. TPCTC@VLDB, Springer, 2021, pp. 33–49. doi:10.1007/978-3-030-94437-7_3.
[26] C. Zhang, J. Lu, P. Xu, Y. Chen, UniBench: A Benchmark for Multi-model Database Management Systems, in: Proc. TPCTC@VLDB, Springer, 2018, pp. 7–23. doi:10.1007/978-3-030-11404-6_2.
[27] A. Conrad, S. Gärtner, U. Störl, Towards Automated Schema Optimization, in: Proc. ER Demos and Posters, CEUR-WS.org, 2021, pp. 37–42. URL: http://ceur-ws.org/Vol-2958/paper7.pdf.