=Paper=
{{Paper
|id=Vol-1558/paper10
|storemode=property
|title=Finding and Fixing Type Mismatches in the Evolution of Object-NoSQL Mappings
|pdfUrl=https://ceur-ws.org/Vol-1558/paper10.pdf
|volume=Vol-1558
|authors=Stefanie Scherzinger,Eduardo Cunha de Almeida,Thomas Cerqueus,Leandro Batista de Almeida,Pedro Holanda
|dblpUrl=https://dblp.org/rec/conf/edbt/ScherzingerACAH16
}}
==Finding and Fixing Type Mismatches in the Evolution of Object-NoSQL Mappings==
Stefanie Scherzinger, OTH Regensburg, Germany, stefanie.scherzinger@oth-regensburg.de
Eduardo Cunha de Almeida, UFPR, Brazil, eduardo@inf.ufpr.br
Thomas Cerqueus, University of Lyon, France, thomas.cerqueus@insa-lyon.fr
Leandro Batista de Almeida, UTFPR, Brazil, leandro@dainf.ct.utfpr.edu.br
Pedro Holanda, UFPR, Brazil, ptholanda@inf.ufpr.br
ABSTRACT

NoSQL data stores are popular backends for managing big data that is evolving over time: Due to their schema-flexibility, a new release of the application does not require a full migration of data already persisted in production. Instead, using object-NoSQL mappers, developers can specify lazy data migrations that are executed on-the-fly, when a legacy entity is loaded into the application. This paper features ControVol, an IDE plugin that tracks evolutionary changes in object-NoSQL mappings, such as adding, renaming, or removing an attribute, which may conflict with entities already persisted in production. If not resolved prior to launch of the new application, harmful evolutionary changes can cause runtime exceptions or data loss. In this demo, we focus on a novel feature of ControVol, detecting changes to attribute types that are not backwards-compatible with legacy entities. When such changes occur, ControVol issues warnings, and upon the request of developers, assists in safely carrying out type changes.

© 2016, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0

1. INTRODUCTION

Over the past years, application development for big data management with NoSQL data stores has matured: Developers no longer need to code against proprietary APIs. Instead, object-NoSQL mappers introduce a desirable level of abstraction [15]. Like their counterparts for relational databases, object-NoSQL mappers such as Objectify [9], Morphia [8], or Hibernate OGM [5] marshal and unmarshal between stored entities and objects in the application space. Annotated Java classes, the object-NoSQL mappings, declare how objects are to be persisted as entities.

Going beyond this core business, Objectify and Morphia enhance a key feature of NoSQL data stores in agile software development: the schema-flexibility of NoSQL backends [7].

Let us consider a standard development environment with an editor (e.g., an IDE like Eclipse) and a code repository. The production environment contains a schema-flexible NoSQL data store, possibly offered as database-as-a-service (DaaS). A platform-as-a-service infrastructure (PaaS) takes care of the load balancing at runtime. As the data store does not enforce a schema, the entities stored by different releases of the application may differ in their structure. Nevertheless, the NoSQL data store is capable of evaluating queries over the structurally heterogeneous entities.

We now switch to a real-life example. Figure 1 visualizes schema evolution in the object-NoSQL mappings of the open source project ExtraLeague [3]. ExtraLeague implements a small website for managing company-internal soccer championships. This project is written in Java, uses Google App Engine (as PaaS) and Google Cloud Datastore (as DaaS) [10], as well as the object-NoSQL mapper Objectify. At the point of writing this paper, 9 contributors have collectively made about 700 commits to this project hosted on GitHub. The chart reads as follows. The x-axis shows the number of commits to the project, which may be interpreted as the progress in time. The y-axis shows the number of object-NoSQL mappings (i.e., the Java classes that declare the current data model). Over time, the project grows to 13 mappings. Some object-NoSQL mappings are removed from the project, resulting in interim dips in the chart. At any point in time, the chart states how many classes remain unchanged (“Attrs not modified”). If a mapping contains at least one retyped attribute, it is flagged as such (“Attrs retyped”). Otherwise, if it contains at least one removed attribute, it is also flagged accordingly (“Attrs removed”). Finally, when attributes have been added, but none of the other cases apply, the mapping is flagged as such (“Attrs added”). Thus, we classify the object-NoSQL mappings by their most disruptive schema change.

Figure 1: Schema evolution in ExtraLeague [3].

In total, the object-NoSQL mappings have changed in 47 commits. When the object-NoSQL mappings change with a new release, this effectively amounts to a schema change. Yet rather than migrating all legacy entities prior to a release, legacy entities may be migrated lazily, when they are loaded into the application, one at a time. This is supported by several object-NoSQL mappers, such as Objectify.

Lazily evolving an entity by adding an attribute is generally a safe operation: When a legacy entity is loaded into the application, the attribute is added as the entity is mapped to a Java object. When the object is saved again, the entity is persisted with the new attribute.

However, when attributes are removed, renamed, or retyped, the object-NoSQL mappings may no longer be backwards-compatible with legacy entities. Accidental removal or faulty renamings of attributes lead to runtime problems, a topic we have addressed in an earlier paper on ControVol [2]. In this paper, we focus on attribute retypings, which also occur in Figure 1. Retypings can lead to
1. data loss (due to implicit type conversions),
2. runtime exceptions (due to type incompatibilities), and
3. confusing query results (due to the lack of standardization of NoSQL query languages).

These runtime issues may only be sporadic: The production data store may contain only a small number of structural outliers that nobody in the development team is aware of. Yet this makes trouble shooting even more difficult. Therefore, systematic tool support is of the essence.

Contributions: In this demonstration, we present new ControVol features for (1) finding type mismatches in the evolution of object-NoSQL mappings, (2) addressing the subtleties that retyping can have on query evaluation in some NoSQL data stores, and (3) suggesting quick fixes so that developers may pro-actively address these problems.

Our earlier demos of ControVol [1,11] showed how ControVol finds and fixes issues caused by the removal and renaming of attributes in object-NoSQL mappings. This earlier version would also warn when attributes were retyped, but then forced the developer to restore the original type. This effectively made it impossible to change attribute types. ControVol, as presented in this paper, actually enables developers to change types in a controlled manner.

Videos and further information on ControVol are available at https://sites.google.com/site/controvolplugin/.

2. TYPE MISMATCHES BY EXAMPLE

In discussing mismatched types, we focus on Java primitive types¹: Boolean, Byte, Short, Integer, Long, Float, Double, and String. ControVol may be extended to handle structured types as well.

We next consider the scenarios featured in this demo. Figure 2 shows player entities stored by different releases of an online role playing game. It also shows the object-NoSQL mapping of the current release. The object mapper annotation @Entity declares that player objects may be stored. Annotation @Id marks the unique ID among all players.

(a) Legacy entity with String-typed health:
 {
   "kind": "Player",
   "id": 1,
   "name": "Gollum",
   "health": "poor"
 }

(b) Legacy entity with Integer-typed health:
 {
   "kind": "Player",
   "id": 2,
   "name": "Bilbo",
   "health": 5
 }

(c) Up-to-date entity with Double-typed health:
 {
   "kind": "Player",
   "id": 3,
   "name": "Frodo",
   "health": 1.2
 }

(d) Class Player declaring the health type as Double.

Figure 2: (a) – (c) Legacy entities (in JSON format) and (d) the current object-NoSQL mapping.

2.1 Runtime Errors due to Type Changes

Our upcoming example illustrates how mismatched types can cause runtime errors. As runtime errors are particularly undesirable in web applications, the underlying problem should be addressed prior to launch. We then describe how ControVol finds and fixes these issues.

Example 1. The legacy entity for player Gollum in Figure 2(a) records Gollum’s health as a String. Let us assume Gollum’s player has been stored several releases back, when health was classified as “poor”, “fair”, or “excellent”. Much has changed since then: The object-NoSQL mapping in Figure 2(d) expects to load a Double value. Thus, loading Gollum’s entity will cause a runtime exception due to an unsuccessful type cast. □

ControVol monitors code changes from within Eclipse to detect this issue at development time. To do so, ControVol accesses the code repository and compares different versions of object-NoSQL mappings. For instance, if ControVol detects the earlier declaration shown in Figure 3, then it warns about the type mismatch with the declaration in Figure 2(d). Note the warning symbol in line 10 of the screenshot, injected by ControVol.

¹ We use the term primitive type casually, to refer to classes of the java.lang package that wrap Java primitive types (int, long, float, etc.). Void, Character, and Object are currently not considered by ControVol, as Objectify does not support storing values of these types.
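
The version comparison described for Example 1 can be illustrated with a minimal sketch. It is not ControVol's implementation: the class `RetypeCheck`, its methods, and the representation of a mapping as a map from attribute names to types are hypothetical, and the type of `id` is assumed to be Long. The only implicit conversion modeled is the one the paper names (Integer to Double); a real object-NoSQL mapper may accept others.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: compares the attribute types of two versions of one mapping. */
final class RetypeCheck {

    // Assumption: only the conversion named in the paper (Integer to Double)
    // is treated as loadable without an exception.
    static boolean loadsImplicitly(Class<?> legacy, Class<?> current) {
        return legacy == Integer.class && current == Double.class;
    }

    /** Returns one warning per attribute whose declared type differs between versions. */
    static List<String> findRetypings(Map<String, Class<?>> legacy,
                                      Map<String, Class<?>> current) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, Class<?>> e : current.entrySet()) {
            Class<?> before = legacy.get(e.getKey());
            Class<?> after = e.getValue();
            if (before == null || before == after) {
                continue; // added or unchanged attribute: not a retyping
            }
            String risk = loadsImplicitly(before, after)
                    ? "implicit conversion, mixed value types in queries"
                    : "runtime exception on load";
            warnings.add(e.getKey() + ": " + before.getSimpleName() + " -> "
                    + after.getSimpleName() + " (" + risk + ")");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Legacy mapping in the spirit of Figure 3 (health declared as String).
        Map<String, Class<?>> legacy = new LinkedHashMap<>();
        legacy.put("id", Long.class);      // assumed id type
        legacy.put("name", String.class);
        legacy.put("health", String.class);

        // Current mapping in the spirit of Figure 2(d) (health declared as Double).
        Map<String, Class<?>> current = new LinkedHashMap<>();
        current.put("id", Long.class);
        current.put("name", String.class);
        current.put("health", Double.class);

        // Prints: health: String -> Double (runtime exception on load)
        findRetypings(legacy, current).forEach(System.out::println);
    }
}
```

Comparing declared types rather than stored values mirrors the development-time setting: the check needs only the code repository, not access to the production data store.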
Figure 3: Legacy object-NoSQL mapping by which Gollum’s entity from Figure 2(a) was persisted.

ControVol also proposes quick fixes to address this issue, as shown in Figure 4. Developers can choose to (1) suppress this warning, (2) generate a code stub for translating the String to a Double, or (3) change the type back to String.

Figure 4: Quick fixes proposed by ControVol.

We discuss the second quick fix in greater detail. If selected, ControVol rewrites the class as shown in Figure 5:²
• The original health attribute of type String has been restored. Annotation @IgnoreSave ensures that the value is loaded from legacy entities, but not saved anymore.
• The new health attribute of type Double has been renamed, so as not to conflict with the legacy attribute.
• A method stub for migrateHealth has been generated. Due to the Objectify annotation @OnLoad, this method is invoked whenever a player entity is loaded.
The developers may now translate the value of legacy attribute health to a Double value within migrateHealth.³

Figure 5: ControVol-generated code stub.

2.2 Mixed Value Types in Query Evaluation

We next give an example of the effect that attributes with mixed value types can cause during query evaluation.

Example 2. The entity for legacy player Bilbo in Figure 2(b) can be loaded by the mapping in Figure 2(d) without a runtime exception, since the Integer value is implicitly cast to Double (yet loss of precision is possible). However, as long as Bilbo’s entity is not migrated, this type mix in the data store can produce seemingly confusing query results. For instance, evaluating the query
 SELECT id FROM Player ORDER BY health ASC LIMIT 10
on our entities from Figure 2 in Google Cloud Datastore returns Bilbo’s ID before Frodo’s. Yet Bilbo’s health is 5, while Frodo’s health is 1.2. This seems counter-intuitive given ascending sort. Worse, MongoDB returns Frodo’s ID before Bilbo’s for the same query. □

This puzzling behavior is due to the lack of standardization in NoSQL query languages. In Google Cloud Datastore, all queries are evaluated over indexes. The indexes contain hierarchically sorted entries, with primary order on the value type and secondary on a type-specific ordering [4]. Let us consider the index capturing the entities from Figure 2, as well as a fourth player named Peregrin with id 4 and a Double health value of 9.9.

In displaying the index as shown in Table 1, we employ the visual notation from [10]: Google Cloud Datastore evaluates the query from Example 2 using an index containing the keys of player entities (consisting of the kind “Player” and the player id), and the values of their health properties. These are sorted in ascending order. The index entries are sorted, with Integer values before Strings, and further before floating point values.

Table 1: Datastore.
 Key        health ↑
 Player/2   5
 Player/1   "poor"
 Player/3   1.2
 Player/4   9.9

Table 2: MongoDB.
 Key        health ↑
 Player/3   1.2
 Player/2   5
 Player/4   9.9
 Player/1   "poor"
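
The two orderings in Table 1 and Table 2 can be reproduced with a small simulation. This is a sketch only: it does not call Google Cloud Datastore or MongoDB, the class `MixedTypeSort` and all of its members are hypothetical, and the type ranks encode just the rules stated in the text (Datastore: Integer values, then Strings, then floating point values; MongoDB: all numeric values before strings).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

/** Hypothetical simulation of type-first sorting over mixed value types. */
final class MixedTypeSort {

    /** A player id together with its stored health value (Integer, Double, or String). */
    static final class Entry {
        final long id;
        final Object health;
        Entry(long id, Object health) { this.id = id; this.health = health; }
    }

    // Rank per the text on Datastore indexes: integers, then strings, then floating point.
    static int datastoreRank(Object v) {
        if (v instanceof Integer || v instanceof Long) return 0;
        if (v instanceof String) return 1;
        return 2; // Float or Double
    }

    // Rank per the text on MongoDB: all numeric values before strings.
    static int mongoRank(Object v) {
        return (v instanceof Number) ? 0 : 1;
    }

    /** Orders values of equal rank: numbers numerically, strings lexicographically. */
    static int compareWithinRank(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        return a.toString().compareTo(b.toString());
    }

    /** Sorts by type rank first, then by value, and returns the ids in that order. */
    static List<Long> sortedIds(List<Entry> entries, ToIntFunction<Object> rank) {
        Comparator<Entry> byRank = Comparator.comparingInt(e -> rank.applyAsInt(e.health));
        Comparator<Entry> withinRank = (x, y) -> compareWithinRank(x.health, y.health);
        return entries.stream()
                .sorted(byRank.thenComparing(withinRank))
                .map(e -> e.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> players = Arrays.asList(
                new Entry(1, "poor"),  // Gollum
                new Entry(2, 5),       // Bilbo
                new Entry(3, 1.2),     // Frodo
                new Entry(4, 9.9));    // Peregrin
        System.out.println(sortedIds(players, MixedTypeSort::datastoreRank)); // [2, 1, 3, 4]
        System.out.println(sortedIds(players, MixedTypeSort::mongoRank));     // [3, 2, 4, 1]
    }
}
```

The only difference between the two results is the rank function, which is why the same query over the same entities yields different answers on the two stores.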
² In Figure 5, we use the Objectify syntax @OnLoad. As the annotations are not standardized for object-NoSQL mappers, we require different annotations when using a different object-NoSQL mapper. ControVol is currently being extended to support Morphia as well.

³ Lazy migration using object-NoSQL mappers can be very convenient for small, incremental schema changes. Yet when object-NoSQL mappings need to be compatible with entities from several releases back, their declarations become cluttered with migration code. We refer to [13] for a proposal on how multi-step lazy migration may be realized outside of the application layer.

Now, retrieving all player IDs in ascending order of their health is conducted by a single scan over this index. Scanning the index from top to bottom retrieves Bilbo’s ID, then Gollum’s, Frodo’s, and finally Peregrin’s. To NoSQL novices, this may seem surprising, and understandably so: When the entities are loaded as Java objects into the application, Bilbo’s health value has type Double (due to the implicit type conversion on loading). This makes it particularly confusing why Bilbo with a health value of 5.0 should be sorted before Frodo with a health of 1.2. From the viewpoint of the developer, this is a puzzle that can only be solved by inspecting the raw contents of the data store.

Worse yet, this effect is product-specific: Table 2 captures the order in which MongoDB returns the query results. Here, the sort operator returns all numeric values before strings, and hence, a different result.

Thus, even when developers aim at platform independence by using object-NoSQL mappers, the implementation details of NoSQL data stores shine through. Since such effects can be easy to miss, we have set up ControVol to warn if a change to an object-NoSQL mapping might introduce mixed value types. This gives developers a chance to react, e.g., to identify these legacy entities and to eagerly migrate them to Double values. This can be done by writing custom code or using declarative, special-purpose schema evolution languages, cf. [12, 14]. Having made sure that incompatible legacy entities no longer exist, and having allowed the data store indexes sufficient time to be rebuilt (which happens asynchronously in Google Cloud Datastore), the warning issued by ControVol may be suppressed.

2.3 Demo Outline

The general outline for our interactive demo is this:
1. We introduce the typical setup for NoSQL web development: Developers write code in the Eclipse IDE, manage its versions in Git, and regularly deploy the application to a PaaS framework (Google App Engine). The application is backed by a NoSQL data store (Google Cloud Datastore).
2. We show common pitfalls in evolving the application code: Foremost, we focus on issues introduced by type changes in object-NoSQL mappings. We also demonstrate the earlier features of ControVol, such as finding problems caused by renaming or removing attributes.
3. We provoke various consequences of mismatched types: data loss, data corruption, runtime errors, and counter-intuitive query results, as discussed in Section 2. ControVol then finds these issues and proposes quick fixes, which it carries out semi-automatically.
4. We further run ControVol on open source projects publicly hosted on GitHub, and thus, on real-life code.

We discuss the impact of conflicting type declarations with our audience, as well as the tradeoffs between quick fixes.

3. SUMMARY

When it comes to assessing the potential impact of ControVol, we point out two current trends: First, schema-flexible NoSQL data stores are gaining popularity [7], especially in agile web development: When projects undergo frequent releases and cannot afford downtime due to a release, NoSQL data stores can be suitable backends. Second, object-NoSQL mappers with support for lazy migration are gaining popularity. Among the currently popular libraries, there are Objectify for Google Cloud Datastore, and Morphia for MongoDB. The future roadmap for Hibernate OGM explicitly mentions similar plans for lazy data migration [6]. Thus, there seems to be a trend for vendors of object-NoSQL mappers to provide annotation-based support for data migration. Looking at both trends, we see a growing market for tools like ControVol, especially considering the gaping void when it comes to a tooling eco-system for NoSQL-backed application development.

Acknowledgments

ControVol was partly funded by CNPq grant 441944/2014-0. We thank Tegawendé F. Bissyandé from University of Luxembourg for suggesting to us to crawl GitHub projects. We further thank Uta Störl from Darmstadt University of Applied Sciences for sharing her insights on queries over mixed value types in MongoDB. Last but not least, we thank the anonymous reviewers for the careful reading of our paper and their helpful suggestions.

4. REFERENCES

[1] T. Cerqueus, E. Cunha de Almeida, and S. Scherzinger. ControVol: Let yesterday’s data catch up with today’s application code. In Proc. WWW’15, poster, 2015.
[2] T. Cerqueus, E. Cunha de Almeida, and S. Scherzinger. Safely Managing Data Variety in Big Data Software Development. In Proc. BIGDSE’15, 2015.
[3] ExtraLeague, Sept. 2015. Source code available at https://github.com/squix78/extraleague, latest release at http://ncaleague-test.appspot.com.
[4] Google Cloud Platform. Datastore Indexes, Jan. 2016. https://cloud.google.com/appengine/docs/java/datastore/indexes/#Java_Properties_with_mixed_value_types.
[5] Hibernate OGM, Jan. 2016. http://hibernate.org/ogm/.
[6] Hibernate OGM Roadmap for Hibernate OGM 5.0, Jan. 2016. http://hibernate.org/ogm/roadmap/.
[7] Z. H. Liu and D. Gawlick. Management of Flexible Schema Data in RDBMSs - Opportunities and Limitations for NoSQL. In CIDR’15, 2015.
[8] Morphia, Jan. 2016. https://github.com/mongodb/morphia/.
[9] Objectify, Jan. 2016. https://github.com/objectify/objectify.
[10] D. Sanderson. Programming Google App Engine with Java. O’Reilly Media, Inc., 2015.
[11] S. Scherzinger, E. Cunha de Almeida, and T. Cerqueus. ControVol: A Framework for Controlled Schema Evolution in NoSQL Application Development. In Proc. ICDE’15, demo paper, 2015.
[12] S. Scherzinger, M. Klettke, and U. Störl. Managing Schema Evolution in NoSQL Data Stores. In Proc. DBPL’13, 2013.
[13] S. Scherzinger, U. Störl, and M. Klettke. A Datalog-based Protocol for Lazy Data Migration in Agile NoSQL Application Development. In Proc. DBPL’15, 2015.
[14] J. Schildgen and S. Deßloch. NotaQL is not a Query Language! It’s for Data Transformation on Wide-Column Stores. In Proc. BICOD’15, 2015.
[15] U. Störl, T. Hauf, M. Klettke, and S. Scherzinger. Schemaless NoSQL Data Stores – Object-NoSQL Mappers to the Rescue? In Proc. BTW’15, 2015.