=Paper= {{Paper |id=Vol-1558/paper10 |storemode=property |title=Finding and Fixing Type Mismatches in the Evolution of Object-NoSQL Mappings |pdfUrl=https://ceur-ws.org/Vol-1558/paper10.pdf |volume=Vol-1558 |authors=Stefanie Scherzinger,Eduardo Cunha de Almeida,Thomas Cerqueus,Leandro Batista de Almeida,Pedro Holanda |dblpUrl=https://dblp.org/rec/conf/edbt/ScherzingerACAH16 }} ==Finding and Fixing Type Mismatches in the Evolution of Object-NoSQL Mappings== https://ceur-ws.org/Vol-1558/paper10.pdf
Finding and Fixing Type Mismatches in the Evolution of Object-NoSQL Mappings

Stefanie Scherzinger, OTH Regensburg, Germany, stefanie.scherzinger@oth-regensburg.de
Eduardo Cunha de Almeida, UFPR, Brazil, eduardo@inf.ufpr.br
Thomas Cerqueus, University of Lyon, France, thomas.cerqueus@insa-lyon.fr
Leandro Batista de Almeida, UTFPR, Brazil, leandro@dainf.ct.utfpr.edu.br
Pedro Holanda, UFPR, Brazil, ptholanda@inf.ufpr.br

ABSTRACT
NoSQL data stores are popular backends for managing big data that is evolving over time: Due to their schema-flexibility, a new release of the application does not require a full migration of data already persisted in production. Instead, using object-NoSQL mappers, developers can specify lazy data migrations that are executed on-the-fly, when a legacy entity is loaded into the application. This paper features ControVol, an IDE plugin that tracks evolutionary changes in object-NoSQL mappings, such as adding, renaming, or removing an attribute, which may conflict with entities already persisted in production. If not resolved prior to launch of the new application, harmful evolutionary changes can cause runtime exceptions or data loss. In this demo, we focus on a novel feature of ControVol, detecting changes to attribute types that are not backwards-compatible with legacy entities. When such changes occur, ControVol issues warnings, and upon the request of developers, assists in safely carrying out type changes.

© 2016, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0

1.    INTRODUCTION
  Over the past years, application development for big data management with NoSQL data stores has matured: Developers no longer need to code against proprietary APIs. Instead, object-NoSQL mappers introduce a desirable level of abstraction [15]. Like their counterparts for relational databases, object-NoSQL mappers such as Objectify [9], Morphia [8], or Hibernate OGM [5] marshal and unmarshal between stored entities and objects in the application space. Annotated Java classes, the object-NoSQL mappings, declare how objects are to be persisted as entities.
  Going beyond this core business, Objectify and Morphia enhance a key feature of NoSQL data stores in agile software development: the schema-flexibility of NoSQL backends [7].
  Let us consider a standard development environment with an editor (e.g., an IDE like Eclipse) and a code repository. The production environment contains a schema-flexible NoSQL data store, possibly offered as database-as-a-service (DaaS). A platform-as-a-service infrastructure (PaaS) takes care of the load balancing at runtime. As the data store does not enforce a schema, the entities stored by different releases of the application may differ in their structure. Nevertheless, the NoSQL data store is capable of evaluating queries over the structurally heterogeneous entities.
  We now switch to a real-life example. Figure 1 visualizes schema evolution in the object-NoSQL mappings of the open source project ExtraLeague [3]. ExtraLeague implements a small website for managing company-internal soccer championships. This project is written in Java, uses Google App Engine (as PaaS) and Google Cloud Datastore (as DaaS) [10], as well as the object-NoSQL mapper Objectify. At the point of writing this paper, 9 contributors have collectively made about 700 commits to this project hosted on GitHub. The chart reads as follows. The x-axis shows the number of commits to the project, which may be interpreted as the progress in time. The y-axis shows the number of object-NoSQL mappings (i.e., the Java classes that declare the current data model). Over time, the project grows to 13 mappings. Some object-NoSQL mappings are removed from the project, resulting in interim dips in the chart. At any point in time, the chart states how many classes remain unchanged (“Attrs not modified”). If a mapping contains at least one retyped attribute, it is flagged as such (“Attrs retyped”). Otherwise, if it contains at least one removed attribute, it is also flagged accordingly (“Attrs removed”). Finally, when attributes have been added, but none of the other cases apply, the mapping is flagged as such (“Attrs added”). Thus, we classify the object-NoSQL mappings by their most disruptive schema change.

Figure 1: Schema evolution in ExtraLeague [3].

  In total, the object-NoSQL mappings have changed in 47 commits. When the object-NoSQL mappings change with a new release, this effectively amounts to a schema change. Yet rather than migrating all legacy entities prior to a release, legacy entities may be migrated lazily, when they are loaded into the application, one at a time. This is supported by several object-NoSQL mappers, such as Objectify.
  Lazily evolving an entity by adding an attribute is generally a safe operation: When a legacy entity is loaded into the application, the attribute is added as the entity is mapped to a Java object. When the object is saved again, the entity is persisted with the new attribute.
  However, when attributes are removed, renamed, or retyped, the object-NoSQL mappings may no longer be backwards-compatible with legacy entities. Accidental removal or faulty renamings of attributes lead to runtime problems, a topic we have addressed in an earlier paper on ControVol [2].
  In this paper, we focus on attribute retypings, which also occur in Figure 1. Retypings can lead to
    1. data loss (due to implicit type conversions),
    2. runtime exceptions (due to type incompatibilities), and
    3. confusing query results (due to the lack of standardization of NoSQL query languages).
  These runtime issues may only be sporadic: The production data store may contain only a small number of structural outliers that nobody in the development team is aware of. Yet this makes troubleshooting even more difficult. Therefore, systematic tool support is of the essence.
  Contributions: In this demonstration, we present new ControVol features for (1) finding type mismatches in the evolution of object-NoSQL mappings, (2) addressing the subtleties that retyping can have on query evaluation in some NoSQL data stores, and (3) suggesting quick fixes so that developers may pro-actively address these problems.
  Our earlier demos of ControVol [1,11] showed how ControVol finds and fixes issues caused by the removal and renaming of attributes in object-NoSQL mappings. This earlier version would also warn when attributes were retyped, but then forced the developer to restore the original type. This effectively made it impossible to change attribute types. ControVol, as presented in this paper, actually enables developers to change types in a controlled manner.
  Videos and further information on ControVol are available at https://sites.google.com/site/controvolplugin/.

2.     TYPE MISMATCHES BY EXAMPLE
  In discussing mismatched types, we focus on Java primitive types (see footnote 1): Boolean, Byte, Short, Integer, Long, Float, Double, and String. ControVol may be extended to handle structured types as well.

Footnote 1: We use the term primitive type casually, to refer to classes of the java.lang package that wrap Java primitive types (int, long, float, etc.). Void, Character, and Object are currently not considered by ControVol, as Objectify does not support storing values of these types.

  We next consider the scenarios featured in this demo. Figure 2 shows player entities stored by different releases of an online role playing game. It also shows the object-NoSQL mapping of the current release. The object mapper annotation @Entity declares that player objects may be stored. Annotation @Id marks the unique ID among all players.

  (a) Legacy entity with String-typed health
  {
      "kind": "Player",
      "id":   1,
      "name": "Gollum",
      "health": "poor"
  }

  (b) Legacy entity with Integer-typed health
  {
      "kind": "Player",
      "id":   2,
      "name": "Bilbo",
      "health": 5
  }

  (c) Up-to-date entity with Double-typed health
  {
      "kind": "Player",
      "id":   3,
      "name": "Frodo",
      "health": 1.2
  }

  (d) Class Player declaring the health type as Double

Figure 2: (a) – (c) Legacy entities (in JSON format) and (d) the current object-NoSQL mapping.

2.1      Runtime Errors due to Type Changes
  Our upcoming example illustrates how mismatched types can cause runtime errors. As runtime errors are particularly undesirable in web applications, the underlying problem should be addressed prior to launch. We then describe how ControVol finds and fixes these issues.

  Example 1. The legacy entity for player Gollum in Figure 2(a) records Gollum’s health as a String. Let us assume Gollum’s player has been stored several releases back, when health was classified as “poor”, “fair”, or “excellent”. Much has changed since then: The object-NoSQL mapping in Figure 2(d) expects to load a Double value. Thus, loading Gollum’s entity will cause a runtime exception due to an unsuccessful type cast. □

  ControVol monitors code changes from within Eclipse to detect this issue at development time. To do so, ControVol accesses the code repository and compares different versions of object-NoSQL mappings. For instance, if ControVol detects the earlier declaration shown in Figure 3, then it warns about the type mismatch with the declaration in Figure 2(d). Note the warning symbol in line 10 of the screenshot, injected by ControVol.
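The version comparison described in Section 2.1 can be illustrated with a minimal sketch. It is not ControVol's implementation: the class `TypeMismatchSketch`, its methods, and the representation of a mapping as a map from attribute name to declared type are hypothetical, the `Long` type of `id` is assumed, and the compatibility rule simply follows Java's widening order for numeric types rather than the actual conversion rules of Objectify.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TypeMismatchSketch {

    /** Outcome of loading a legacy value into an attribute with a new declared type. */
    enum Compatibility { IMPLICIT_CONVERSION, MISMATCH }

    // Assumption for illustration: numeric wrapper types ordered as in Java's widening conversions.
    private static final List<Class<?>> WIDENING_ORDER = List.of(
            Byte.class, Short.class, Integer.class, Long.class, Float.class, Double.class);

    /** Classifies a retyping; only numeric widening counts as implicitly convertible here. */
    static Compatibility classify(Class<?> legacyType, Class<?> currentType) {
        int from = WIDENING_ORDER.indexOf(legacyType);
        int to = WIDENING_ORDER.indexOf(currentType);
        return (from >= 0 && to > from) ? Compatibility.IMPLICIT_CONVERSION : Compatibility.MISMATCH;
    }

    /** Compares two versions of a mapping (attribute name -> declared type) and reports retyped attributes. */
    static Map<String, Compatibility> findRetypedAttributes(
            Map<String, Class<?>> legacyMapping, Map<String, Class<?>> currentMapping) {
        Map<String, Compatibility> findings = new LinkedHashMap<>();
        for (Map.Entry<String, Class<?>> attribute : currentMapping.entrySet()) {
            Class<?> legacyType = legacyMapping.get(attribute.getKey());
            if (legacyType != null && !legacyType.equals(attribute.getValue())) {
                findings.put(attribute.getKey(), classify(legacyType, attribute.getValue()));
            }
        }
        return findings;
    }

    /** Hypothetical stand-in for one version of the Player mapping. */
    static Map<String, Class<?>> playerMapping(Class<?> healthType) {
        Map<String, Class<?>> mapping = new LinkedHashMap<>();
        mapping.put("id", Long.class);       // assumed ID type
        mapping.put("name", String.class);
        mapping.put("health", healthType);
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> current = playerMapping(Double.class);
        // String -> Double, as for Gollum in Example 1
        System.out.println(findRetypedAttributes(playerMapping(String.class), current));
        // Integer -> Double, as for Bilbo in Figure 2(b)
        System.out.println(findRetypedAttributes(playerMapping(Integer.class), current));
    }
}
```

Under these assumptions, the String-to-Double change is reported as a mismatch that needs a migration step, while the Integer-to-Double change is reported as implicitly convertible. A tool would still want to flag the latter, since mixed value types remain in the data store until the legacy entities are migrated.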
Figure 3: Legacy object-NoSQL mapping by which Gollum’s entity from Figure 2(a) was persisted.

  ControVol also proposes quick fixes to address this issue, as shown in Figure 4. Developers can choose to (1) suppress this warning, (2) generate a code stub for translating the String to a Double, or (3) change the type back to String.

Figure 4: Quick fixes proposed by ControVol.

  We discuss the second quick fix in greater detail. If selected, ControVol rewrites the class as shown in Figure 5 (see footnote 2):
    • The original health attribute of type String has been restored. Annotation @IgnoreSave ensures that the value is loaded from legacy entities, but not saved anymore.
    • The new health attribute of type Double has been renamed, so as not to conflict with the legacy attribute.
    • A method stub for migrateHealth has been generated. Due to the Objectify annotation @OnLoad, this method is invoked whenever a player entity is loaded.
  The developers may now translate the value of legacy attribute health to a Double value within migrateHealth (see footnote 3).

Figure 5: ControVol-generated code stub.

Footnote 2: In Figure 5, we use the Objectify syntax @OnLoad. As the annotations are not standardized for object-NoSQL mappers, we require different annotations when using a different object-NoSQL mapper. ControVol is currently being extended to support Morphia as well.

Footnote 3: Lazy migration using object-NoSQL mappers can be very convenient for small, incremental schema changes. Yet when object-NoSQL mappings need to be compatible with entities from several releases back, their declarations become cluttered with migration code. We refer to [13] for a proposal on how multi-step lazy migration may be realized outside of the application layer.

2.2   Mixed Value Types in Query Evaluation
  We next give an example of the effect that attributes with mixed value types can have on query evaluation.

  Example 2. The entity for legacy player Bilbo in Figure 2(b) can be loaded by the mapping in Figure 2(d) without a runtime exception, since the Integer value is implicitly cast to Double (yet loss of precision is possible). However, as long as Bilbo’s entity is not migrated, this type mix in the data store can produce seemingly confusing query results. For instance, evaluating the query

    SELECT id FROM Player ORDER BY health ASC LIMIT 10

on our entities from Figure 2 in Google Cloud Datastore returns Bilbo’s ID before Frodo’s. Yet Bilbo’s health is 5, while Frodo’s health is 1.2. This seems counter-intuitive given ascending sort. Worse, MongoDB returns Frodo’s ID before Bilbo’s for the same query. □

  This puzzling behavior is due to the lack of standardization in NoSQL query languages. In Google Cloud Datastore, all queries are evaluated over indexes. The indexes contain hierarchically sorted entries, with primary order on the value type and secondary on a type-specific ordering [4]. Let us consider the index capturing the entities from Figure 2, as well as a fourth player named Peregrin with id 4 and a Double health value of 9.9.
  In displaying the index as shown in Table 1, we employ the visual notation from [10]: Google Cloud Datastore evaluates the query from Example 2 using an index containing the keys of player entities (consisting of the kind “Player” and the player id), and the values of their health properties. These are sorted in ascending order. The index entries are sorted, with Integer values before Strings, and further before floating point values.

  Table 1: Datastore.
    Key         health ↑
    Player/2    5
    Player/1    "poor"
    Player/3    1.2
    Player/4    9.9

  Table 2: MongoDB.
    Key         health ↑
    Player/3    1.2
    Player/2    5
    Player/4    9.9
    Player/1    "poor"

  Now, retrieving all player IDs in ascending order of their health is conducted by a single scan over this index. Scanning the index from top to bottom retrieves Bilbo’s ID, then Gollum’s, Frodo’s, and finally Peregrin’s. To NoSQL novices, this may seem surprising, and understandably so: When the entities are loaded as Java objects into the application, Bilbo’s health value has type Double (due to the implicit type conversion on loading). This makes it particularly confusing why Bilbo with a health value of 5.0 should
be sorted before Frodo with a health of 1.2. From the viewpoint of the developer, this is a puzzle that can only be solved by inspecting the raw contents of the data store.
  Worse yet, this effect is product-specific: Table 2 captures the order in which MongoDB returns the query results. Here, the sort operator returns all numeric values before strings, and hence, a different result.
  Thus, even when developers aim at platform independence by using object-NoSQL mappers, the implementation details of NoSQL data stores shine through. Since such effects can be easy to miss, we have set up ControVol to warn if a change to an object-NoSQL mapping might introduce mixed value types. This gives developers a chance to react, e.g., to identify these legacy entities and to eagerly migrate them to Double values. This can be done by writing custom code or using declarative, special-purpose schema evolution languages, cf. [12, 14]. Having made sure that incompatible legacy entities no longer exist, and having allowed the data store indexes sufficient time to be rebuilt (which happens asynchronously in Google Cloud Datastore), the warning issued by ControVol may be suppressed.

2.3   Demo Outline
The general outline for our interactive demo is this:
  1. We introduce the typical setup for NoSQL web development: Developers write code in the Eclipse IDE, manage its versions in Git, and regularly deploy the application to a PaaS framework (Google App Engine). The application is backed by a NoSQL data store (Google Cloud Datastore).
  2. We show common pitfalls in evolving the application code: Foremost, we focus on issues introduced by type changes in object-NoSQL mappings. We also demonstrate the earlier features of ControVol, such as finding problems caused by renaming or removing attributes.
  3. We provoke various consequences of mismatched types: data loss, data corruption, runtime errors, and counter-intuitive query results, as discussed in Section 2. ControVol then finds these issues and proposes quick fixes, which it carries out semi-automatically.
  4. We further run ControVol on open source projects publicly hosted on GitHub, and thus, on real-life code.
  We discuss the impact of conflicting type declarations with our audience, as well as the tradeoffs between quick fixes.

3.    SUMMARY
  When it comes to assessing the potential impact of ControVol, we point out two current trends: First, schema-flexible NoSQL data stores are gaining popularity [7], especially in agile web development: When projects undergo frequent releases and cannot afford downtime due to a release, NoSQL data stores can be suitable backends. Second, object-NoSQL mappers with support for lazy migration are gaining popularity. Among the currently popular libraries, there are Objectify for Google Cloud Datastore, and Morphia for MongoDB. The future roadmap for Hibernate OGM explicitly mentions similar plans for lazy data migration [6]. Thus, there seems to be a trend for vendors of object-NoSQL mappers to provide annotation-based support for data migration. Looking at both trends, we see a growing market for tools like ControVol, especially considering the gaping void when it comes to a tooling eco-system for NoSQL-backed application development.

Acknowledgments
ControVol was partly funded by CNPq grant 441944/2014-0. We thank Tegawendé F. Bissyandé from University of Luxembourg for suggesting to us to crawl GitHub projects. We further thank Uta Störl from Darmstadt University of Applied Sciences for sharing her insights on queries over mixed value types in MongoDB. Last but not least, we thank the anonymous reviewers for the careful reading of our paper and their helpful suggestions.

4.   REFERENCES
 [1] T. Cerqueus, E. Cunha de Almeida, and S. Scherzinger. ControVol: Let yesterday’s data catch up with today’s application code. In Proc. WWW’15, poster, 2015.
 [2] T. Cerqueus, E. Cunha de Almeida, and S. Scherzinger. Safely Managing Data Variety in Big Data Software Development. In Proc. BIGDSE’15, 2015.
 [3] ExtraLeague, Sept. 2015. Source code available at https://github.com/squix78/extraleague, latest release at http://ncaleague-test.appspot.com.
 [4] Google Cloud Platform. Datastore Indexes, Jan. 2016. https://cloud.google.com/appengine/docs/java/datastore/indexes/#Java_Properties_with_mixed_value_types.
 [5] Hibernate OGM, Jan. 2016. http://hibernate.org/ogm/.
 [6] Hibernate OGM Roadmap for Hibernate OGM 5.0, Jan. 2016. http://hibernate.org/ogm/roadmap/.
 [7] Z. H. Liu and D. Gawlick. Management of Flexible Schema Data in RDBMSs - Opportunities and Limitations for NoSQL. In CIDR’15, 2015.
 [8] Morphia, Jan. 2016. https://github.com/mongodb/morphia/.
 [9] Objectify, Jan. 2016. https://github.com/objectify/objectify.
[10] D. Sanderson. Programming Google App Engine with Java. O’Reilly Media, Inc., 2015.
[11] S. Scherzinger, E. Cunha de Almeida, and T. Cerqueus. ControVol: A Framework for Controlled Schema Evolution in NoSQL Application Development. In Proc. ICDE’15, demo paper, 2015.
[12] S. Scherzinger, M. Klettke, and U. Störl. Managing Schema Evolution in NoSQL Data Stores. In Proc. DBPL’13, 2013.
[13] S. Scherzinger, U. Störl, and M. Klettke. A Datalog-based Protocol for Lazy Data Migration in Agile NoSQL Application Development. In Proc. DBPL’15, 2015.
[14] J. Schildgen and S. Deßloch. NotaQL is not a Query Language! It’s for Data Transformation on Wide-Column Stores. In Proc. BICOD’15, 2015.
[15] U. Störl, T. Hauf, M. Klettke, and S. Scherzinger. Schemaless NoSQL Data Stores – Object-NoSQL Mappers to the Rescue? In Proc. BTW’15, 2015.
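As a supplement to Section 2.2, the two result orders shown in Tables 1 and 2 can be reproduced with a small sketch. It models only the three kinds of values that occur in the example (integers, strings, floating point numbers) and only the orderings as described in this paper. The class `MixedTypeSortSketch` and its comparators are hypothetical illustrations and do not call any API of Google Cloud Datastore or MongoDB.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MixedTypeSortSketch {

    // Primary sort key of the Datastore-like order: integers, then strings, then floating point values.
    static int datastoreTypeRank(Object value) {
        if (value instanceof Integer || value instanceof Long) return 0;
        if (value instanceof String) return 1;
        if (value instanceof Float || value instanceof Double) return 2;
        throw new IllegalArgumentException("value kind not covered by this sketch: " + value);
    }

    // Type-specific order: strings lexicographically, numbers by numeric value.
    static int compareSameKind(Object a, Object b) {
        if (a instanceof String) return ((String) a).compareTo((String) b);
        return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
    }

    /** Value type first, then the type-specific order (cf. Table 1). */
    static final Comparator<Object> DATASTORE_LIKE = (a, b) -> {
        int byType = Integer.compare(datastoreTypeRank(a), datastoreTypeRank(b));
        return byType != 0 ? byType : compareSameKind(a, b);
    };

    /** All numeric values, compared by value, before all strings (cf. Table 2). */
    static final Comparator<Object> MONGODB_LIKE = (a, b) -> {
        boolean aIsNumber = a instanceof Number;
        boolean bIsNumber = b instanceof Number;
        if (aIsNumber != bIsNumber) return aIsNumber ? -1 : 1;
        return compareSameKind(a, b);
    };

    /** Returns the entity keys in ascending order of their health values under the given order. */
    static List<String> keysInOrder(Map<String, Object> healthByKey, Comparator<Object> order) {
        List<Map.Entry<String, Object>> entries = new ArrayList<>(healthByKey.entrySet());
        entries.sort(Map.Entry.comparingByValue(order));
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Object> entry : entries) {
            keys.add(entry.getKey());
        }
        return keys;
    }

    /** The entities from Figure 2 plus the fourth player Peregrin. */
    static Map<String, Object> samplePlayers() {
        Map<String, Object> healthByKey = new LinkedHashMap<>();
        healthByKey.put("Player/1", "poor"); // Gollum
        healthByKey.put("Player/2", 5);      // Bilbo
        healthByKey.put("Player/3", 1.2);    // Frodo
        healthByKey.put("Player/4", 9.9);    // Peregrin
        return healthByKey;
    }

    public static void main(String[] args) {
        Map<String, Object> players = samplePlayers();
        System.out.println(keysInOrder(players, DATASTORE_LIKE)); // [Player/2, Player/1, Player/3, Player/4]
        System.out.println(keysInOrder(players, MONGODB_LIKE));   // [Player/3, Player/2, Player/4, Player/1]
    }
}
```

The only difference between the two comparators is whether the value type or the numeric value takes precedence, which is enough to move Bilbo from first to second place in the result.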