=Paper=
{{Paper
|id=Vol-1775/MODELS2016-SRC_paper_4
|storemode=property
|title=Efficient Persistence and Query Techniques for Very Large Models
|pdfUrl=https://ceur-ws.org/Vol-1775/MODELS2016-SRC_paper_4.pdf
|volume=Vol-1775
|authors=Gwendal Daniel
|dblpUrl=https://dblp.org/rec/conf/models/Daniel16
}}
==Efficient Persistence and Query Techniques for Very Large Models==
<pdf width="1500px">https://ceur-ws.org/Vol-1775/MODELS2016-SRC_paper_4.pdf</pdf>
<pre>
                                   Efficient Persistence and Query
                                  Techniques for Very Large Models
                                                               Gwendal Daniel
                                                    AtlanMod Team / SOM Research Group
                                                       Inria, Mines Nantes, Lina & UOC
                                                             gwendal.daniel@inria.fr


Abstract                                                                   verbose, favoring human-readability at the expense of compact-
While Model Driven Engineering is gaining more industrial inter-           ness and (ii) XMI files have to be entirely parsed to obtain a naviga-
est, scalability issues when managing large models have become a           ble model of their contents. The first one decreases efficiency of I/O
major problem in current modeling frameworks. In particular, there         accesses, while the second greatly increases the memory needed to
is a need to store, query, and transform very large models in an ef-       load and navigate a model. Moreover, XMI serializations typically
ficient way. Several persistence solutions based on relational and         lack support for advanced features such as transactions or collab-
NoSQL databases have been proposed to tackle these issues. How-            orative editing, and large monolithic model files are challenging
ever, existing solutions often rely on a single data store, which suits    to integrate in existing versioning systems (Barmpis and Kolovos
a specific modeling activity, but may not be optimized for other           2013).
scenarios. Furthermore, existing solutions often rely on low-level             To overcome these limitations, several research groups have
model handling APIs, limiting NoSQL query performance bene-                proposed their own solutions (detailed in Section 5) based on re-
fits. In this article, we first introduce NeoEMF, a multi-database         lational/NoSQL databases (Eclipse Foundation 2016a; Pagán and
model persistence framework able to store very large models in an          Molina 2014; Scheidgen et al. 2012). They often rely on a lazy-
efficient way according to specific modeling activities. Then, we          loading mechanism that reduces memory consumption by bringing
present the Mogwaï query framework, able to compute complex                objects in memory from the datastore only when they are accessed.
OCL queries over very large models in an efficient way with a small            While this evolution of model persistence backends has im-
memory footprint. All the presented work is fully open source and          proved the support for managing large models, they are just a partial
available online.                                                          solution to the scalability problem in current modeling frameworks:
                                                                           they often provide a single generic way to represent models, regard-
Keywords Model Persistence, Model Query, Scalability, NoSQL,               less of the way they will be used. In particular, most of them are fo-
OCL                                                                        cused on saving and loading models in an optimized time and mem-
                                                                           ory consumption, without providing adequate solutions for specific
                                                                           modeling scenarios, such as interactive editing, query computation,
1.   Introduction                                                          or model transformations.
The growing use of Model Driven Engineering (MDE) techniques                   Furthermore, all persistence frameworks are based on the use
in industry (Hutchinson et al. 2011; Mohagheghi et al. 2009) has           of low-level model handling APIs (accessing individual model el-
emphasized scalability of existing technical solutions to store,           of low-level model handling APIs (accessing individual model el-
query, and transform large models as a major issue (Kolovos et al.         ements, attributes, or references), which are then used by most other
2013; Warmer and Kleppe 2006). Large models containing up to               inefficient when used on top of lazy-loading persistence frame-
several millions of elements typically appear in various engineering       works because (i) the API granularity is too fine-grained to ben-
fields, such as civil engineering (Azhar 2011), automotive indus-          efit from the advanced query capabilities of the backend and (ii)
try (Bergmann et al. 2010), product lines (Pohjonen and Tolvanen           an important time and memory overhead is necessary to construct
2002), and can be generated in model-driven reverse engineering            navigable intermediate objects that can be used to interact with the
processes (Bruneliere et al. 2014), such as software modernization.        API.
    Since the publication of the XMI standard (OMG 2016), XML-                 To overcome these limitations, we introduce NeoEMF, a
based serialization has been the preferred format for storing and          multi-database persistence framework able to store models in sev-
sharing models and metamodels. The Eclipse Modeling Frame-                 eral NoSQL databases, depending on the expected usage of the
work (EMF), the de-facto standard for building MDE tools, has              model. NeoEMF is strictly compatible with the EMF API, and
even adopted it as its standard serialization mechanism. However,          relies on a modular architecture that allows changing the underlying
XMI-based serialization has two major drawbacks: (i) XMI files are         backend transparently. We also present Mogwaï, a query frame-
                                                                           work that bypasses the modeling API to compute OCL queries over
                                                                           large models in an efficient and scalable way.
                                                                               The rest of the paper is structured as follows: Section 2 intro-
                                                                           duces NeoEMF and gives an overview of its features and supported
                                                                           datastores, Section 3 presents Mogwaï, our solution to com-
                                                                           pute model queries efficiently. Section 4 provides some insights on
                                                                           the implementation of the presented tools, and Section 5 reviews
                                                                           existing works in the fields of model persistence and model query.
Finally, Section 6 summarizes the key points of the paper, draws
conclusions and presents our future work.                                [Figure 1 diagram. Layers, top to bottom: Model-Based Tools;
                                                                         EMF, reached through the Model Access API; NeoEMF Core (/Graph,
2.    NeoEMF: a Multi-Datastore Persistence                              /Map, /Column, and Caching), reached through the Persistence
      Framework for EMF                                                  API; and the Blueprints, MapDB, and HBase/ZooKeeper backends,
                                                                         reached through the Backend API. The “Standard” Modeling User
Our previous works and experiments on model persistence (Gómez           is drawn beside the upper layers, the “Advanced” User &
et al. 2015; Benelallam et al. 2014; Gómez et al. 2015) have shown       Developer beside the lower ones.]
that providing a well-suited data store for a specific modeling sce-
nario can dramatically improve performance of client applications.
For example, a graph database can be the optimal solution to com-
pute complex model queries, while it would be quite inefficient
for repeated atomic accesses. Based on this observation, we devel-
oped NeoEMF (Daniel et al. 2016a), a scalable model persistence                    Figure 1. NeoEMF Integration in EMF Ecosystem
framework based on a modular architecture enabling model stor-
age into multiple data stores. It is composed of a transparent persis-
tence layer integrated into EMF, and a set of database connectors
which are in charge of the serialization of the model into specific      amount of effort and benefit immediately from its scalability im-
databases. Currently, NeoEMF provides three implementations (map,        provements.
graph, and column), each one optimized for a specific usage                 In particular, NeoEMF supports the following EMF features:
scenario.
    In what follows we first introduce the NeoEMF framework                 Code generation: NeoEMF embeds a dedicated code gener-
and its integration into the EMF ecosystem, then we present the               ator that transparently extends the EMF one, and allows client
key features of the software, and we briefly introduce the available          applications to manipulate models using generated Java classes.
backends and the typical modeling scenario they address.                    Reflective/Dynamic API: in addition to generated code, reflec-
                                                                              tive and dynamic EMF methods can be used on NeoEMF ob-
2.1   Framework Overview                                                      jects, and behave like their standard implementations.
Figure 1 presents an overview of the NeoEMF framework and its               Resource API: NeoEMF also implements the resource spe-
integration within the EMF environment. Modelers typically access             cific API, such as getContents, getAllContents, save, and
a model using Model-based Tools, which provide high-level mod-                load.
eling features such as a graphical interface, interactive console, or
query editor. Model-based Tools internally rely on EMF’s Model               Like other model persistence solutions (Eclipse Foundation
Access API to navigate models, create and delete elements, verify        2016a; Pagán and Molina 2014), NeoEMF achieves scalability
constraints, etc. In its core, EMF delegates the operations to a per-    using a lazy-loading mechanism, which loads into memory ob-
sistence manager using its Persistence API, which is in charge of        jects only when they are accessed, overcoming XMI’s limitations.
the serialization/deserialization of the model. The NeoEMF core          Lazy-loading is defined at the core component: NeoEMF’s imple-
component is defined at this level, and can be registered as a per-      mentation of EObject consists of a simple wrapper delegating all
sistence manager for EMF, just like, for example, the default XMI        its method calls to the corresponding database driver. Using this
persistence manager. This design makes NeoEMF transparent to both        technique, NeoEMF benefits from data store caches, and only
the client application and EMF itself, which simply delegates            maintains a small number of elements in memory (the ones that
calls without taking care of the actual storage.                         have not been saved), drastically reducing the memory consump-
    Once the core component has received the modeling operation          tion of modeling applications.
to perform, it forwards it to the appropriate database connector             NeoEMF also contains a set of caching strategies that can be
(Map, Graph, or Column), which is in charge of the low-level map-        plugged on top of the data store according to specific needs. Note
ping of the model. These connectors translate modeling operations        that these caches are available for all connectors, unless otherwise
into Backend API calls, store the results, and reify database records    stated.
into EMF EObjects when needed. In addition, NeoEMF embeds
                                                                            EStructuralFeaturesCaching: a cache storing loaded objects
a set of default caching strategies that can be configured transpar-
ently at the EMF API level. These caching strategies can be used to           by their accessed feature.
improve performance of client applications, and enabled/disabled            IsSetCaching: a cache keeping the result of isSet calls to
according to specific requirements.                                           avoid multiple accesses to the database.
    In addition to this transparent integration into existing EMF ap-       SizeCaching: a cache storing the size of multi-valued features
plications, NeoEMF provides its own API, which targets advanced               to avoid multiple accesses to the database.
users / high-performance applications. This API provides utility
methods which overcome EMF limitations, allow fine-grained tun-             RecordCaches: a set of database-specific caches maintaining
ing of the databases, and access to internal caches.                          a list of records to improve execution time.
                                                                             Finally, in our latest work (Daniel et al. 2016c), we have extended
2.2   Software Features                                                  the cache support in NeoEMF with an integrated prefetching/-
An important characteristic of NeoEMF is its compliance with             caching framework that allows customizing data access in order to
the EMF API. All classes/interfaces extending existing EMF               speed up query computation. The PrefetchML framework is com-
ones strictly define all their methods, and ensure that a call to a      posed of a DSL that allows designers to specify prefetching and
NeoEMF method produces the same behavior (including possi-               caching rules with a high level of abstraction, and an execution en-
ble side effects) as standard EMF API calls. As a result, existing       gine that is in charge of triggering the rules and fetching the ele-
applications can move from EMF to NeoEMF with a very small               ments from the database.
2.3     Data Stores                                                          In this Section we first show an overview of the Gremlin lan-
For now, NeoEMF provides three connectors that are able to repre-         guage, then we present our transformation approach, and we intro-
sent models in specific data stores. In this section we present these     duce some experimental results.
connectors and the modeling scenarios they are optimized for.             3.1     The Gremlin Language
2.3.1    NeoEMF/Map                                                       Gremlin is a Groovy-based query language which is part of the Tin-
NeoEMF/Map (Gómez et al. 2015) has been designed to provide               kerpop initiative, a set of tools that aims to unify graph databases
fast access to atomic operations, such as accessing a single elemen-      under a common API. It is built on top of Pipes, a data-flow frame-
t/attribute, and navigating a single reference. This implementation       work based on process graphs. A process graph is composed of ver-
is optimized for EMF API-based accesses, which typically generate         tices representing computational units and communication edges
atomic and fragmented calls on the model. NeoEMF/Map embeds               which can be combined to create complex processing chains. In
a key-value store, which maintains a set of in-memory/on disk maps        Gremlin terminology, these processing chains are called traver-
to speed up model element accesses. The benchmarks performed              sals, and are composed of a chain of simple computational units
in previous work (Gómez et al. 2015) show that NeoEMF/Map is              named steps.
the most suitable solution to improve performance and scalability             Existing work has shown that Gremlin is an interesting alterna-
of EMF API-based tools that need to access very large models on a         tive to Cypher, the pattern matching language used to query the Neo4j
single machine.                                                           graph database (Holzschuher and Peinl 2013), and can even outper-
                                                                          form the native query language for specific query scenarios. Grem-
2.3.2    NeoEMF/Graph                                                     lin defines four types of steps:
NeoEMF/Graph (Benelallam et al. 2014) relies on the rich                    Transform steps: functions mapping inputs of a given type to
traversal features that graph databases usually provide to compute              outputs of another type. They constitute the core of Gremlin:
complex queries over models efficiently. This specific modeling                 they provide access to adjacent vertices, incoming and outgoing
scenario is further explained in the next Section, where we present             edges, and properties. In addition to built-in navigation steps,
a framework able to compute OCL queries efficiently by translat-                Gremlin defines a generic transformation step that applies a
ing them into graph traversals. NeoEMF/Graph maps models to                     function to its input and returns the computed results.
property graphs, where model elements are translated into vertices,         Filter steps: functions to select or reject input elements w.r.t.
attributes into vertex properties, and references into edges. Note that
                                                                                a given condition. They are used to check property existence,
to enable complex query computation, metamodel elements are
                                                                                compare values, remove duplicated results, or retain particular
also persisted as vertices, and are linked to their instances through
                                                                                objects in a traversal.
a dedicated INSTANCE_OF relationship.
                                                                            Branch steps: functions to split the computation into several
2.3.3    NeoEMF/Column                                                          parallelized sub-traversals and merge their results.
NeoEMF/Column (Gómez et al. 2015) relies on a distributed                   Side-effect steps: functions returning their input values and ap-
column-based data store to enable the development of distributed                plying side-effect operations (edge or vertex creation, property
MDE-based applications. In contrast with Map and Graph imple-                   update, variable definition or assignment).
mentations, NeoEMF/Column offers concurrent read/write ca-
pabilities and guarantees ACID properties at model element level. It      In addition, the step interface provides a set of built-in methods
exploits the wide availability of distributed clusters in order to dis-   to access meta information: number of objects in a step, output
tribute intensive read/write workloads across datanodes. The dis-         existence, or first element in a step. These methods can be called
tributed nature of this persistence solution is used in the ATL-          inside a traversal to control its execution or check conditions on
MR (Benelallam et al. 2015) tool, a distributed engine for model          particular elements in a step.
transformations in the ATL language on top of MapReduce.                      We chose Gremlin as our target language because its expressiv-
                                                                          ity allows mapping all of OCL, and because it is, to our knowl-
                                                                          edge, the only one that is supported by several NoSQL databases.
3.      Mogwaï: a Framework to Perform OCL
        Queries on Large Models                                           3.2     Framework Overview
In the previous Section we introduced the NeoEMF framework,               The Mogwaï framework is composed of two components: (i) the
that provides a transparent way to store models into NoSQL                OCL2Gremlin model-to-model transformation, which maps OCL
databases. While this architecture allows storing very large models       expressions onto Gremlin traversals, and (ii) the NeoEMF/Mogwaï
in a scalable way, the presented solution is tailored to the low-level    persistence layer, an extension of NeoEMF/Graph that provides
modeling API, which generates fragmented queries on the data              an advanced query API for graph databases. We chose OCL as
store, reducing the benefits of advanced database query capabil-          our input language because it is a well-known OMG standard used
ities. Furthermore, the EMF API requires reifying each traversed          to complement graphical (meta) modeling languages with textual
element into a navigable EMF object, even if it is not part of the        descriptions of invariants, operation contracts, derivation rules, and
final result of the query, increasing the memory needed to compute        query expressions. Gremlin is a NoSQL query language designed
a query.                                                                  to query databases implementing the Blueprints API, an abstraction
    To address these issues, we propose the Mogwaï (Daniel et al.         layer on top of graph stores which has been implemented by several
2016b) query framework that is able to handle complex queries             databases. Therefore, we chose Gremlin as our target language,
on large models. The Mogwaï framework takes advantage of the              because it is the most mature and generic solution to query a wide
advanced query language available on NeoEMF/Graph’s internal              variety of NoSQL databases.
data store. The Mogwaï framework translates queries expressed                 Figure 2 shows the overall query process of (a) the Mogwaï
in OCL (Object Constraint Language) into Gremlin (Tinkerpop               query framework and compares it with (b) standard EMF API
2016), a graph traversal query language. Generated queries are then       based approaches. An initial textual OCL expression is parsed and
sent to the database that is in charge of their computation, bypassing    transformed into an OCL query model. This model constitutes the
EMF API limitations.                                                      input of the OCL2Gremlin Mogwaï component, which consists
of a model-to-model transformation generating the corresponding
Gremlin traversal model.
    This transformation is composed of a mapping from OCL
to Gremlin and a translation algorithm that implements this map-
ping and merges the created steps into a single query. The Grem-
lin model is then converted to a textual expression and sent to
the NeoEMF/Mogwaï component, which computes it on the database
side. Query results are then reified as standard EMF objects by
NeoEMF/Mogwaï, making them usable in any EMF-based scenario.
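    As an illustration only, the translation chain described above can be
sketched in a few lines of Python. The three OCL node types, the one-to-one
mapping, and the step names (allOf in particular) are assumptions made for
this sketch; they are not the actual OCL2Gremlin rules.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class AllInstances:
    """OCL: Type.allInstances()"""
    type_name: str


@dataclass
class Select:
    """OCL: source->select(e | e.attribute = value)"""
    source: "Expr"
    attribute: str
    value: str


@dataclass
class Collect:
    """OCL: source->collect(e | e.attribute)"""
    source: "Expr"
    attribute: str


Expr = Union[AllInstances, Select, Collect]


def to_steps(expr):
    """Map each OCL node to one traversal step, innermost source first."""
    if isinstance(expr, AllInstances):
        # 'allOf' is a hypothetical step name standing in for the type lookup.
        return [f"allOf('{expr.type_name}')"]
    if isinstance(expr, Select):
        return to_steps(expr.source) + [
            f"filter{{it.{expr.attribute} == '{expr.value}'}}"
        ]
    if isinstance(expr, Collect):
        return to_steps(expr.source) + [f"property('{expr.attribute}')"]
    raise TypeError(f"unsupported OCL node: {expr!r}")


def to_traversal(expr):
    """Merge the generated steps into a single textual query."""
    return "g." + ".".join(to_steps(expr))


# Class.allInstances()->select(c | c.visibility = 'public')->collect(c | c.name)
query = Collect(Select(AllInstances("Class"), "visibility", "public"), "name")
traversal = to_traversal(query)
```

The point of the sketch is the shape of the process: the whole OCL expression
becomes one textual traversal that can be sent to the database in one call.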
    Compared to existing query frameworks, Mogwaï does not
rely on the EMF API to perform a query. In general, API-based
query frameworks translate OCL queries into a sequence of low-
level API calls, which are then performed one after another on the
persistence layer (in this example NeoEMF/Graph). While this ap-
proach has the benefit of being compatible with every EMF-based ap-
plication, it does not take full advantage of the database structure                       Figure 3. Mogwaï Interactive Console
and query optimizations. Furthermore, each object fetched from the
database has to be reified to be navigable, even if it is not going to
be part of the end result. Therefore, execution time of the EMF-                 embeds Blueprints 2.5.0 and provides a convenience wrapper
based solutions strongly depends on the number of intermediate                   for Neo4j 1.9.6.
objects fetched from the database, whereas for the Mogwaï frame-
                                                                             NeoEMF/Map is built on top of MapDB 1.0.9, a key-value
work, execution time does not depend on the number of intermedi-
ate objects, making it more scalable over large models.                          store providing Maps, Sets, Lists, and other collections backed
                                                                                 by off-heap or on-disk storage. MapDB provides advanced fea-
3.3     Experimental Results                                                     tures such as database snapshots, ACID transactions support,
                                                                                 and incremental backups.
Experimental results presented in (Daniel et al. 2016b) show
that using the Mogwaï framework to perform complex queries                   NeoEMF/Column persists models in Apache HBase 0.98.12-
over large models can dramatically improve performance both in                   hadoop2, a wide column database providing distributed data
terms of memory consumption and execution time. In particular,                   storage on top of HDFS. HBase is designed to handle very large
allInstances-based queries computed with Mogwaï are up                           tables atop clusters of commodity hardware. The distribution of
to 20 times faster and up to 75 times better in terms of memory                  the model on the cluster is hidden from client applications,
consumption than the Eclipse OCL interpreter and the EMF-Query                   which access it transparently through the EMF API.
framework, two state-of-the-art tools for EMF-based model queries.
                                                                               A prototype of the Mogwaï framework has been developed as
    However, if the query traverses a small part of the model, or if
                                                                           part of NeoEMF (https://github.com/atlanmod/Mogwai).
a large part of the intermediate results is needed anyway, the
                                                                           It extends the standard EMF API provided by NeoEMF by defin-
benefits of using the Mogwaï framework are reduced. In partic-
                                                                           ing additional query methods at the Resource level. The query
ular, the overhead implied by the transformation engine may not
                                                                           API accepts a textual OCL expression or a URI to an OCL file
be worthwhile when dealing with relatively small models or simple
                                                                           containing the expressions to compute. In addition, it is possible to
queries.
                                                                           provide input values that represent the self and parameter variables.
    The main disadvantage of the Mogwaï framework concerns its
                                                                           The framework also provides an OCL console (see Figure 3) inte-
integration into an EMF environment. To benefit from Mogwaï,
                                                                           grated into Eclipse that allows querying NeoEMF/Graph models
other Eclipse plug-ins need to be explicitly instructed to use it. Inte-
                                                                           interactively.
gration with the Mogwaï framework is straightforward but must be
                                                                               OCL queries are parsed using Eclipse MDT OCL, and the core
explicitly done. In contrast, other solutions based on the standard EMF
                                                                           transformation creating the Gremlin model from the OCL one is
API provide benefits in a transparent manner to all tools using that
                                                                           composed of a set of 70 ATL (Jouault et al. 2008) rules and helpers.
API.
                                                                           The created Gremlin model is then expressed using its textual
                                                                           syntax and sent to an embedded Gremlin engine, which executes
4.     Tool Support                                                        the query and returns the results. The reification of these results
NeoEMF is composed of a set of open source Eclipse plugins                 into model elements is delegated to NeoEMF/Graph, which is in
distributed under the EPL license. Available components are ac-            charge of the mapping between graph and model elements.
tively developed and maintained, and the source code repository
is fully available on GitHub (https://github.com/atlanmod/                 5.     Related Work
NeoEMF). The NeoEMF website1 presents an overview of the                   In this Section we present existing solutions that aim to tackle
supported datastores, the key features, and current ongoing work.
                                                                           scalability issues to store and query large models and we compare
NeoEMF has been released as part of the MONDO platform (Kolovos
                                                                           them with NeoEMF on the persistence side, and the Mogwaï
et al. 2015).
                                                                           framework on the query one.
  NeoEMF/Graph relies on Blueprints, a high-level inter-
                                                                           5.1     Scalable Model Persistence
      face designed to unify graph databases under a common API.
      Blueprints has been implemented by several datastores such           The CDO model repository (Eclipse Foundation 2016a) is a scal-
      as Neo4j, OrientDB, and Titan. Using this abstraction layer,         able model persistence framework based on a client-server archi-
      client applications can choose the graph store of their choice to    tecture to handle large models in a collaborative environment. It
      persist models through NeoEMF/Graph. For now, NeoEMF                 provides some advanced features such as transaction support or ba-
                                                                           sic prefetching, and provides a lazy-loading mechanism to reduce
1 www.neoemf.com                                                           memory consumption. CDO can be plugged with several database
Figure 2. Comparison of OCL execution
   (a) The Mogwaï Query Framework. Elements, left to right: OCL Query Model, OCLtoGremlin
       Transformation, Gremlin Traversal Model, Gremlin Traversal, NeoEMF/Mogwaï, Database
       (one Query API Call).
   (b) EMF-based Query Frameworks. Elements, left to right: OCL Query Model, OCL Interpreter,
       NeoEMF/Graph, Database (EMF API Call 1 ... EMF API Call n).

connectors to store a model, but in practice only relational ones are used. In addition, different
experiments have shown that CDO faces scalability issues when dealing with very large models
(Pagán and Molina 2014; Scheidgen et al. 2012).
    Morsa (Pagán et al. 2011; Pagán and Molina 2014) is one of the first approaches that use NoSQL
databases to handle very large EMF models. It relies on a client-server architecture based on
MongoDB (http://www.mongodb.org) and aims to manage scalability issues using document-oriented
database facilities and a lazy-loading mechanism. Morsa model persistence is available through the
standard EMF mechanisms, making its integration transparent in existing EMF-based applications.
NeoEMF is similar to Morsa in several aspects, but aims to provide multiple backends that can be
chosen according to a specific modeling scenario.
    EMF fragments (Scheidgen et al. 2012) is another NoSQL-based persistence layer for EMF aimed at
achieving fast storage of new data and fast navigation of persisted models. Supported backends are
MongoDB, Apache HBase, and regular files on the file system. EMF fragments is based on the proxy
mechanism used by EMF for inter-document relationships: models are automatically partitioned into
several chunks (fragments) using metamodel annotations, and linked together using the standard EMF
proxy mechanism. Unlike our approach, CDO, and Morsa, all data from a single fragment is loaded at
a time. Only links to other fragments are loaded on demand. Another characteristic of this approach
is that metamodels have to be modified to indicate where the partitions should be made, whereas
NeoEMF can be plugged directly into existing EMF-based applications.

5.2   Model Query
There are several frameworks to query models, especially targeting the EMF framework (including
one or more of the EMF backends mentioned before). The main ones are Eclipse MDT OCL (Eclipse
Foundation 2016b), EMF-Query (The Eclipse Foundation 2016) and IncQuery (Bergmann et al. 2009).
    Eclipse MDT OCL provides an execution environment to evaluate OCL invariants and queries over
models. It relies on the EMF API to navigate the model, and stores allInstances results in a cache
to speed up their computation.
    EMF-Query is a framework that provides an abstraction layer on top of the EMF API to query a
model. It includes a set of tools to ease the definition of queries and manipulate results.
Compared to the Mogwaï framework, these two solutions are strongly dependent on the EMF API. On
the one hand this provides an easy integration in existing EMF applications, but on the other hand
they are unable to benefit from all performance advantages of NoSQL databases due to this API
dependency.
    EMF-IncQuery (Bergmann et al. 2009) is an incremental pattern matcher framework to query EMF
models. It bypasses API limitations using a persistence-independent index mechanism to improve
model access performance. It is based on an adaptation of a RETE algorithm, and query results are
cached and incrementally updated using the EMF notification mechanism to improve performance.
While EMF-IncQuery shows very good execution-time performance (Bergmann et al. 2011) when
repeating a query multiple times on a model, the results presented in this article show mixed
performance for single evaluations of queries. This is not the case for our framework. In
EMF-IncQuery, caches and indexes must be built for each query, implying a non-negligible memory
overhead compared to the Mogwaï framework. In addition, the initialization of the index needs a
complete resource traversal, based on the EMF API, which can be costly for lazy-loading
persistence frameworks.

6.    Conclusion
In this article we have presented NeoEMF, our solution to store and access very large models using
a multi-datastore model persistence framework. NeoEMF relies on a lazy-loading capability allowing
very large model navigation in a reduced amount of memory, by loading elements from the datastore
only when they are accessed. NeoEMF has been designed to be fully compatible with existing
EMF-based applications by providing a complete implementation of the EMF API. Datastores’ behavior
and internal caches can be tuned by providing options to the standard save and load EMF methods.
Currently, NeoEMF provides three implementations (graph, map, and column) that can be plugged
transparently to provide an optimized solution to different modeling use cases: frequent and
repeated atomic accesses, complex query computation, and cloud-based model transformation.
    In addition to the persistence layer itself, we have introduced the Mogwaï framework, which
generates Gremlin traversals from OCL queries in order to maximize the benefits of using a graph
backend to store large models. Mogwaï is integrated in the NeoEMF infrastructure, extending
NeoEMF/Graph with custom query capabilities. OCL queries are translated using model-to-model
transformation into Gremlin traversals that are then computed on the database side, reducing the
overhead implied by the modeling API and the reification of intermediate objects. Experiments
detailed in previous work (Daniel et al. 2016b) have shown that using this approach brings a
significant improvement both in terms of execution time and memory consumption. NeoEMF and Mogwaï
are developed as open-source Eclipse plugins and available online.
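    The difference between the two execution schemes of Figure 2 can be illustrated with a toy
sketch. It is not NeoEMF or Mogwaï code: the CountingStore class, its methods, and the sample
records are hypothetical names chosen for illustration, and a plain dictionary stands in for the
database. The sketch only counts how many requests reach the store when a query is evaluated
element by element on the client side, compared with a single traversal evaluated inside the store.

```python
class CountingStore:
    """Toy datastore mapping element id -> (type name, child ids); counts requests."""

    def __init__(self, records):
        self._records = records
        self.requests = 0

    def fetch(self, element_id):
        # One request per element, as a lazy-loading layer would issue on access.
        self.requests += 1
        return self._records[element_id]

    def run_traversal(self, root_id, type_name):
        # One request for the whole traversal, evaluated inside the store.
        self.requests += 1
        result, stack = [], [root_id]
        while stack:
            kind, children = self._records[stack.pop()]
            if kind == type_name:
                result.append(root_id if False else None)  # placeholder replaced below
                result.pop()
            stack.extend(children)
        # Collect matches in a second pass over reachable ids to keep the logic explicit.
        seen, stack = [], [root_id]
        while stack:
            current = stack.pop()
            kind, children = self._records[current]
            if kind == type_name:
                seen.append(current)
            stack.extend(children)
        return sorted(seen)


def query_via_element_api(store, root_id, type_name):
    """Client-side evaluation: every visited element is fetched individually."""
    result, stack = [], [root_id]
    while stack:
        current = stack.pop()
        kind, children = store.fetch(current)
        if kind == type_name:
            result.append(current)
        stack.extend(children)
    return sorted(result)


# Hypothetical sample model: a package containing two classes and one method.
records = {
    "pkg": ("Package", ["c1", "c2"]),
    "c1": ("Class", ["m1"]),
    "c2": ("Class", []),
    "m1": ("Method", []),
}

api_store = CountingStore(records)
api_result = query_via_element_api(api_store, "pkg", "Class")

db_store = CountingStore(records)
db_result = db_store.run_traversal("pkg", "Class")

print(api_result, api_store.requests)
print(db_result, db_store.requests)
```

    Both evaluations return the same elements; in this toy setting the client-side version issues
one request per visited element, while the store-side traversal issues a single request.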
    Model transformations make intensive use of model queries to navigate the model to transform,
match source elements, or set target values. Integrating the Mogwaï framework in model
transformation engines (such as ATL (Jouault et al. 2008)) to compute these queries could
drastically reduce the execution time and memory consumption implied by the transformation of
large models. Another possible approach would be to extend Mogwaï to translate the transformation
itself into database queries and compute it entirely on the database side.
    As future work, we plan to study other datastores that could be beneficial for specific use
cases. For example, we want to study if a document-based representation could provide some
performance gains. We also want to study how datastores can be combined to optimize a set of
modeling activities (for example, a map/graph backend that would speed up both query computation
and atomic accesses). Finally, we plan to integrate into NeoEMF advanced features which are
typically needed by modeling processes, such as model versioning or collaborative editing. The
latter could, for example, benefit from the distributed architecture provided by NeoEMF/Column.
    Another line of ongoing work is to study the integration of the Mogwaï framework into model
persistence solutions that do not rely on a Gremlin-compatible database. For example, we plan to
adapt existing work on EOL to SQL translation (Carlos et al. 2014) to test our model-to-model
transformation-based approach over SQL databases. Generating SQL queries would also make it
possible to use the Spark-SQL connector for HBase in order to improve query execution time and
memory consumption over NeoEMF/Column.


References
S. Azhar. Building information modeling (BIM): Trends, benefits, risks, and challenges for the
   AEC industry. Leadership and Management in Engineering, pages 241–252, 2011.
K. Barmpis and D. Kolovos. Hawk: Towards a scalable model indexing architecture. In Proc. of
   BigMDE’13, pages 6–9. ACM, 2013.
A. Benelallam, A. Gómez, G. Sunyé, M. Tisi, and D. Launay. Neo4EMF, a Scalable Persistence Layer
   for EMF Models. In Proc. of the 10th ECMFA, pages 230–241. Springer, 2014.
A. Benelallam, A. Gómez, M. Tisi, and J. Cabot. Distributed Model-to-Model Transformation with
   ATL on MapReduce. In Proc. of the 8th SLE Conference, pages 37–48. ACM, 2015.
G. Bergmann, Á. Horváth, I. Ráth, and D. Varró. Efficient model transformations by combining
   pattern matching strategies. In Proc. of the 2nd ICMT, pages 20–34, Zurich, Switzerland, 2009.
   URL http://dx.doi.org/10.1007/978-3-642-02408-5_3.
G. Bergmann, Á. Horváth, I. Ráth, D. Varró, A. Balogh, Z. Balogh, and A. Ökrös. Incremental
   evaluation of model queries over EMF models. In Proc. of the 13th MoDELS Conference, pages
   76–90. Springer, 2010.
G. Bergmann, Á. Horváth, I. Ráth, and D. Varró. Incremental evaluation of model queries over EMF
   models: A tutorial on EMF-IncQuery. In Proc. of the 7th ECMFA, pages 389–390, Berlin,
   Heidelberg, 2011. ISBN 978-3-642-21469-1. URL http://dl.acm.org/citation.cfm?id=2023522.2023565.
H. Bruneliere, J. Cabot, G. Dupé, and F. Madiot. MoDisco: A model driven reverse engineering
   framework. IST, pages 1012–1032, 2014.
X. D. Carlos, G. Sagardui, and S. Trujillo. Mqt, an approach for run-time query translation: From
   EOL to SQL. In Proc. of OCL 2014 co-located with MoDELS 2014, pages 13–22, Valencia, Spain,
   2014.
G. Daniel, G. Sunyé, A. Benelallam, M. Tisi, Y. Vernageau, A. Gómez, and J. Cabot. Neoemf: a
   multi-database model persistence framework for very large models. In Proc. of the MoDELS 2016
   Tool Demonstration Session [To appear]. CEUR-WS, 2016a. Available Online at
   http://tinyurl.com/jhkqoyx.
G. Daniel, G. Sunyé, and J. Cabot. Mogwaï: a framework to handle complex queries on large models.
   In Proc. of the 10th RCIS Conference [To appear]. IEEE, 2016b. Available Online at
   http://tinyurl.com/zx6cfam.
G. Daniel, G. Sunyé, and J. Cabot. Prefetchml: a framework for prefetching and caching models. In
   Proc. of the 19th MoDELS Conference [To appear]. ACM/IEEE, 2016c. Available Online at
   http://tinyurl.com/huc55hl.
Eclipse Foundation. The CDO Model Repository (CDO), 2016a. URL http://www.eclipse.org/cdo/.
Eclipse Foundation. MDT OCL, 2016b. URL www.eclipse.org/modeling/mdt/?project=ocl.
A. Gómez, A. Benelallam, and M. Tisi. Decentralized Model Persistence for Distributed Computing.
   In Proc. of the 3rd BigMDE Workshop, pages 42–51. CEUR-WS.org, 2015.
A. Gómez, G. Sunyé, M. Tisi, and J. Cabot. Map-based transparent persistence for very large
   models. In Proc. of the 18th FASE Conference. Springer, 2015.
F. Holzschuher and R. Peinl. Performance of graph query languages: Comparison of cypher, gremlin
   and native access in neo4j. In Proc. of the Joint EDBT/ICDT 2013 Workshops, pages 195–204, New
   York, NY, USA, 2013. ISBN 978-1-4503-1599-9. doi: 10.1145/2457317.2457351. URL
   http://doi.acm.org/10.1145/2457317.2457351.
J. Hutchinson, M. Rouncefield, and J. Whittle. Model-driven engineering practices in industry. In
   Software Engineering (ICSE), 2011 33rd International Conference on, pages 633–642. IEEE, 2011.
F. Jouault, F. Allilaire, J. Bézivin, and I. Kurtev. ATL: A model transformation tool. SCP, pages
   31–39, 2008.
D. S. Kolovos, L. M. Rose, N. Matragkas, R. F. Paige, E. Guerra, J. S. Cuadrado, J. De Lara,
   I. Ráth, D. Varró, M. Tisi, et al. A research roadmap towards achieving scalability in model
   driven engineering. In Proc. of BigMDE’13, pages 1–10. ACM, 2013.
D. S. Kolovos, L. M. Rose, R. F. Paige, E. Guerra, J. S. Cuadrado, J. de Lara, I. Ráth, D. Varró,
   G. Sunyé, and M. Tisi. MONDO: Scalable Modelling and Model Management on the Cloud. In Proc. of
   the Projects Showcase, (STAF 2015), pages 44–53, 2015.
P. Mohagheghi, M. A. Fernandez, J. A. Martell, M. Fritzsche, and W. Gilani. MDE adoption in
   industry: challenges and success criteria. In Proc. of Workshops at MoDELS 2008, pages 54–59.
   Springer, 2009.
OMG. OMG MOF 2 XMI Mapping Specification version 2.5.1, 2016. URL
   http://www.omg.org/spec/XMI/2.5.1/.
J. E. Pagán and J. G. Molina. Querying large models efficiently. IST, 2014. ISSN 0950-5849. doi:
   10.1016/j.infsof.2014.01.005. URL http://dx.doi.org/10.1016/j.infsof.2014.01.005.
J. E. Pagán, J. S. Cuadrado, and J. G. Molina. Morsa: A scalable approach for persisting and
   accessing large models. In Proc. of the 14th MoDELS Conference, pages 77–92. Springer, 2011.
R. Pohjonen and J.-P. Tolvanen. Automated production of family members: Lessons learned. In Proc.
   of PLEES’02, pages 49–57. IESE, 2002.
M. Scheidgen, A. Zubow, J. Fischer, and T. Kolbe. Automated and Transparent Model Fragmentation
   for Persisting Large Models. In Proc. of the 15th MoDELS Conference, pages 102–118. Springer,
   2012.
The Eclipse Foundation. EMF Query, 2016. URL
   https://projects.eclipse.org/projects/modeling.emf.query.
Tinkerpop. The Gremlin Language, 2016. URL www.gremlin.tinkerpop.com.
J. Warmer and A. Kleppe. Building a flexible software factory using partial domain specific
   models. In Proc. of the 6th DSM Workshop, pages 15–22. University of Jyvaskyla, 2006.

</pre>