=Paper=
{{Paper
|id=Vol-1206/paper4
|storemode=property
|title=Improving Memory Efficiency for Processing Large-Scale Models
|pdfUrl=https://ceur-ws.org/Vol-1206/paper_6.pdf
|volume=Vol-1206
|dblpUrl=https://dblp.org/rec/conf/staf/DanielSBT14
}}
==Improving Memory Efficiency for Processing Large-Scale Models==
Gwendal Daniel (gwendal.daniel@etu.univ-nantes.fr), Gerson Sunyé (gerson.sunye@inria.fr), Amine Benelallam (amine.benelallam@inria.fr), Massimo Tisi (massimo.tisi@inria.fr)

AtlanMod team (Inria, Mines Nantes, LINA)
ABSTRACT
Scalability is a main obstacle for applying Model-Driven Engineering to reverse engineering, or to any other activity manipulating large models. Existing solutions to persist and query large models are currently inefficient and strongly linked to memory availability. In this paper, we propose a memory unload strategy for Neo4EMF, a persistence layer built on top of the Eclipse Modeling Framework and based on a Neo4j database backend. Our solution allows us to partially unload a model during the execution of a query by using a periodical dirty saving mechanism and transparent reloading. Our experiments show that this approach makes it possible to query large models in a restricted amount of memory with an acceptable performance.

Categories and Subject Descriptors
D.2.2 [Software Engineering]: Design Tools and Techniques

General Terms
Performance, Algorithms

Keywords
Scalability, Large models, Memory footprint

1. INTRODUCTION
The Eclipse Modeling Framework (EMF) is the de facto standard for the Model Driven Engineering (MDE) community. This framework provides a common base for multiple purposes and associated tools: code generation [4, 12], model transformation [9, 13], and reverse engineering [17, 6, 5].

These tools handle complex and large-scale models when manipulating important applications, for example, during reverse-engineering or software modernization through model transformation. EMF was first designed to support modeling tools and has shown limitations in handling large models. A more efficient persistence solution is needed to allow for partial model loading and unloading, which are key points when dealing with large models.

While several solutions to persist EMF models exist, most of them do not allow partial model unloading and cannot handle models that exceed the available memory. Furthermore, these solutions do not take advantage of the graph nature of the models: most of them rely on relational databases, which are not fully adapted to store and query graphs.

Neo4EMF [3] is a persistence layer for EMF that relies on a graph database and implements an unloading mechanism. In this paper, we present a strategy to optimize the memory footprint of Neo4EMF. To evaluate this strategy, we perform a set of queries on Neo4EMF and compare the results against two other persistence mechanisms, XMI and CDO. We measure performance in terms of memory consumption and execution time.

The paper is organized as follows: Section 2 presents the background and the motivations for our unloading strategy. Section 3 describes our strategy and its main concepts: dirty saving, unloading, and extended on-demand loading. Section 4 evaluates the performance of our persistence layer. Section 5 compares our approach with existing solutions and finally, Section 6 concludes and outlines the future perspectives of the tool.
BigMDE '14, July 24, 2014, York, UK. Copyright © 2014 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

2. BACKGROUND
2.1 EMF Persistence
Like many other modeling tools, EMF has adopted XMI as its default serialization format. This XML-based representation has the advantage of being human readable, but has two drawbacks: (i) XMI sacrifices compactness for an understandable output and (ii) XMI files have to be entirely parsed to get a readable and navigational model. The former drawback reduces efficiency of I/O access, while the latter increases the memory needed to load a model and limits on-demand loading and proxy uses between files. XMI does not provide advanced features such as model versioning or concurrent modifications.

The CDO [8] model repository was built to solve those problems. It was designed as a framework to manage large models in a collaborative environment with a small memory footprint. CDO relies on a client-server architecture supporting transactional accesses and notifications. CDO servers are built on top of several persistence solutions, but in practice only relational databases are used to store CDO objects.

2.2 Graph Databases
Graph databases are one of the NoSQL data models that have emerged to overcome the limitations of relational databases with respect to scale and distribution. NoSQL databases do not ensure ACID properties, but in return, they are able to efficiently handle large-scale data in a distributed environment.

Graph databases are based on nodes, edges, and properties. This particular data representation fits EMF models exactly, since they are intrinsically graphs (each object can be seen as a node and references as edges). Thus, graph databases can store EMF models without a complex serialization process.

3. NEO4EMF
Neo4EMF is a persistence layer built on top of the EMF framework that aims at handling large models in a scalable way. It provides a compatible EMF API and a graph-database persistence backend based on Neo4j [16].

Neo4EMF is open source and distributed under the terms of the (A)GPLv3 [1].

In previous work [3], we introduced the basic concepts of Neo4EMF: model change tracking and on-demand loading. Model change tracking is based on a global changelog that stores the modifications done on a model during an execution (from creation to save). Tracking the modifications is done using EMF notification facilities: the changelog acts as a listener for all the objects and creates its entries from the received notifications. Neo4EMF uses an on-demand loading mechanism to load object fields only when they are accessed. Technically, each Neo4EMF object is instantiated as an empty container. When one of its fields (EReferences and EAttributes) is accessed, the associated content is loaded. This mechanism presents two advantages: (i) the entire model does not have to be loaded at once and (ii) unused elements are not loaded.

Neo4EMF does not use the EStore mechanism. Indeed, EStore allows the EObject data storage to be changed by providing a stateless object that translates model modifications and accesses into backend calls. Every generated accessor and modifier delegates to the reflective API. As a consequence, EObjects have to fetch through the store each time a field is requested, engendering several database queries. On the contrary, Neo4EMF is based on regular EObjects (with in-memory fields) which are synchronized with a database backend.

In this paper we focus on the Neo4EMF memory footprint. We introduce a strategy to unload some parts of a processed model and save memory during a query execution. In the previous implementation, the on-demand loading mechanism allows us to load only the parts of the model that are needed, but there is no solution to remove unneeded objects from memory, especially when they were changed but not saved yet.

A reliable unload strategy needs to address two main issues:

• Accessibility: Contents of unloaded objects (attributes and referenced objects) have to remain accessible through standard EMF accessors.
• Transparency: The management of the object life cycle has to be independent from users, but customizable to fit specific needs, e.g., size of the Java virtual machine, requirements on execution time, etc.

Our strategy addresses these issues by providing a dirty-saving mechanism, which provides temporary and transparent model persistence. The object life cycle has also been modified to include unloading of persisted elements.

In the next sections, we provide an overview of the changelog used to record the modifications of the processed model. Then, we present dirty saving, based on the basic Neo4EMF save mechanism, and we describe the Neo4EMF object life cycle. Finally, we describe the modifications done on the on-demand loading feature to handle this new strategy.

3.1 Neo4EMF Changelog
Neo4EMF needs a mechanism to ensure synchronization between the in-memory model and its backend representation, avoiding systematic unnecessary calls to the database.

Despite the existence in EMF of a modification tracking mechanism, the ChangeRecorder class, we decided to develop an alternative solution that minimizes memory consumption.

Neo4EMF tracks model modifications in a changelog, a sequence of entries of five types:

• Object creation: A new object has been created and attached to a Neo4EMF resource.
• Object deletion: An object has been deleted or removed from a Neo4EMF resource.
• Attribute modifications: Attribute setting and unsetting.
• Reference addition: Assignment of a new single-valued reference or addition of a new referenced object in a multi-valued one.
• Reference deletion: Unsetting a single-valued reference or removing a referenced object in a multi-valued one.

We distinguish unidirectional and bidirectional reference modifications for performance reasons (they are not serialized the same way during the saving process).
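To make the five entry types concrete, the following Python sketch models a changelog as an ordered list of typed entries. It is only an illustration: Neo4EMF itself is a Java/EMF layer, and the class names (`EntryKind`, `Entry`, `ChangeLog`), the field names, and the `bidirectional` flag are assumptions made for this example, not part of the Neo4EMF API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Optional


class EntryKind(Enum):
    """The five entry types listed above (names are illustrative)."""
    OBJECT_CREATION = "object creation"
    OBJECT_DELETION = "object deletion"
    ATTRIBUTE_MODIFICATION = "attribute modification"
    REFERENCE_ADDITION = "reference addition"
    REFERENCE_DELETION = "reference deletion"


@dataclass
class Entry:
    kind: EntryKind
    target: str                    # identifier of the modified object
    feature: Optional[str] = None  # attribute or reference name, if any
    value: Any = None              # new attribute value or referenced object
    bidirectional: bool = False    # only meaningful for reference entries


@dataclass
class ChangeLog:
    """Ordered sequence of entries, appended as modifications happen."""
    entries: List[Entry] = field(default_factory=list)

    def record(self, entry: Entry) -> None:
        self.entries.append(entry)

    def count(self, kind: EntryKind) -> int:
        return sum(1 for e in self.entries if e.kind is kind)


# Hypothetical session on the objects of the running example.
log = ChangeLog()
log.record(Entry(EntryKind.OBJECT_CREATION, "cl1"))
log.record(Entry(EntryKind.ATTRIBUTE_MODIFICATION, "cl1", "name", "class1"))
log.record(Entry(EntryKind.REFERENCE_ADDITION, "p1", "owned_elements", "cl1"))
log.record(Entry(EntryKind.REFERENCE_DELETION, "cl1", "comments", "com1"))
```

Because entries are appended in the order the modifications occur, iterating over `log.entries` replays them in that same order.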
Figure 1 summarizes our changelog model. All changelog entries are subclasses of Entry, which defines some shared properties: the object concerned by the modification (for instance the object containing a modified attribute or reference, or the new object in case of a CreateObject entry) and a basic serialization method.

Figure 1: Changelog Metamodel [diagram; classes shown: ChangeLog, Entry (process()), EObject, NewObject, DeleteObject, SetAttribute (updatedFeature : EAttribute), AddLink and RemoveLink (updatedFeature : EReference), BidirectionalAddLink, UnidirectionalAddLink, BidirectionalRemoveLink, UnidirectionalRemoveLink]

Attribute and reference modification entries (SetAttribute, AddLink, RemoveLink and their subclasses) have three additional fields to track fine-grained modifications: the updated feature (attribute or reference identifier), which corresponds to the modified field of the concerned object, and the new and old values of the feature (if available).

This decomposition provides direct access to the information required during the serialization process, without accessing the concerned objects. The fine-grained entry management also decreases memory consumption. For instance, modifications on bidirectional references correspond to a single changelog entry, while they needed two basic entries before. Serialization of those entries is also more efficient since it reduces the number of database accesses.

In the previous version of Neo4EMF, we used the EMF notification framework to create changelog entries. This implementation had a major drawback: notifications were handled in a dedicated thread, and we could not ensure that all the notifications were sent to the changelog before its serialization. This behavior could create an inconsistency between the in-memory model and the saved one. This is another reason we do not use the EMF ChangeRecorder facilities, which rely on notifications.

In this new version, changelog entries are created directly in the body of the generated methods. This solution removes synchronization issues and is also more efficient, because entries are created directly, and all the information needed to construct them is available in the method body (current object, feature identifier, new and old values). We also do not have to deal with the generic notification API, which resulted in a lot of casts and complex processing to retrieve this information. Synchronizing the changelog brings another important benefit: the causality between model modifications and entry order is ensured and there is no need to reorder the entry stack before its serialization.

Finally, we modify the changelog life cycle. In the previous version, the changelog was a global singleton object, containing the record of a full execution, mixing modifications of multiple resources. This solution is not optimal because saving is done per resource in EMF, and to save a single resource the entire modification stack needed to be processed to retrieve the corresponding entries. We choose to create a dedicated changelog in each Neo4EMF resource that handles modifications only for the objects contained in the associated resource. This modification reduces the complexity of the save processing: the resource changelog is simply iterated and its entries are then serialized into database calls. The synchronized aspect of the changelog allows us to process the entries in the order they are added, which was not possible in the previous version.

Furthermore, associating a changelog with a resource ensures that, when the resource is deleted, all the related entries are also deleted. In the previous version, entries could not be deleted from the global changelog, and were kept in memory during the execution.

3.2 Dirty Saving
Neo4EMF relies on a mapping between EMF entities and Neo4j concepts to save its modifications. Figure 2 shows an excerpt of the Java metamodel, used in the MoDisco [17] project. This metamodel describes Java applications in terms of Packages, ClassDeclarations, BodyDeclarations, and Comments. A Package is a named container that gathers a set of ClassDeclarations through its owned_elements composition. A ClassDeclaration is composed of a name, a set of Comments and a set of BodyDeclarations.

Figure 2: Excerpt of MoDisco Java Metamodel

Figure 3 shows a simple instance of this metamodel: a Package (package1) containing one ClassDeclaration (class1). This ClassDeclaration contains two Comments (comment1 and comment2) and one single BodyDeclaration (body1).

Figure 3: Sample instance of Java Metamodel

Figures 2, 3, and 4 show that:

• Model elements are represented as nodes. Nodes with identifier p1, cl1, b1, and com1 are examples corresponding to p1, cl1, b1, and com1 in Figure 3. The root node represents the entry point of the model (the resource directly or indirectly containing all the other elements) and is not associated to a model object.
• Elements attributes are represented as node properties. Node properties are ⟨name, value⟩ pairs, where name is the feature identifier and value the value of the feature. Node properties can be observed for p1, cl1, and b1.
• Metamodel elements are also represented as nodes and are indexed to facilitate their access. Metamodel nodes have two properties: the metaclass name and the metamodel unique identifier. P, Cl, B and Com are examples of metamodel element nodes; they correspond to PackageDeclaration, ClassDeclaration, BodyDeclaration, and Comment, respectively, in Figure 2.
• InstanceOf relationships are outgoing relationships between the element nodes and the nodes representing metaclasses. They represent the conformance of an object instance to its class definition.
• References between objects are represented as relationships. To avoid naming conflicts, relationships are named using the following convention: class name followed by reference name (for instance PACKAGE__OWNED_ELEMENTS in Figure 4).

Figure 4: Sample instance database representation

When a save is requested, changelog entries are processed to update the database backend. Each entry is serialized into a database operation. The CreateObject entry corresponds to the creation of a new node and its meta-information (instanceof to its meta-class, isRoot if the object is directly contained in the resource). All the fields of the object are also serialized and directly saved in the database. A SetAttribute entry corresponds to an update of the related node's property with the corresponding name. AddLink, RemoveLink, and their subclasses respectively record the creation and removal of a relationship, storing the containing class and feature name.

We decide to serialize a created object and all its references and attributes at the same time. New objects need to be entirely persisted, and there is no reason to record their modifications before their first serialization (the final state of the object is the one that needs to be persisted). This full serialization behavior has the advantage of generating only one single entry for a new object, independently from the number of its modified fields.

This approach works well for small models, but has issues when a large modification set needs to be persisted: the changelog grows indefinitely until the user decides to save it. This is typically the case in reverse engineering, where the extracted objects are first all created in memory and only saved afterwards.

To address this problem we introduce dirty-saving, a periodical save action not requested by the user. The period is determined by the changelog size, configurable through the Neo4EMF resource. Since these save operations are not requested by the user, they have to ensure two properties:

• Reversibility: if the modifications are canceled or if the user does not want to save a session, the database should roll back to an acceptable version. This version is either (i) the previous regularly saved database if an older version exists or (ii) an empty database.
• Persistability: if a regular save is requested by the user, the temporary objects in the database have to be permanently persisted. They can then constitute a new acceptable version of the database if a rollback is needed.

We introduce a new mapping for changelog entries with the purpose of temporary dirty saving. This mapping is based on the same entries as the regular mapping, but the associated Neo4j concepts allow the system to easily tell dirty objects from regular ones. In addition we create two indexes, tmp_relationships and tmp_nodes, which respectively contain the dirty relationships and nodes (i.e., created in a dirty saving session). Figure 5 summarizes the mapping between changelog entries and Neo4j concepts:

• CreateObject: creation of a new node (as in the regular saving process) and addition to the tmp_nodes index.
• SetAttribute: creation of a dedicated node containing the dirty attributes. The idea is to keep a stable version (i.e., the previous regularly saved version) to easily reverse it. A SetAttribute relationship is created to link the base object and its attribute node.
• AddLink: creation of a generic AddLink relationship, containing the reference identifier as a property. This special relationship format is needed to easily process dirty relationships and retrieve their corresponding image if a regular save operation is requested.
• RemoveLink: creation of a generic RemoveLink relationship, containing the reference identifier as a property. AddLink and RemoveLink relationships with the same reference identifier and target object are mutually exclusive to limit the number of temporary objects in the database.
• DeleteObject: creation of a special Delete relationship looping on the related node. The base version of the node is kept alive if a rollback is needed.

The objective of this mapping is to preserve all the information contained after a regular save, to easily handle a rollback. That is why object deletion is done using a relationship: if the modifications are aborted it is simpler to remove the relationship than to create a new instance of the node with backup information. We do not use a property to tag deleted objects for performance reasons (access to node properties is slower than edge navigation).

To turn dirty objects in the database into regularly saved ones permanently, a serialization process is invoked. Like changelog entries, each Neo4j element contains all the information needed to create its regular equivalent: new objects are simply removed from the tmp_nodes index, AddLink relationships are turned into their regular version using their properties, and RemoveLink entries correspond to the deletion of their existing regular version.

For example, if we update the model given in Figure 3 by removing com1 and creating a new BodyDeclaration body2, then calling a dirty save, the database will be updated as in Figure 6. Note that a Delete relationship has been created because the removed Comment is not contained in the resource anymore. Red relationships and nodes are indexed respectively in the tmp_relationships and tmp_nodes indexes. This example shows that our mapping is built on top of the existing one: there is no modification done on the previous version, represented with black nodes. This simplifies the rollback process, which consists of a deletion of all the temporary Neo4j objects.

3.3 Object Life Cycle
We modify the Neo4EMF object life cycle to enable unloading. When a dirty saving is invoked, all the modifications contained in the changelog are committed to the database. Because of this persistence, persisted objects can be safely released from memory and reloaded using on-demand loading, if needed.

Figure 7 shows the different life cycle states of a Neo4EMF object. When a Neo4EMF object is created it is New: it has not been persisted into the database and cannot be released. When a save is requested or a dirty save is invoked, the new object is persisted into the database and it is tagged as Clear: all the known modifications related to the object have been saved and it is fetchable from the database without information loss. In this state the object can be removed from memory without consistency issues. When a modification is done on the object (setting an attribute or updating a reference) then it is tagged as Modified.

Modified objects cannot be released, because their database-mapped nodes do not contain the modified information. When a save is processed, the Modified objects revert to the Clear state and can be released again. Loading objects also have a particular state that prevents garbage collection of an object while it is loading.

Figure 7: Neo4EMF EObject life cycle

To allow garbage collection of Neo4EMF objects, we use Java Soft and Weak references to store object fields. Weak and Soft referenced objects are eligible for garbage collection as soon as there is no strong reference chain on them. The
difference between the two kinds of references is the time they can remain in memory. Weak references are collected as soon as possible by the garbage collector, whereas Soft references can be retained in memory as long as the garbage collector does not need to free them (i.e., as long as there is enough available memory). This particular behavior is interesting for cache implementation and to optimize execution speed when a large amount of memory is available. The reference type (Weak or Soft) can be set through Neo4EMF resource parameters.

In Section 3.1, we explain that changelog entries contain all the information related to the serialization of the concerned object. This information constitutes the strong reference chain on the related object fields. When a save is done, entries are processed and deleted, breaking the strong reference chain and making objects eligible for garbage collection.

Neo4j's objects are not impacted by this new life cycle. The database manages the life cycle of its objects through a policy defined at resource creation (memory or performance preferences).

Figure 5: Changelog to Neo4j entity mapping [diagram; ChangeLog Entry subclasses AddLink, RemoveLink, SetAttribute and DeleteObject map to Neo4j::RelationshipType with names "AddLink", "RemoveLink", "SetAttribute" and "Delete" (AddLink and RemoveLink carry a relName : String), NewObject maps to Neo4j::Node]

Figure 6: Database state after modifications

3.4 Extended On-Demand Loading
To handle the new architecture of our layer, we have to extend the on-demand loading feature to support temporarily persisted objects. On-demand loading uses two parameters: (i) the object that handles the feature to load and (ii) the identifier of the feature to load. This behavior implies that a Neo4EMF object is always loaded from another Neo4EMF object.

Figure 6 shows our Java metamodel instance state after a dirty save. The database content is a mix between regularly saved objects (in black) and dirty-saved ones (in red). Loading referenced Comments instances from ClassDeclaration cl1 is done in three steps to ensure the last dirty-saved
operations have been considered. First, the CLASS__DECLARATION_COMMENTS relationships are processed and their end nodes are saved. Second, the AddLink relationships containing the corresponding rel property are processed and their end nodes are added to the previous ones. This operation retrieves all the associated nodes for the given feature, regular ones and dirty ones. Third, RemoveLink relationships are processed the same way and their end nodes are removed from the loaded node set.

Attribute fetching behavior is a bit different: if a node representing an object has relationships to a dedicated attribute node, then the data contained in this node is returned instead of the base node property.

To improve the performance of our layer, we create a cache that maps Neo4j identifiers to their associated object. When on-demand loading is performed, the cache is checked first, avoiding the cost of a database access. This cache is also used to retrieve released objects.

Table 1: Number of Created Elements Before Memory Overhead

Persistence Layer    XMI          CDO         Neo4EMF
#Created Elements    22 939 780   4 378 990   >40 000 000¹
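The three-step reference loading and the attribute override described in Section 3.4 can be sketched as follows. This is a hedged illustration in Python over plain in-memory structures rather than a Neo4j database: the tuple layout and the function names `load_reference` and `load_attribute` are assumptions for the example. The relationship data mirrors the dirty-saved state of Figure 6, while the dirty `name` attribute on cl1 is hypothetical and added only to exercise the attribute case.

```python
# Relationships as (source, type, target, properties) tuples: a stand-in for Neo4j.
RELATIONSHIPS = [
    ("cl1", "CLASS__DECLARATION_COMMENTS", "com1", {}),
    ("cl1", "CLASS__DECLARATION_COMMENTS", "com2", {}),
    ("cl1", "CLASS__DECLARATION_BODY_DECLARATIONS", "b1", {}),
    # Dirty-saved relationships carry the reference name in a "rel" property.
    ("cl1", "RemoveLink", "com1", {"rel": "CLASS__DECLARATION_COMMENTS"}),
    ("cl1", "AddLink", "b2", {"rel": "CLASS__DECLARATION_BODY_DECLARATIONS"}),
]

# Regularly saved node properties, and dedicated attribute nodes for dirty values.
BASE_PROPERTIES = {"p1": {"name": "package1"}, "cl1": {"name": "class1"}}
DIRTY_ATTRIBUTES = {"cl1": {"name": "renamed"}}  # hypothetical dirty attribute


def _targets(source, rel_type, rel_name=None):
    """End nodes of relationships of `rel_type` leaving `source`.

    When `rel_name` is given, only relationships whose "rel" property
    matches it are considered.
    """
    return [
        target
        for (src, kind, target, props) in RELATIONSHIPS
        if src == source
        and kind == rel_type
        and (rel_name is None or props.get("rel") == rel_name)
    ]


def load_reference(source, feature):
    # Step 1: end nodes of the regularly saved relationships.
    nodes = _targets(source, feature)
    # Step 2: add end nodes of AddLink relationships for this feature.
    for node in _targets(source, "AddLink", feature):
        if node not in nodes:
            nodes.append(node)
    # Step 3: drop end nodes of RemoveLink relationships for this feature.
    removed = set(_targets(source, "RemoveLink", feature))
    return [node for node in nodes if node not in removed]


def load_attribute(node_id, feature):
    # A dedicated attribute node, when present, overrides the base property.
    dirty = DIRTY_ATTRIBUTES.get(node_id, {})
    if feature in dirty:
        return dirty[feature]
    return BASE_PROPERTIES[node_id][feature]
```

With this data, loading the comments of cl1 yields only com2 (com1 was removed in the dirty session), and loading its body declarations yields b1 followed by the dirty-saved b2.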
4. EVALUATION
In this section, we evaluate how the memory footprint and the access time of Neo4EMF scale in different large model scenarios, and we compare it against CDO and XMI. These experiments are performed over two EMF models extracted with the MoDisco Java Discoverer [17]. Both models are extracted from Eclipse plug-ins: the first one is an internal tool and the second one is the Eclipse JDT plugin. The resulting XMI files are 20 MB and 420 MB, containing respectively around 80 000 and 1 700 000 elements.

4.1 Execution Environment
Experiments are executed on a computer running Windows 7 professional edition 64 bits. Relevant hardware elements are: an Intel Core I5 processor 3350P (3.5 GHz), 8 GB of DDR3 SDRAM (1600 MHz) and a Seagate Barracuda 7200.14 hard disk (6 Gb/s). Experiments are executed on Eclipse 4.3 running Java SE Runtime Environment 1.8.

To compare the three persistence solutions, we generate three different EMF models from the MoDisco Java Metamodel: (i) the standard EMF model, (ii) the CDO one and (iii) the Neo4EMF one. We import both models from XMI to CDO and Neo4EMF and we verify they contain the same data after the import.

Neo4EMF uses an embedded Neo4j database to store its objects. To provide a meaningful comparison in terms of memory consumption we choose to use an embedded CDO server.

Experiment 1: Object creation. In this first experiment, we execute an infinite loop of object creation and simply count how many objects have been created before an OutOfMemoryError is thrown. We choose a simple tree structure of three classes to instantiate from the MoDisco Java metamodel: a parent ClassFile containing 1000 BlockComment and ImportDeclaration. The resulting model is a set of independent element trees. For this experiment we choose a 1 GB Java virtual machine and an arbitrarily fixed changelog size of 100 000 entries. Table 1 summarizes the results.

Note that the number given for Neo4EMF is an approximation: we stop the execution before any OutOfMemory error. The average memory used to create elements was around 500 MB and does not seem to grow. This performance is due to the dirty-saving mechanism: created objects generate entries in the changelog. When the changelog is full, changes are saved temporarily in the database, freeing the changelog for the next object creations.

Experiment 2: Model traversal. In this experiment, we load a model and execute a traversal query that starts from the root of the model, traverses the whole containment tree and modifies the name attribute of all NamedElements. All the modifications are saved at the end of the execution. During the traversal, we measure the execution time for covering the entire model and the average memory used to perform the query. In addition, we measure the memory needed to save the modifications at the end of the execution. Figures 8 and 9 summarize memory results. As expected, the Neo4EMF traversal footprint is higher than the XMI one because we include the Neo4j embedded database and runtime in our measures. Unloading shows its real benefit when comparing the results with CDO: when removing unused (i.e., unreferenced) objects we save space and process the request in a reduced amount of memory. For this experiment we use a 4 GB Java virtual machine, with the ConcMarkSweepGC garbage collector, recommended when using Neo4j.

Figure 8: Memory Consumption: Model Traversal and Save (20 MB)

Experiment 3: Time performance. This experiment is similar to the previous one, but we focus on time performance. We measure the time needed to perform traversal and save. Figures 10 and 11 summarize the results. To provide a fair comparison between full and on-demand loading strategies we also include model loading time with the traversal queries.

¹ The execution was stopped before any memory exception.
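The bounded changelog that these experiments rely on can be illustrated with a small simulation. The Python sketch below flushes the changelog to a temporary store each time it reaches a configured size, so the number of in-memory entries never exceeds that limit. The class `DirtySavingResource`, its methods, and the tiny limit of 3 are assumptions for the example; Experiment 1 used a limit of 100 000 entries and a real Neo4j backend.

```python
class DirtySavingResource:
    """Toy model of a resource whose changelog is dirty-saved when full."""

    def __init__(self, changelog_limit):
        self.changelog_limit = changelog_limit
        self.changelog = []   # pending entries held in memory
        self.tmp_store = []   # dirty-saved (temporary) entries
        self.store = []       # regularly saved entries
        self.dirty_saves = 0  # number of automatic flushes so far
        self.peak = 0         # largest changelog size ever observed

    def record(self, entry):
        self.changelog.append(entry)
        self.peak = max(self.peak, len(self.changelog))
        if len(self.changelog) >= self.changelog_limit:
            self._dirty_save()

    def _dirty_save(self):
        # Move pending entries to the temporary store and free the changelog.
        self.tmp_store.extend(self.changelog)
        self.changelog.clear()
        self.dirty_saves += 1

    def save(self):
        # Regular save: temporary and pending entries become permanent.
        self.store.extend(self.tmp_store)
        self.store.extend(self.changelog)
        self.tmp_store.clear()
        self.changelog.clear()

    def rollback(self):
        # Discard everything that was not regularly saved.
        self.tmp_store.clear()
        self.changelog.clear()


resource = DirtySavingResource(changelog_limit=3)
for i in range(10):
    resource.record(("create", f"obj{i}"))
```

In this run the changelog is flushed after the third, sixth and ninth creation, leaving a single pending entry; a later `save()` makes all ten entries permanent in creation order, while `rollback()` would discard them.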
Figure 9: Memory Consumption: Model Traversal and Save (420 MB)

Figure 10: 20 MB model traversal and save performances

Figure 11: 420 MB traversal and save performances

Neo4EMF save performance can be explained by dirty saving:
during the traversal, entries are generated to track the name
modifications. These entries are then saved in the database
when the changelog is full, reducing the final save cost. This
behavior also explains a part of the traversal time overhead
when compared to CDO: Neo4EMF traversal implies database write
access for dirty saving whereas CDO does not, and the related
I/O accesses considerably impact performance.

4.2 Discussion
The results of these experiments show that dirty saving coupled
with on-demand loading significantly decreases the memory needed
to execute a query. As expected, this memory footprint
improvement worsens the time performance of our tool, in
particular because of dirty saving, which generates several
database calls. That is why we provide dirty saving
configuration through the Neo4EMF resource. The experiments also
show that Neo4EMF is able to handle large queries and
modifications in a limited amount of memory, compared to
existing solutions.

We also ran our benchmarks on different operating systems
(Ubuntu 12.04 and 13.10) and found that CDO and Neo4EMF time
performance seems to be linked to the file partition format
(especially in I/O accesses): Neo4j performs better on these
operating systems (by a factor of 1.5) and CDO is slower (by
approximately the same factor). More investigation is needed to
optimize our tool in different contexts.

Our experiments show that Neo4EMF is an interesting alternative
to CDO for handling large models in memory-constrained
environments. On-demand loading and transparent unloading offer
a small memory footprint (smaller than CDO in our experiments),
but our solution does not provide the advanced features provided
by CDO, such as collaborative editing and versioning.

The unload strategy is transparent for the user, but may be
intrusive in some cases, for instance if the hard-drive space is
limited or if time performance is critical. This is why we
introduce configuration for dirty saving and changelog size
through the Neo4EMF resource.

5. RELATED WORK
Models obtained by reverse engineering with EMF-based tools such
as MoDisco [17, 5, 11] can be composed of millions of elements.
Existing solutions to handle this kind of model have shown
clear limitations in terms of memory consumption and processing.

CDO is the de facto standard to handle large models using a
server and a relational database. However, some experiments have
shown that CDO does not scale well to very large models [2].
Pagán et al. [14, 15] propose to use NoSQL databases to store
models, especially because this kind of database should better
fit the interconnected nature of EMF models.

Mongo EMF [7] is a NoSQL approach that stores EMF models in
MongoDB, a document-oriented database. However, Mongo EMF
storage is different from the standard EMF persistence backend,
and cannot be used as is to replace another persistence solution
in an existing system. Modifications on the client software are
needed to integrate it.
Morsa [14] is another persistence solution based on the MongoDB
database. Similarly to Neo4EMF, Morsa uses a standard EMF
mechanism to ensure persistence, but it uses a client-server
architecture, like CDO. Morsa has some similarities with
Neo4EMF, notably in its on-demand loading mechanism, but does
not use a graph database.

EMF Fragments [10] is another EMF persistence layer based on a
NoSQL database. The EMF Fragments approach is different from
other NoSQL persistence solutions: it relies on the proxy
mechanism provided by EMF. Models are automatically partitioned
and loading is performed by partition. Loading on demand is only
performed for cross-partition references. Another difference
with Neo4EMF is that EMF Fragments needs to annotate the
metamodels to provide the partition set, whereas our approach
does not require model adaptation or tool modification.

6. CONCLUSION AND FUTURE WORK
In this paper, we presented a strategy to optimize the memory
footprint of Neo4EMF, a persistence layer designed to handle
large models through on-demand loading and transparent
unloading. Our experiments show that Neo4EMF is an interesting
alternative to CDO for accessing and querying large models,
especially when little memory is available, with a tolerable
performance loss. Neo4EMF does not have collaborative model
editing or model versioning features, which biases our results:
providing those features may imply higher memory consumption.

In future work, we plan to improve our layer by providing
partial collection loading, allowing subparts of large
collections to be loaded from the database. In our experiments,
we detected some memory consumption overhead in this particular
case: when an object contains a huge number of referenced
objects (through the same reference) and they are all loaded at
once.

We then plan to study the inclusion of attribute and reference
meta-information directly in the database to avoid unnecessary
object loading: some EMF mechanisms, like isSet, may induce
on-demand loading of the associated attribute just in order to
make a comparison. It could be interesting to provide this
information from the database without a complete and costly
object loading.

Finally, we want to introduce loading strategies such as
prefetching or model partitioning (using optional metamodel
annotations or a definition of the model usage) to allow users
to customize the object life cycle.

7. REFERENCES
[1] AtlanMod. Neo4EMF, 2014. url: http://www.neo4emf.com/.
[2] K. Barmpis and D. S. Kolovos. Comparative analysis of data
persistence technologies for large-scale models. In Proceedings
of the 2012 Extreme Modeling Workshop, XM ’12, pages 33–38, New
York, NY, USA, 2012. ACM.
[3] A. Benelallam, A. Gómez, G. Sunyé, M. Tisi, and D. Launay.
Neo4emf, a scalable persistence layer for emf models. July 2014.
[4] L. Bettini. Implementing Domain-Specific Languages with
Xtext and Xtend. 2013.
[5] H. Bruneliere, J. Cabot, G. Dupé, and F. Madiot. Modisco: A
model driven reverse engineering framework. Information and
Software Technology, 56(8):1012–1032, 2014.
[6] H. Bruneliere, J. Cabot, F. Jouault, and F. Madiot. Modisco:
A generic and extensible framework for model driven reverse
engineering. In Proceedings of the IEEE/ACM International
Conference on Automated Software Engineering, ASE ’10, pages
173–174, New York, NY, USA, 2010. ACM.
[7] Bryan Hunt. MongoEMF, 2014. url:
https://github.com/BryanHunt/mongo-emf/wiki/.
[8] Eclipse Foundation. The CDO Model Repository (CDO), 2014.
url: http://www.eclipse.org/cdo/.
[9] INRIA and LINA. ATLAS transformation language, 2014.
[10] Markus Scheidgen. EMF fragments, 2014. url:
https://github.com/markus1978/emf-fragments/wiki/.
[11] Modeliosoft Solutions, 2014. url:
http://www.modeliosoft.com/.
[12] J. Musset, É. Juliot, S. Lacrampe, W. Piers, C. Brun,
L. Goubet, Y. Lussaud, and F. Allilaire. Acceleo user guide,
2006.
[13] OMG. MOF 2.0 QVT final adopted specification
(ptc/05-11-01), April 2008.
[14] J. E. Pagán, J. S. Cuadrado, and J. G. Molina. Morsa: A
scalable approach for persisting and accessing large models. In
Proceedings of the 14th International Conference on Model Driven
Engineering Languages and Systems, MODELS’11, pages 77–92,
Berlin, Heidelberg, 2011. Springer-Verlag.
[15] J. E. Pagán and J. G. Molina. Querying large models
efficiently. Information and Software Technology, 2014. In
press, accepted manuscript. url:
http://dx.doi.org/10.1016/j.infsof.2014.01.005.
[16] J. Partner, A. Vukotic, and N. Watt. Neo4j in Action.
O’Reilly Media, 2013.
[17] The Eclipse Foundation. MoDisco Eclipse Project, 2014. url:
http://www.eclipse.org/MoDisco/.