=Paper=
{{Paper
|id=Vol-1406/paper2
|storemode=property
|title=Approach to Define Highly Scalable Metamodels Based on JSON
|pdfUrl=https://ceur-ws.org/Vol-1406/paper2.pdf
|volume=Vol-1406
|dblpUrl=https://dblp.org/rec/conf/staf/GerhartBHB15
}}
==Approach to Define Highly Scalable Metamodels Based on JSON==
Markus Gerhart, Julian Bayer, Jan Moritz Höfner, and Marko Boger

University of Applied Sciences Konstanz, ProGraMof, Brauneggerstraße 55, 78462 Konstanz, Germany

{mgerhart,jubayer,jahoefne,marko.boger}@htwg-konstanz.de, http://www.htwg-konstanz.de

Abstract. Domain-specific modelling is increasingly adopted in the software development industry. While open source metamodels like Ecore have a wide impact, they still have some problems. The independent storage of nodes (classes) and edges (references) is currently only possible with complex, specific solutions. Furthermore, the developed models are stored in the Extensible Markup Language (XML) data format, which leads to scaling problems with large models. In this paper we describe an approach that solves the problem of independent classes and references in metamodels, and we store the models in the JavaScript Object Notation (JSON) data format to support high scalability. First results of our tests show that the developed approach works and that classes and references can be defined independently. In addition, our approach reduces the number of characters per model by a factor of approximately two compared to Ecore. The entire project is made available as open source under the name MoDiGen. This paper focuses on the description of the metamodel definition in terms of scaling.

Keywords: Metamodel definition, JavaScript Object Notation, JSON, Model scalability, Metamodel scalability, Model storage

===1 Introduction===
Tools for creating Domain-Specific Modeling Languages (DSML) are becoming more accepted in the software development industry as a way to develop specific solutions for specific problems. These solutions are developed with tools such as Xtext[5], Meta Programming System (MPS)[6], MetaEdit+ Modeler[7], Kybele[8], MagicDraw[9] or Eugenia[10].
The underlying metamodel of the tools is of crucial importance, because it serves as the basis for all subsequent steps like code generation or programmatic manipulation of the model data. Therefore, access to the metamodel has to be very simple and the memory consumption should be as low as possible.

Only if the metamodel is maintained at a high abstraction level can the subsequent programmatic processing be implemented in a simple, clean, and clear way, as suggested by the KISS principle ("Keep it simple, stupid"). However, existing open source tools cover only the needs of specific subject areas such as the software development industry. Requirements of very complex metamodels, for instance the statics of buildings, can currently only be fulfilled through complex detours. We do not consider commercial solutions, because they do not give insight into the structure of their metamodel and storage solution. Another common problem is the storage consumption of very large models. Furthermore, a great weakness of existing open source solutions such as Ecore[1] is that they do not scale well to very large models.

Our suggested approach allows the definition of metamodels in a simple and clear way. We offer the possibility that nodes (classes) and edges (references) can exist independently and with equal rights. This leads to a variety of possibilities in the creation of metamodels. In addition, the data of metamodels and models is held in the JavaScript Object Notation (JSON)[17] file format. This allows the smooth scaling of the model data using existing solutions such as CouchDB[14], MongoDB[15], and RavenDB[16].

We demonstrate that the storage of metamodels and models is possible without problems, even when involving well over 10,000 elements (classes and references).

The paper first reviews related work in the field in section 2, which mostly covers other tools and techniques for the definition of metamodels.
Our general approach for the definition of metamodels based on JSON for high scalability is described in section 3. The core contribution of this publication is the developed metamodel with independent classes and connections and the data structure using JSON to store models for high scalability. Section 4 illustrates the results of our approach from different angles. Finally, we summarise the limitations of our research and draw conclusions in section 5.

===2 Related Work===
The Ecore metamodel of the Eclipse Modeling Framework (EMF) stores edges as parts of nodes. An EReference actually models one end of an edge [1]. This is also true for the Generic Modeling Environment (GME) [2] and WebGME [3]. The disadvantages of this approach were already discussed in [11]. Storing edges as parts of nodes does not scale well for large numbers of edges. Accessing, loading and saving individual edges requires linear time and may pose a problem in terms of heap memory. As an alternative, [11] proposes storing edges as relations, as in a relational database, separating the edge from the node. That is the same basic principle we follow. Instead of an SQL-like structure as proposed in [11], we introduce the edge as a first-level object that is stored in a NoSQL database.

The Ecore metamodel uses an Extensible Markup Language (XML) based format for model serialization [1]. The need for a model serialization format that is not based on XML was formulated in [12]. While our approach does not satisfy all the criteria for model storage set forth in that paper, we believe that JSON as the basic format for model storage will simplify implementing these criteria. A diagram is stored as a collection of JSON objects which can be addressed through an identifier. Instead of loading large models all at once, objects could be loaded as needed using their identifier. The partial loading of model data based on JSON is covered by databases like CouchDB or MongoDB.
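As a rough illustration of loading objects as needed by identifier, the following Python sketch uses a plain dictionary as a stand-in for a document store. The `LazyModel` class, the `store` dictionary, and the short identifiers are hypothetical and not part of MoDiGen; a real deployment would query a database such as CouchDB or MongoDB instead.

```python
import json

# Hypothetical stand-in for a document store: identifier -> serialized JSON object.
store = {
    "n1": json.dumps({"mClass": "Male", "mAttributes": {"First_Name": ["Hans"]}}),
    "n2": json.dumps({"mClass": "Female", "mAttributes": {}}),
    "n3": json.dumps({"mClass": "Male", "mAttributes": {}}),
}


class LazyModel:
    """Parses a model object only when it is requested for the first time."""

    def __init__(self, backing_store):
        self._store = backing_store
        self._cache = {}

    def get(self, identifier):
        # Parse on first access, then serve the cached object.
        if identifier not in self._cache:
            self._cache[identifier] = json.loads(self._store[identifier])
        return self._cache[identifier]

    def loaded_count(self):
        # Number of objects that have actually been parsed so far.
        return len(self._cache)


model = LazyModel(store)
hans = model.get("n1")
```

Only the requested object is parsed here; the other two entries stay untouched until something asks for them.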
An XML-based format would always have to be loaded in full, apart from approaches such as partial parsing, which significantly increase the runtime.

Approaches such as GEMSjax [4] or EMF-REST [19] provide Representational State Transfer (REST) access to Ecore models and translate them into either JSON or XML. Compared to a method that stores JSON data in a document-oriented database, these approaches incur additional overhead, as the Ecore model has to be serialised into the target format on each call to the REST interface.

===3 Approach===
In this section we give a detailed explanation of our approach. First we discuss and reason about the architecture of our metamodel and its components, and then we present an example.

====3.1 Architecture====
An overview of the MoDiGen metamodel architecture is given in Figure 1. The design is focused on universality and simplicity. We utilise standard-conforming JSON to store both model and instance data. This helps us to achieve programming language agnosticism and scalability. In the following paragraphs we discuss the architecture depicted in Figure 1 and the corresponding metamodel components in detail.

* M_OBJ: name: String
* M_BOUNDS: upperBound: Int, lowerBound: Int
* M_ENUM: values: List[Scalar]
* M_CLASS: abstract: Bool, superTypes: List[M_CLASS], inputs: List[M_LINK_DEF], outputs: List[M_LINK_DEF], attributes
* M_REFERENCE: sourceDeletionDeletesTarget: Bool, targetDeletionDeletesSource: Bool, source: List[M_LINK_DEF], target: List[M_LINK_DEF], attributes
* M_ATTRIBUTE: globalUnique: Bool, localUnique: Bool, default: Type, constant: Bool, singleAssignment: Bool, expression: String, type: Scalar | M_ENUM, ordered: Bool, transient: Bool
* M_LINK_DEF: type: M_CLASS | M_REFERENCE, deleteIfLower: Bool

Fig. 1. MoDiGen Metamodel

M_Obj is the abstract base class of most of the metamodel's components. Its use is to provide a name attribute to all components that need to be identifiable.
The name attribute is guaranteed to be unique within the scope of a model.

M_Bounds is abstract and defines the upperBound and lowerBound attributes. Bound values can be zero, a positive integer, or -1 for infinity. By defining an upper as well as a lower bound, one can model maxima, minima, and ranges. By default, lowerBound is set to 0 and upperBound to -1.

Attributes of nodes and edges are modelled using M_Attribute. It extends M_Bounds, so it can be defined to be either single-valued or an array with an optional maximum and/or minimum length. To further define its behaviour, a number of mandatory attributes exist, which are discussed in the following. If uniqueLocal is set to true, the M_Attribute behaves much like a set data structure, because it is a collection which cannot contain any duplicate values. The uniqueGlobal flag, in contrast, guarantees that the attribute's values are unique within the scope of a model. This may be useful for modelling attributes like social security numbers. The default attribute is a value of the attribute's type and defines the initial value of the M_Attribute. Using expression, one can define a simple arithmetic formula to derive the value of the attribute. This renders the attribute read-only and can be useful in cases where one attribute depends on other attributes. Whether the M_Attribute is a String, Integer, Double, or Enum is defined by setting the type attribute. The ordered flag defines whether the attribute's values are ordered in some fashion. The transient attribute determines whether the attribute is transient. If it is set to true, the attribute's value is not stored when the model is saved to a database. This might be useful for attributes which are the result of an expression. Single-assignment behaviour can be modelled by setting the singleAssignment flag; this causes the value to be settable exactly once. Constant, on the other hand, means that the value is the default value and may never be changed.
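The bound semantics described above (zero or a positive integer, with -1 meaning infinity, and defaults of 0 and -1) can be sketched in a few lines of Python. The function names `within_bounds` and `check_attribute` are hypothetical and only illustrate one possible reading; they are not taken from the MoDiGen code base.

```python
def within_bounds(count, lower_bound=0, upper_bound=-1):
    """Return True if count satisfies the bounds; an upper bound of -1 means unbounded."""
    if count < lower_bound:
        return False
    return upper_bound == -1 or count <= upper_bound


def check_attribute(values, lower_bound=0, upper_bound=-1, unique_local=False):
    """Check a list of attribute values against its bounds and local uniqueness."""
    if not within_bounds(len(values), lower_bound, upper_bound):
        return False
    # With local uniqueness the value list must behave like a set (no duplicates).
    if unique_local and len(set(values)) != len(values):
        return False
    return True
```

For instance, `check_attribute(["Hans"], 1, 1)` accepts a single-valued attribute, while `check_attribute(["a", "a"], unique_local=True)` rejects the duplicate.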
Nodes are modelled using M_Classes, which, in addition to a number of mandatory attributes, may contain an arbitrary number of user-defined attributes. The mandatory attributes are defined in the following manner. Abstract denotes whether the M_Class is declared as abstract, meaning it may not be instantiated but only be used as a base for other M_Classes. Inheritance between M_Classes is modelled using the superTypes attribute. It contains all direct predecessors. By defining superTypes as a list we explicitly allow multiple inheritance. The inputs attribute is a list of all incoming M_Link_Defs and outputs is a list of all outgoing M_Link_Defs. Correspondingly, M_Reference has target and source attributes, which behave like inputs and outputs.

M_Link_Def is used to define one endpoint of a connection and has a type attribute, which is either M_Class or M_Reference, as well as an upper and a lower bound. The flag deleteIfLower defines whether the M_Class or M_Reference which contains the M_Link_Def should be deleted in case the number of values of the M_Link_Def drops to its lower bound. This means one can model a minimum count of output/input or source/target values of any type that a certain class or reference must have.

An M_Reference denotes an edge between two M_Classes. By defining references as first-level classes, our metamodel gains a couple of powerful modelling possibilities. For example, edges may have an arbitrary number of custom attributes and allow n:m relationships. A set of standard attributes also exists, which is defined in the following. If sourceDeletionDeletesTarget is set and the source of the M_Reference is deleted, the target and the reference itself are deleted as well. Accordingly, targetDeletionDeletesSource deletes the source and the reference in case the target is deleted. This can be useful to model containments and similar constructs where one end of a reference cannot exist without the other end.
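The cascading behaviour of the two deletion flags can be sketched as follows. This is a minimal Python illustration under stated assumptions: the simplified data layout, the function `delete_node`, and the handling of edges without a cascade flag (the edge survives and only loses the deleted endpoint) are hypothetical and not taken from MoDiGen.

```python
def delete_node(node_id, nodes, edges, ref_flags):
    """Delete a node and apply the cascade flags of every reference attached to it."""
    pending = [node_id]
    while pending:
        current = pending.pop()
        if current not in nodes:
            continue  # already deleted earlier in the cascade
        nodes.discard(current)
        for edge_id in list(edges):
            edge = edges[edge_id]
            flags = ref_flags[edge["ref"]]
            is_source = current in edge["source"]
            is_target = current in edge["target"]
            if is_source and flags["sourceDeletionDeletesTarget"]:
                pending.extend(edge["target"])  # targets are deleted as well
                del edges[edge_id]
            elif is_target and flags["targetDeletionDeletesSource"]:
                pending.extend(edge["source"])  # sources are deleted as well
                del edges[edge_id]
            elif is_source or is_target:
                # Assumption: without a cascade flag the edge stays in the model
                # and only the deleted endpoint is removed from it.
                edge["source"] = [n for n in edge["source"] if n != current]
                edge["target"] = [n for n in edge["target"] if n != current]


# Hypothetical containment example: deleting the house also deletes the room.
nodes = {"house", "room", "owner"}
edges = {
    "e1": {"ref": "contains", "source": ["house"], "target": ["room"]},
    "e2": {"ref": "owns", "source": ["owner"], "target": ["house"]},
}
ref_flags = {
    "contains": {"sourceDeletionDeletesTarget": True, "targetDeletionDeletesSource": False},
    "owns": {"sourceDeletionDeletesTarget": False, "targetDeletionDeletesSource": False},
}
delete_node("house", nodes, edges, ref_flags)
```

After the call, only the owner node remains, and the `owns` edge survives as a stand-alone edge without a target.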
The source attribute is a list of sources and the target attribute is a list of targets of the M_Reference. Both have the type M_Link_Def; therefore M_References can be defined to be valid for a number of different source and target classes, with dedicated bounds for each class in both directions. For example, one can define an edge to be valid from A to B or C and further specify separate bounds for the number of edges from A to B or A to C, as well as separate bounds for the number of Bs, Cs, and As involved in those edges.

Finally, M_Enum is a simple enumeration of a scalar type and may be used as a type for attributes.

====3.2 Example====
To demonstrate the capabilities of our metamodel, we consider the following (oversimplified) family tree model. Three M_Classes, Person, Male, and Female, exist, where Male and Female inherit from Person and Person is abstract. The following relationships exist between these classes: the relationship isHusband has a Male source and a Female target, while the reverse relationship isWife has a Female source and a Male target. The Male class also has a relationship isFather, directed at Person, and the Female class has the corresponding isMother relationship, also directed at Person. Figure 2 illustrates the setup.

[Figure 2: class diagram with Person (FirstName: String, SocialSecurityNumber: String, Birthday: String), Male, and Female, connected by the references isHusband, isWife, isFather, and isMother.]

Fig. 2. Family Tree Model

Listing 1.1 is taken from the compressed familytree model and represents the M_Class Male. The mType property is necessary because JSON has no type system, so this property is needed to declare that this is an M_Class. The inputs and outputs properties are M_Link_Defs linking to the respective M_Reference; their bounds indicate how many such relationships are allowed for one instance of this M_Class (at most one isWife and one isHusband, and any number of isFather). Because Male inherits all its attributes from Person, mAttributes is empty.
```json
"Male": {
  "mType": "mClass",
  "name": "Male",
  "superTypes": ["Person"],
  "mAttributes": [],
  "inputs": [
    { "type": "isWife",
      "upperBound": 1,
      "lowerBound": 0
    }
  ],
  "outputs": [
    { "type": "isHusband",
      "upperBound": 1,
      "lowerBound": 0
    },
    { "type": "isFather",
      "upperBound": -1,
      "lowerBound": 0
    }
  ]
}
```

Listing 1.1. The Male M_Class taken from the Family Tree example model

An M_Reference is given in Listing 1.2. This is the isHusband M_Reference linking the Male source to the Female target. The isHusband reference links exactly one Male object to one Female object.

```json
"isHusband": {
  "mType": "mRef",
  "name": "isHusband",
  "mAttributes": [],
  "source": [
    { "type": "Male",
      "upperBound": 1,
      "lowerBound": 1
    }
  ],
  "target": [
    { "type": "Female",
      "upperBound": 1,
      "lowerBound": 1
    }
  ]
}
```

Listing 1.2. The isHusband M_Reference taken from the Family Tree example model

We instantiate this model with three persons: a Male instance and a Female instance, who are married to each other (using the isHusband and isWife references), and another Male, who is the child of the other two. Listing 1.3 shows part of the JSON of an instance of the family tree model. Specifically, it shows one instance of the Male class and one of the isHusband reference. The complete JSON source for both the model and the instance can be found at MoDiGen[13].
```json
"846bc8a2-00fc-401f-b626-0b0252516aee": {
  "mClass": "Male",
  "outputs": {
    "isFather": ["8e9b1093-a589-4ae4-8e1e-1b3d63a3f842"],
    "isHusband": ["ee204744-6322-49d4-928e-1442e8bc70c4"]
  },
  "inputs": {
    "isWife": ["666d4de7-e0f2-4620-8c19-d5469b40be1f"]
  },
  "mAttributes": {
    "First_Name": ["Hans"],
    "SocialSecurityNumber": ["12"],
    "Birthday": ["12-02-2015"]
  }
},

"ee204744-6322-49d4-928e-1442e8bc70c4": {
  "mRef": "isHusband",
  "source": {
    "Male": ["846bc8a2-00fc-401f-b626-0b0252516aee"]
  },
  "target": {
    "Female": ["a264a43b-6f97-4257-9243-baddbf745490"]
  }
}
```

Listing 1.3. Family tree instance

===4 Evaluation===
The presented approach makes it possible to create nodes and edges with equal rights. This results from the revised metamodel definition, which is crucial for programmatic processing of the model data. The storage of metamodel and model information is done using JSON. This allows for easy integration and processing by conventional programming languages and web technologies. Furthermore, the number of characters, and therefore the storage consumption for metamodel definitions and model instances, was reduced compared to the XML data structure of Ecore by separating edges and nodes and by using JSON.

The change in the number of characters based on the metamodel definition of the familytree example is shown in Figure 3.

[Figure 3: bar chart of the number of characters (y-axis, 0 to 8000) for Ecore and MoDiGen in four variants: Ecore/MoDiGen, Ecore/MoDiGen 1, Ecore/MoDiGen 2, and Ecore/MoDiGen 1,2, where 1 means without whitespace and 2 means without default values. The chart annotates reductions of 39%, 58%, 2%, and 36%.]

Fig. 3. Number of characters for the Familytree Metamodel

It turns out that the number of characters was consistently reduced when compared to Ecore. The biggest difference appears when whitespace is removed.
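The effect of whitespace on the character count can be reproduced in a small way with Python's standard `json` module. The sketch below serialises the Male M_Class from Listing 1.1 once with indentation and once in compact form; it only illustrates the measurement idea and does not reproduce the figures reported in this paper.

```python
import json

# The Male M_Class from Listing 1.1 as a Python dictionary.
male_class = {
    "mType": "mClass",
    "name": "Male",
    "superTypes": ["Person"],
    "mAttributes": [],
    "inputs": [{"type": "isWife", "upperBound": 1, "lowerBound": 0}],
    "outputs": [
        {"type": "isHusband", "upperBound": 1, "lowerBound": 0},
        {"type": "isFather", "upperBound": -1, "lowerBound": 0},
    ],
}

# Human-readable form with indentation and line breaks.
pretty = json.dumps(male_class, indent=2)
# Compact form: no spaces after separators, no line breaks.
compact = json.dumps(male_class, separators=(",", ":"))

# Relative reduction in characters achieved by dropping whitespace.
saving = 1 - len(compact) / len(pretty)
print(f"{len(pretty)} -> {len(compact)} characters ({saving:.0%} fewer)")
```

Both strings decode to the same object, so the compact form loses nothing for machine processing.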
Removing whitespace comes at the expense of human readability but is irrelevant for machine processing. By removing default values a further reduction was achieved. Ecore applies these measures by default.

Figure 4 shows how the number of characters of a model instance develops with the number of nodes. The comparison is based on the smallest possible model instances (without whitespace and default values) for a model in which all nodes are interconnected. For smaller models the Ecore approach is more appropriate, but for larger models the presented approach has advantages. This is mainly caused by the changed handling of connections between objects. If only few connections are present in a model, the advantage of our approach diminishes. We generated model instances with 10,000 interconnected nodes for Ecore as well as MoDiGen and found that the storage consumption of Ecore was 5.58 gigabytes and the storage consumption of MoDiGen was 3.6 megabytes.

[Figure 4: plot of the number of characters (y-axis, 0 to 25,000) against the number of nodes (x-axis, 0 to 20) for MoDiGen and Ecore.]

Fig. 4. Development of the number of characters for the familytree model instance according to the created nodes

Our approach has advantages regarding the scalability of big models. This is mainly due to the data structure implemented in JSON. Data formats like JSON can easily be scaled horizontally using existing database solutions like CouchDB, MongoDB or RavenDB. This is additionally favoured by the lower memory consumption of the developed metamodel. In contrast to XML-based data structures, the JSON-based data structure offers the possibility to access just parts of the stored model.

===5 Conclusion and future work===
We have introduced the MoDiGen metamodel and shown how its approach differs from other metamodels for DSMLs, such as Ecore or GME. Treating edges as first-level objects instead of features of nodes allows for easy programmatic access to the edges.
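To make the point about programmatic access concrete, the following Python sketch queries edges directly in an instance shaped like Listing 1.3. The helper functions `edges_of_type` and `endpoints` are hypothetical and only show one way such access could look.

```python
# A reduced instance in the shape of Listing 1.3 (one node and one edge).
instance = {
    "846bc8a2-00fc-401f-b626-0b0252516aee": {
        "mClass": "Male",
        "outputs": {"isHusband": ["ee204744-6322-49d4-928e-1442e8bc70c4"]},
        "inputs": {},
        "mAttributes": {"First_Name": ["Hans"]},
    },
    "ee204744-6322-49d4-928e-1442e8bc70c4": {
        "mRef": "isHusband",
        "source": {"Male": ["846bc8a2-00fc-401f-b626-0b0252516aee"]},
        "target": {"Female": ["a264a43b-6f97-4257-9243-baddbf745490"]},
    },
}


def edges_of_type(model, ref_name):
    """Return all edge objects of the given reference type, keyed by identifier."""
    return {oid: obj for oid, obj in model.items() if obj.get("mRef") == ref_name}


def endpoints(edge, side):
    """Collect the node identifiers on one side ('source' or 'target') of an edge."""
    return [node_id for ids in edge[side].values() for node_id in ids]


husband_edges = edges_of_type(instance, "isHusband")
```

Because every edge is its own object, the query never has to walk through the nodes to find it.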
The use of JSON yields more compact models than XML, allows for seamless integration into web applications using JavaScript, and opens the door for improvements regarding scalability.

In comparison to Ecore, the MoDiGen metamodel lacks the possibility to define operations. While in Ecore itself EOperation is more of a placeholder, it can be given an implementation in the context of the Eclipse Modeling Framework. Edges as first-level objects give easier access to references and permit the existence of stand-alone edges. However, this also means that the modeller has to explicitly state whether an edge must be automatically deleted upon the deletion of one of the connected nodes. This is not a problem in Ecore, where references are deleted when the containing class is deleted. In its current form, the JSON representation of MoDiGen models still contains content that could be removed. For example, attributes that have their default value and empty properties are still in the JSON. The JSON representation could be significantly compressed by removing that content.

In the future, we plan to implement a complete modeling framework on the basis of this metamodel and to work on improving the JSON representation in terms of size. We will also use the MoDiGen metamodel for code generation projects. Furthermore, we plan on extending the metamodel to allow the specification of constraints using the Object Constraint Language (OCL)[18].

===References===
1. Steinberg, D., Budinsky, F., Paternostro, M., Merks, E.: EMF Eclipse Modeling Framework, Second Edition. Addison-Wesley Professional (2008)
2. Ledeczi, A., Maroti, M., Bakay, A., Karsai, G., Garrett, J., Thomason, C., Nordstrom, G., Sprinkle, J., Volgyesi, P.: The Generic Modeling Environment. In: Proceedings of WISP'2001, IEEE, Budapest (2001)
3. Maróti, M., Kereskényi, R., Kecskés, T., Völgyesi, P., Lédeczi, Á.: Online Collaborative Environment for Designing Complex Computational Systems.
Procedia Computer Science, Volume 29, pp. 2432-2441 (2014)
4. Farwick, M., Agreiter, B., White, J., Forster, S., Lanzanasto, N., Breu, R.: A Web-Based Collaborative Metamodeling Environment with Secure Remote Model Access. ICWE 2010, LNCS 6189, pp. 278-291. Springer-Verlag, Heidelberg (2010)
5. Itemis AG, http://www.eclipse.org/Xtext
6. MPS Meta Programming System, https://www.jetbrains.com/mps/, Accessed 2015-01-22
7. MetaEdit+ Modeler, http://www.metacase.com/mep/, Accessed 2015-01-22
8. Kybele GMF Generator, a tool for developing GMF editors in a few steps, http://www.kybele.etsii.urjc.es/kyb kybelegmfgen/, Accessed 2015-01-22
9. No Magic Inc.: MagicDraw, https://www.magicdraw.com/, Accessed 2015-01-22
10. Eugenia, a tool to automatically generate a GMF editor, http://eclipse.org/epsilon/doc/eugenia/, Accessed 2015-01-22
11. Scheidgen, M.: Reference Representation Techniques for Large Models. In: BigMDE'13, ACM, Budapest (2013)
12. Kolovos, D., Rose, L., Matragkas, N., Paige, R., Guerra, E., Cuadrado, J., De Lara, J., Ráth, I., Varró, D., Tisi, M., Cabot, J.: A Research Roadmap Towards Achieving Scalability in Model Driven Engineering. In: BigMDE'13, ACM, Budapest (2013)
13. MoDiGen, http://www.modigen.de/publications/
14. CouchDB, http://couchdb.apache.org/
15. MongoDB, https://www.mongodb.org/
16. RavenDB, http://ravendb.net/
17. JavaScript Object Notation, https://tools.ietf.org/html/rfc7159
18. OMG Object Management Group: Object Constraint Language Version 2.4, http://www.omg.org/spec/OCL/2.4/ (2014)
19. EMF-REST, http://www.emf-rest.com