MODELS'11 Workshop - EESSMod 2011 Empirical evaluation of the conjunct use of MOF and OCL Juan Cadavid1 , Benoit Baudry1 , Benoı̂t Combemale2 1 INRIA, Centre Rennes – Bretagne Atlantique Campus de Beaulieu, 35042 Rennes Cedex, France {juan.cadavid,benoit.baudry}@inria.fr 2 IRISA, Université de Rennes 1 Triskell Team, Rennes, France {benoit.combemale}@irisa.fr Abstract. MOF and OCL are commonly used for metamodeling: MOF to model the domain structure, and OCL for the well-formedness rules. Thus, metamodelers have to master both formalisms and understand how to articulate them in order to build metamodels that accurately capture domain knowledge. A systematic empirical analysis of the conjunct use of MOF and OCL in existing metamodels could help metamodelers un- derstand how to use these formalisms. However, existing metamodels usually present anomalies that prevent automatic analysis without prior fixing. In particular, it often happens that both parts of the metamodel (MOF and OCL) are inconsistent. In this paper, we propose a process for analyzing metamodels and we report on the pre-processing phase we went through on 52 metamodels in order to get them ready for automatic empirical analysis. 1 Introduction The Meta-Object Facility (MOF) [2] and the Object Constraint Language (OCL) [4] are commonly used for metamodeling: MOF to define a domain model and OCL to define well-formedness rules in this domain. At first glance, the roles of both standards seem well-delimited, yet many conceptual decisions can be imple- mented in either one, as we will demonstrate. For this reason, their combined us- age has revealed several styles as observed in the panorama of developed mature DSMLs (Domain Specific Modeling Languages). Systematic empirical analysis of how these standards are combined in publicly available metamodels would help understand their usage and propose new methodologies and techniques to assist domain experts when building a new metamodel. Empiric analysis can be es- tablished through the systematic collection of metrics over existing metamodels. However, there currently exists no metrics that relate MOF and OCL and there exists no tool to automatically compute metrics on metamodels. Another issue is related to the lack of homogeneous formats to support the automatic analysis of MOF/OCL based metamodels. For example, OCL well-formedness rules are provided in various formats (txt, annotations in Ecore, OCL model). Also, be- cause MOF and OCL parts are not always stored in the same format, both parts - 48 - MODELS'11 Workshop - EESSMod 2011 2 Juan Cadavid, Benoit Baudry, Benoı̂t Combemale of the metamodel tend to be inconsistent. When gathering data for empirical analysis, it is thus necessary to fix it prior to metrics computation. This paper aims at illustrating the challenges of the conjunct usage of MOF and OCL for metamodeling. We propose initial metric definitions and analysis methodology to empirically understand how both formalisms are related and conjunctly used in existing metamodels. We have collected 52 metamodels for which we have learned a few initial lessons by manually analyzing and fixing them in order to get them ready for automatic measurement. In particular we have found and fixed a number of inconsistencies in OCL invariants defined in OMG (Object Management Group) standard metamodels. Section 2 presents the motivation for this study. Section 3 introduces the problem statement, introduces definitions as well as the presentation of the two standards. Section 4 presents our research process. Section 5 and 6 present the first data findings and conclusions relevant to the first phase of our research process. 2 Motivation This section illustrates the issues for the definition of a correct and precise meta- model through an example. The model in figure 1, expressed in the basic version of MOF called EMOF (Essential MOF), specifies the concepts and relationships of the Petri net domain. A PetriNet is composed of several Arcs and several Nodes. Arcs have a source and a target Node, while Nodes can have several in- coming and outgoing Arcs. The model distinguishes between two different types of Nodes: Places or Transitions. PetriNet name: EString arcs nodes 0..* 0..* outgoings Arc Node 1 source 0..* name: EString weight: EInt 1 target 0..* ingoings Transition Place marking: EInt Fig. 1. MOF-based domain structure for Petri nets This model captures every necessary concept to build Petri nets. However, there can also exist valid instances of this model that are not valid Petri nets. For example, the model does not prevent the construction of a Petri net in which an arc’s source and target are only places, instead of linking a place and a transition. Thus, the sole model is not sufficient to precisely model the specific domain of Petri nets, since it still allows the construction of models that are not valid in this domain. The model needs to be enhanced with additional properties to capture the domain more precisely. The following well-formedness rules, expressed in OCL, show some mandatory properties of Petri nets. - 49 - MODELS'11 Workshop - EESSMod 2011 Empirical evaluation of the conjunct use of MOF and OCL 3 1. noEqualN amesInv: Two nodes cannot have the same name. context P e t r i N e t inv : s e l f . nodes−>f o r A l l ( n1 , n2 | n1 <> n2 implies n1 . name <> n2 . name ) 2. noSameEndT ypesInv: No arc may connect two places or two transitions. context Arc inv : s e l f . s o u r c e . oclType ( ) <> s e l f . t a r g e t . oclType ( ) 3. placeM arkingP ositiveInv: A place’s marking must be positive. context P l a c e inv : s e l f . marking >= 0 4. arcW eightP ositiveInv: An arc’s weight must be strictly positive. context Arc inv : s e l f . w e i g h t > 0 In our study we consider that the metamodel for Petri nets is the composi- tion of the model and the associated well-formedness rules. We learn from this example that the construction of a precise metamodel, that consistently captures a domain, requires: (i) mastering two formalisms1 : EMOF for concepts and re- lationships; OCL for properties; (ii) building two complimentary views on the domain model; (iii) finding a balance between what is expressed in one or the other formalism, (iv) keeping the views, expressed in different formalisms, consis- tent. This last point is particularly challenging in case of evolution of one or the other view. One notable case from the OMG and the evolution of the UML stan- dard is that the AssociationEnd class disappeared after version 1.4 in 2003, but as late as version 2.2, released in 2009, there were still OCL expressions referring to this metaclass [11]. In the same manner, the OCL 2.2 specification depends on MOF 2.0, however a particular section of the specification defining the binding between MOF and OCL [4, p.169] makes use of the class ModelElement which only existed until MOF 1.4. 3 Research problem 3.1 Definitions This section defines the terms we use to designate the focus of our analysis. The relationship between a model and a metamodel can be described as shown in figure 2 [3]. Here the conformsTo relation is a predicate function that returns true if all objects in the model are instances of the concepts defined in the metamodel, all relations between objects are valid with respect to relationships defined in the metamodel and if all properties are satisfied. 1 One can notice that some properties could have been modeled with EMOF by choos- ing another structure for concepts and relationships. However, the number of con- cepts and relationships would have increased, hampering the understandability of the metamodel and increasing the distance between the metamodel and a straight- forward representation of domain concepts. - 50 - MODELS'11 Workshop - EESSMod 2011 4 Juan Cadavid, Benoit Baudry, Benoı̂t Combemale Definition 1. Metamodel. A metamodel is defined as the composition of: – Concepts. The core concepts and attributes that define the domain. – Relationships. Relationships that specify how the concepts can be bound together in a model. – Well-formedness rules. Additional constraints that restrict the way con- cepts can be assembled to form a valid model. In this study, we consider metamodels defined with OMG standards. We distin- guish two parts as defined below. Definition 2. Metamodel under study. For this work, a metamodel is de- fined as the composition of: – Domain structure. An EMOF-compliant model portraying the domain concepts as metaclasses and relationships between them. – Invariants. Well-formedness rules that impose invariants over the domain structure and that are expressed in OCL. conformsTo ▶ 0..* Model (M) MetaModel (MM) conformsTo(m:M) : Bool Fig. 2. Model & MetaModel Definition with Class Diagram Notation 3.2 Summary of EMOF and OCL Figure 3 displays the structure of EMOF [2]. EMOF allows to specify the concepts of a metamodel in a Package. This Package contains Classes and Propertys to model the concepts and relationships. The Propertys of a Class can be either: attributes of type Boolean, String or Natural; or references to other Classes, in this case the Property is of type another Class. Figure 4 displays the structure of OCL expressions [4] that can be used to constrain the structure defined with EMOF. The most noticeable constructs for OCL expres- sions are: the ability to declare Variables, whose type is a concept modeled with EMOF; the ability to use control structures such as IfExp and LoopExp; the ability to have composite OCL expressions, through CallExps. Figure 5 dis- plays the connection between EMOF and OCL [4, p.169]. This figure specifies that it is possible to define Constraints on Elements (everything in EMOF is an Element, cf. figure 3). They can be defined as Expressions, and one partic- ular type of expression is ExpressionInOCL, an expression defined with OCL. The most important concept is the notion of ExpressionInOCL that binds an Element coming from an EMOF model on one hand to an OclExpression on the other hand. The existence of this binding between formalisms is essential for metamodeling: this is how two different formalisms can be smoothly inte- grated in the construction of a metamodel. This binding is also what allows us to automatically analyze metamodels built with EMOF and OCL. - 51 - MODELS'11 Workshop - EESSMod 2011 Empirical evaluation of the conjunct use of MOF and OCL 5 Element NamedElement name: String Package type Type 1 TypedElement 0..* 0..* ownedType nestedPackage Classifier DataType Property lower: Natural⊤ = 1 Class upper : Natural⊤ = 1 owner isAbstract: Boolean = false isOrdered : Boolean = false {ordered} 0..* isComposite: Boolean = false Boolean String Natural⊤ ownedAttribute default: String = "" 0..* superClass 0..1 opposite Fig. 3. The EMOF Core with Class Diagram Notation Core::TypedElement OclExpression source 0..1 0..1 body CallExp LiteralExp IfExp VariableExp TypeExp referredVariable referredType 0..1 0..1 FeatureCallExp LoopExp Variable Core::Classifier 0..1 iterator 0..1 result representedParameter 0..1 PropertyCallExp OperationCallExp IteratorExp IterateExp Core::Parameter referredProperty referredOperation 0..1 0..1 Core::Property Core::Operation Fig. 4. OCL Expression metamodel ModelElement 0..* 0..* Constraint (from Core) +constrainedElement +constraint (from Core) 0..1 1 +body Classifier Expression (from Core) (from DataTypes) OclExpression +bodyExpression ExpressionInOcl 1 Fig. 5. OCL and MOF binding - 52 - MODELS'11 Workshop - EESSMod 2011 6 Juan Cadavid, Benoit Baudry, Benoı̂t Combemale 3.3 Metrics In order to understand the conjunct usage of these two standards, we aim at defining the following metrics. – Size of the metamodel: The number of constructs that a metamodel provides can change dramatically from one language to the other. Such measure has to be compared when evaluating several metamodels from diverse domains. – Size of the invariants set: Metamodels can portray different levels of com- plexity; highly complex domains require a large number of OCL formulae to express their logic, whereas lesser complex domains will express their knowledge with a lower number of constraints. With this metric we aim at understanding the different levels of such complexity. – Invariant complexity with respect to the underlying domain structure: Some metamodels contain lengthy and complex well-formedness rules, while others seem to define them using simple expressions. We measure how many EMOF elements are used in each OCL invariant and thus the quantity of model elements involved in a constraint. – Invariant complexity with respect to the OCL syntax metamodel: In order to extract the effective subset of the OCL language that is used in DSML de- velopment, we intend to query the invariants for the specific OCL expression types they use. 4 Analysis of MOF and OCL in metamodels The data sets and metrics specified in the previous section are used to build a tool to perform the computations which will provide the data to perform analysis on a metamodel. 4.1 The Global Process for Analysis Automation Figure 6 shows the overall process to analyze a metamodel. The process is com- posed of three activities with their own tools: 1. If the OCL invariants are not defined in the OCL/XMI format (extension .oclxmi in figure 6), the first activity consists in preprocessing (activity OCL Parsing). It is performed depending on the input format of the OCL invari- ants (extension .ocl in figure 6). We have used OclInEcore2 , a textual editor for Ecore files. 2. The next step consists in using an in-house built tool to automatically com- pute the metrics over the metamodel (activity Metrics Computation). Such tool would take as an input the metamodel composed of the domain struc- ture expressed in Ecore, and the invariants expressed in OCL and produce a CSV file containing all the metric values for the input metamodel. 2 OclInEcore, cf. http://wiki.eclipse.org/MDT/OCLinEcore/. - 53 - MODELS'11 Workshop - EESSMod 2011 Empirical evaluation of the conjunct use of MOF and OCL 7 3. The metric values are finally analyzed with R3 (activity Statistical Analysis). R is an open-source language for statistical applications, which provides sev- eral functionalities to run analysis and create plots, both one-variable and multi-variable. We provide a set of generic scripts that could be used for any CSV file produced with our in-house built tool. These scripts automate the production of graphics to assist analysis. no .oclxmi available? OCL pre-process .ocl yes Metrics Compute metrics Computation Tool .oclxmi .ecore R Run statistical analysis .csv Graphical analysis Fig. 6. SPEM Process for Metamodel Automatic Analysis Currently our research has accomplished the encircled parts of the diagram, performing the preparation of the data of the metamodels presented in the next section. 5 Data setup and preprocessing Understanding the use of EMOF and OCL requires a sample containing data from repositories in diverse backgrounds. The sample subjects must come from standard bodies, academia and industry altogether. 5.1 Experimental data setup Table 1 details a list of standard specifications coming from different sources, defining domain-specific languages. The first two columns contain the name and source; the first group comes from the OMG. The following group presents meta- models taken from academic research; first a metamodel for the B language cre- ated at IMAG; SAD3 is a software architecture component model created at 3 R, cf. http://www.r-project.org/ - 54 - MODELS'11 Workshop - EESSMod 2011 8 Juan Cadavid, Benoit Baudry, Benoı̂t Combemale ENSTA Bretagne. In the last group, metamodels MTEP and XMS are meta- models created by Thomson Video Networks for encoding standards for video hardware. SAM is a metamodel from the Topcased open source software project. The third column counts the number of metamodels. In the OMG group, specifi- cations define large modeling languages, normally structured in packages, there- fore we treat each one of these as a separate metamodel. In the remaining cases, each specification contains only one metamodel. The fourth column mentions the formalism used to express well-formedness rules. As expected, we chose specifi- cations using OCL. The fifth column shows the different standards that exist to specify the domain structure. Of our main interest, Ecore is a lightweight imple- mentation of EMOF [5], providing equally an XMI-based persistence mechanism. The sixth column presents the format for expressing invariants in OCL. These are found either as separate .ocl text files or embedded in .ecore as annota- tions. Availability of the Ecore format and some of the mentioned forms of OCL invariants are necessary to enable the automation of the metrics computation. Table 1. Specifications containing sample metamodels. Name Source Meta- Expression of Domain Struc- OCL invariants models Constraints ture format format UML OMG 13 Text and OCL Ecore Annotations in Ecore CCM OMG 4 Text and OCL Ecore Text in docu- mentation OCL OMG 4 Text and OCL Ecore Text in docu- mentation MOF OMG 2 Text and OCL XMI .ocl text file CWM OMG 21 Text and OCL Ecore Text in docu- mentation DD OMG 3 Text and OCL XMI Annotations in Ecore B language Academic 1 Language specifi- Ecore .ocl text file Research cation and OCL SAD3 Academic 1 OCL Ecore .ocl text file Research MTEP Industry 1 OCL Ecore .ocl text file XMS Industry 1 OCL Ecore .ocl text file SAM Industry 1 OCL Ecore .ocl text file 5.2 Preprocessing data for analysis The preprocessing step is based on an automated Java program that takes an Ecore/XMI metamodel with associated OCL invariants stored in their available format for each metamodel and its OCL invariants, according to table 1 and produces as output an Ecore/XMI metamodel with OCL/XMI invariants, where - 55 - MODELS'11 Workshop - EESSMod 2011 Empirical evaluation of the conjunct use of MOF and OCL 9 all the OCL invariants that remain are syntactically correct (parse without errors using the Eclipse OCL parser [1]). The OCL/XMI format presents the abstract syntax tree of each OCL expression. At the end of the preprocessing step, every metamodel can be automatically analyzed for metrics computation. The metrics computation tool will be able to compute metrics on the MOF structure and the OCL invariants, and consequently analyzable data is emitted as output. Throughout this process we have observed the following issues. Different storage formats Ecore is the de-facto standard based on the XMI format used to express the domain structure of a metamodel, yet there is no ev- idence of such a format to store OCL expressions for a metamodel. Besides OCL text files, invariants are also added as annotations; however these only consist of maps of string-to-string entries, which can themselves present different schemas. Our preprocessing program automatically detects the format and proceeds to parse and produce the previously mentioned output. Different OCL syntaxes Different parsers allow or reject certain OCL con- structs [7]. To enable automation analysis of the OCL expressions, such vari- ations must be streamlined to satisfy the precise syntax required for Eclipse OCL; this was performed by replacing the unrecognized constructs by its ac- cepted equivalents; for example, the use of the minus “-” operator to exclude elements from a collection, instead of the exclude operation. Errors in invariants In many cases, OCL invariants are added to a metamodel with the sole purpose of documentation and might not be checked for correct- ness. The studied sets of invariants from the selected specifications contained incorrect OCL expressions, containing errors from syntax (invalid use of OCL constructs) or semantics (references to non-existent model elements from the domain structure). Table 2 presents trivial errors and thus capable of being cor- rected, as well as those that could not be fixed, since it would require knowledge from the domain expert. 6 Conclusions In this article we have illustrated several issues that arise when metamodeling with the MOF and OCL formalisms. Our purpose is to learn how these for- malisms are used in existing metamodels, in order to assist metamodelers in the future. The rest of the paper discusses a set of metamodels that we have gathered from different sources (OMG, open source project, industry) and the lessons we have learned while manually analyzing and fixing these metamodels to get them ready for automatic analysis. Most of the problems to automate the analysis over the metamodels are that OCL well-formedness rules first are pro- vided in a variety of formats and secondly are often inconsistent with the MOF domain model. The next step for this work is to build a tool that can automat- ically analyze metamodels. This tool should compute a set of metrics about the - 56 - MODELS'11 Workshop - EESSMod 2011 10 Juan Cadavid, Benoit Baudry, Benoı̂t Combemale Table 2. Corrected errors in OCL invariants. Corrected errors Frequency Missing parenthesis 94 Notation for enumeration literals 51 Missing variable in forAll body 30 Missing mandatory typecast (oclAsType()) 22 Typos in pointers to metaclasses and properties 15 Typos in OCL operations invocation 13 Use of ’-¿’ instead of ’.’ for non-collection properties 10 Use of ’.’ as a shortcut for ’collect’ 9 Use of unescaped OCL keywords 6 ’if’ expression without ’else’ and ’endif’ 5 Use of ’notEmpty’ and ’isEmpty’ for non-collection properties instead of 4 oclIsUndefined() Treating of boolean values as literals ’#true’ and ’#false’ 3 Use of ’union’ instead of ’concat’ to concatenate strings 2 Errors remaining incorrect Frequency Pointers to nonexistent properties/operations 133 Invariants with a context metaclass in an outside metamodel 2 Reference to undefined stereotypes 1 conjunct use of MOF and OCL. These metrics will be the basis for our empirical investigation. Such empirical work will lead to complement existent guidelines for metamodelers [8, 9, 6, 10] suggesting an appropriate use of MOF and OCL. References 1. Eclipse ocl, http://www.eclipse.org/modeling/mdt/?project=ocl 2. Omg meta object facility core, v2.0 (2006) 3. A Framework to Formalise the MDE Foundations. In: TOWERS. pp. 14–30 (2007) 4. Omg object constraint language, v2.2 (2010) 5. Budinsky, F., Merks, E., Steinberg, D.: Eclipse Modeling Framework 2.0. Addison- Wesley Professional (2009) 6. Garcia, M.: Efficient integrity checking for essential mof+ ocl in software reposito- ries. Journal of Object Technology 7(6) 7. Gogolla, M., Kuhlmann, M., Büttner, F.: A benchmark for ocl engine accuracy, determinateness, and efficiency. In: MoDELS. LNCS, vol. 5301, pp. 446–459 (2008) 8. Karsai, G., Krahn, H., Pinkernell, C., Rumpe, B., Schindler, M., Völkel, S.: Design guidelines for domain specific languages. In: The 9th OOPSLA workshop on DSM (2009) 9. Kovari, P.: Explore model-driven development (mdd) and related approaches: Ap- plying domain-specific modeling to model-driven architecture. IBM Developer- works (2007) 10. Loecher, S., Ocke, S.: A metamodel-based ocl-compiler for uml and mof. Electron. Notes Theor. Comput. Sci. 102, 43–61 (November 2004), http://dx.doi.org/10. 1016/j.entcs.2003.09.003 11. Selic, B.: Uml 2 specification issue 6462. http://www.omg.org/issues/issue6462.txt (2003), updates dating until 2008. - 57 -