MODELS'08 Workshop ESMDE




                        On the Quantitative Assessment of
                Class Model Compositions: An Exploratory Study

Kleinner Oliveira (1), Alessandro Garcia (2), Jon Whittle (2)

(1) Computer Science Department, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
kleinner@gmail.com

(2) Computing Department, Lancaster University, InfoLab 21, Lancaster, UK
{alessandro,jon}@comp.lancs.ac.uk



Abstract. Model composition can be viewed in model-driven engineering as an operation in which a set of activities is performed to merge two input models into a single output model, which aggregates syntactic and semantic properties of the original models. However, given the growing heterogeneity of model composition strategies, it is particularly challenging for designers to objectively assess them for a particular problem at hand. The key problem is the lack of a canonical set of indicators to quantify harmful properties of the output models, such as composition conflicts and modularity anomalies. This paper presents an exploratory study aimed at capturing an initial set of metrics for assessing and comparing model composition strategies in two case studies. We apply a number of metrics to quantify different conflict types and modularity properties arising in composite class models produced with override- and merge-based strategies. We have observed that some of the quantitative indicators were effective in pinpointing when a model composition strategy is not properly chosen. In some cases, the output models exhibited non-obvious undesirable conflicts and anti-modularity factors.

                  Keywords: Model Composition, MDE, Metrics, Assessment.



           1 Introduction
           Given the central role that model composition plays in model-driven engineering
           nowadays, researchers are increasingly focusing on defining and improving
alternative techniques for composing structural or behavioural models. Model composition can be defined as a composition operation, a special type of model transformation, that takes two models, Ma and Mb, as input and combines their elements into an output model Mab. Several mechanisms have been proposed to put model composition into practice (e.g., see [2, 3, 4, 5, 6]), based on related work
in many different domains, such as database integration [7], aspect-oriented modeling, model transformation, and statechart merging.
   However, not much attention has been paid to the quality assessment of such model composition techniques. Even worse, according to [5], there is very little experience that can be used to determine the worth of current approaches. Given the growing heterogeneity of model composition strategies [3], such as override and merge, it is intrinsically difficult to systematically quantify undesirable phenomena that arise in the output composite models, including abstract syntax conflicts and semantic clashes. It is particularly challenging for researchers and designers to objectively assess the output model, and the composition strategy itself, given the problem at hand.
   In this paper, we begin to address these needs through an exploratory study (Section 2) on assessing composition strategies for class models. The goal is to identify an initial set of indicators for the evaluation and comparison of alternative composition strategies. We have applied a metrics suite (Section 3) to quantify the conflict rate and modularity properties arising in class model compositions based on merge and override. Our long-term goal is to define a comprehensive assessment framework intended to guide researchers and designers in the assessment of model composition techniques. In our study, we have found that some of the quantitative indicators were effective in determining when a model composition strategy is not properly chosen (Section 4). In certain cases, the output models exhibited non-obvious syntactic and semantic conflicts and a number of modularity anomalies not present in the original input models. We also contrast the initial findings of our exploratory investigation with related work (Section 5). Finally, we present some concluding remarks (Section 6).


           2 Experimental Procedures

This section describes the experimental procedures used in our exploratory study. Two case studies were performed in order to investigate possible problems associated with the use of composition strategies for class models. The first study comprises a set of real-life models for an Automated Highway Toll System. In this case, different members of a distributed software development team were in charge of modeling different use cases of the system; they then had to cope with model composition problems when bringing the use cases together.
   There are three packages, named (for simplicity) Packages A, B, and C, each of which implements a set of use cases. Two explicit compositions are defined for these packages (Figure 1). Package A presents a UML class diagram that specifies functionality related to creating user accounts, adding funds, and stopping the toll booth. Package B specifies functionality related to synchronizing accounts and processing credit cards, transponders, and vehicles, while Package C specifies functionality related to adding transponders and starting the toll booth. The goal is to produce an output package that gathers all these functionalities together. To this end, we need to merge Packages A, B, and C according to a particular composition strategy (override or merge, specifically). The choice of a particular composition strategy is very important to produce sound output models without introducing modularity impairments.




               Figure 1. Example of composition of an automated highway toll system.



   The second study consists of a literature-based [11] example of a calculator, depicted in Figure 2. It has two packages: (1) Package A presents a UML class diagram that specifies a Calculator implementing two basic functionalities, sum and subtraction; and (2) Package B represents a Calculator that implements three functionalities: sum, division, and multiplication. The goal is to produce an output Calculator that contains four operations: sum, subtraction, division, and multiplication. To do this, we need to put these functionalities together in a single package by merging Package A and Package B.
   Our aim is to assess in which ways the composition strategies (override and merge, specifically) impact the input models' properties. The merge strategy is more appropriate when the input design models contain specifications for different requirements of a software system. On the other hand, the override strategy is indicated when elements in an existing model need to be evolved or changed in some way. The semantics of the override strategy [3] can be briefly defined as follows: (i) for all corresponding elements in Package A and Package B, Package A's element overrides its corresponding element; and (ii) elements in Packages A and B that are not involved in a correspondence match remain unchanged and are inserted into the output model (Package AB). A minimal sketch of this strategy is given below.
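To make the strategy concrete, the following sketch illustrates the override semantics over a deliberately simplified representation in which a package is a dictionary from element names to definitions and name equality stands in for the correspondence match; both simplifications are our own assumptions, not part of any composition tool.

    # Illustrative sketch only (our simplification): a package is modeled
    # as a dictionary mapping element names to definitions, and name
    # equality stands in for the correspondence match of a real engine.
    def override(package_a, package_b):
        """Package A's elements win wherever a correspondence exists."""
        output = dict(package_b)   # non-corresponding B elements are kept
        output.update(package_a)   # A overrides corresponding B elements
        return output

Here dict.update realizes rule (i), while elements unique to either package satisfy rule (ii) by simply being carried over.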
   The semantics of the merge strategy [3] can be defined in a similar way: (i) all corresponding elements in Package A and Package B are combined; and (ii) elements in Packages A and B that are not involved in a correspondence match remain unchanged and are inserted into the output model (Package AB). However, putting these elements together in the output model (as the result of either overriding corresponding elements or adding elements to Package AB directly) may lead to problems such as semantic clashes. We therefore propose a metrics suite that provides ways to assess how useful or harmful such composition relationships are under a specific composition strategy. The goal is to give designers and researchers initial support for objectively analyzing which composition strategy minimizes the conflict rate while maximizing modularity benefits in the output model. A minimal sketch of the merge strategy follows.
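A matching sketch of the merge semantics, under the same simplified dictionary representation (again an assumption made for illustration); the combine parameter abstracts over how corresponding elements are combined:

    def merge(package_a, package_b, combine):
        """Corresponding elements are combined; the rest are copied as-is."""
        output = {}
        for name in set(package_a) | set(package_b):
            if name in package_a and name in package_b:
                output[name] = combine(package_a[name], package_b[name])
            else:
                output[name] = package_a.get(name, package_b.get(name))
        return output

    # The calculator study, with a class reduced to its set of operations
    # (a hypothetical representation) and combine as set union:
    calc_a = {"Calculator": {"sum", "subtraction"}}
    calc_b = {"Calculator": {"sum", "division", "multiplication"}}
    merged = merge(calc_a, calc_b, combine=lambda a, b: a | b)
    # merged["Calculator"] now holds all four operations.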




                               Figure 2. Example of composition of calculators



           3   A Metrics Suite for Model Composition

This section presents the metrics suite defined for assessing the model compositions in our exploratory study. The suite is intended to guide researchers in assessing UML model compositions and in coping with the difficulties involved in such assessments.

3.1 Quantifying the Composition Conflict Rate
           Number of Abstract Syntax Conflicts (NAbSC)
This metric counts the number of abstract syntax conflicts in a class model. Abstract syntax conflicts occur when a model does not comply with the UML metamodel's metaclasses and their structural relationships. This is a well-known problem, for instance, in graph transformations. The goal is to quantify and check inconsistencies of the target models against the UML metamodel. Once all the conflicts have been addressed (i.e., NAbSC = 0), the output model can be considered compliant with the UML metamodel; otherwise, the output model is an invalid or non-compliant model. This metric is given by the formula:

    NAbSC = Σ k_i (i = 1 … |SM|), where SM is the set of model elements and
    k_i is the number of abstract syntax conflicts of the i-th model element.
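As a rough illustration, the following sketch computes NAbSC over a toy model representation; the dictionary-based elements and the single well-formedness rule are our own assumptions, not the UML metamodel's actual constraints.

    def nabsc(model, rules):
        """Sum well-formedness violations over all model elements; each
        rule is a predicate returning True when an element violates it."""
        return sum(1 for element in model for rule in rules if rule(element))

    # Hypothetical rule: an association may only reference existing classes.
    classes = {"BoothController", "Account"}
    model = [{"kind": "association", "ends": ("BoothController", "Account")},
             {"kind": "association", "ends": ("TollGate", "Account")}]
    rules = [lambda e: e["kind"] == "association"
                       and any(end not in classes for end in e["ends"])]
    print(nabsc(model, rules))  # -> 1 (TollGate does not exist in the model)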








           Number of Semantic Clash Conflicts (NSCC)
This metric counts the number of semantic clash conflicts in a model. A semantic clash conflict occurs when model elements have different names but the same semantic value. We need to quantify such conflicts in order to identify unexpected semantic clash problems in the output models. Models with semantic clashes may become ambiguous and inconsistent; clashes may also harm model understandability and complicate tasks such as model transformation and code generation. A high NSCC value may imply that the output model is useless. This metric is given by the formula:

    NSCC = (1/2) Σ w_i (i = 1 … |SM|), where SM is the set of model elements
    and w_i is a boolean value indicating whether the i-th model element is
    involved in a semantic clash (the factor 1/2 accounts for each clash
    being counted once per element of the clashing pair).
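A minimal sketch of how NSCC could be computed, assuming a hypothetical synonym table that stands in for the much harder semantic comparison of model elements:

    from itertools import combinations

    # Hypothetical synonym table standing in for semantic comparison.
    SYNONYMS = {frozenset({"sum", "add"}), frozenset({"subtraction", "minus"})}

    def nscc(element_names):
        """Count pairs of differently named elements with the same meaning.
        Counting pairs directly matches (1/2) * sum(w_i) when each element
        clashes with at most one other element."""
        return sum(1 for a, b in combinations(element_names, 2)
                   if frozenset({a, b}) in SYNONYMS)

    print(nscc(["sum", "add", "division"]))  # -> 1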

           Number of Compositions of a Model Element (NCME)
This metric counts the number of compositions in which a model element has participated. The number of compositions may be an effective indicator of semantic mix conflicts. When model elements are composed, their semantics are mixed, which may lead to unsound model elements. For example, a design pattern assigns roles to its participant classes, which define the functionality of the participants in the pattern context. When UML class diagrams are merged, such roles may be modified, with negative impacts on quality attributes of the design pattern. This metric is given by the formula:

    NCME = M, where M is the number of compositions in which a model element
    has participated during the composition process.

           Number of Behavioral Feature Conflicts (NBFC)
This metric counts the number of behavioral feature conflicts in a class. A behavioral feature conflict may occur when a class: (1) has two (or more) methods that serve the same purpose; or (2) refers to a method that no longer exists, or that exists with a different, unexpected behavior. A high NBFC measure may indicate undesirable model composition phenomena. This metric is given by the formula:

    NBFC = B, where B is the number of behavioral feature (method) conflicts
    in a class.


           Number of Unmeaning Model Elements (NUME)
This metric counts the number of unmeaning model elements in a model. During the composition process, model elements are manipulated and sometimes end up neither referring to nor being referenced by any other element; that is, they become isolated. This metric is given by the formula:

    NUME = U, where U is the number of unmeaning model elements in a model.
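Under the same kind of simplified representation as the earlier sketches (an assumption made for illustration only), NUME reduces to counting elements that appear in no reference pair:

    def nume(elements, references):
        """Count isolated elements: those that neither refer to nor are
        referenced by any other element."""
        connected = {e for pair in references for e in pair}
        return len(elements - connected)

    elems = {"BoothController", "EntranceLightInterface", "Account"}
    refs = {("BoothController", "Account")}
    print(nume(elems, refs))  # -> 1 (EntranceLightInterface is isolated)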








           3.2 Quantifying Modularity Anomalies in Composite Models

We have also applied some classical metrics intended to measure modularity-related characteristics of a class, such as coupling degree, number of attributes, and number of operations. These metrics are briefly described in Table I; due to space constraints, we do not discuss them in depth. Most of these metrics (e.g., NATC and CBC) were originally defined by other authors, and their definitions can be found in the respective publications [14, 17]. The goal of using these metrics is to assess how the composition process affects the output models with respect to design principles, such as low coupling, when different composition strategies are specified. In addition, in many cases, composition strategies can artificially introduce design anomalies ("bad smells"), such as "Temporary Field"; this bad smell can be identified by comparing the NATC of a class in the output model against the respective classes in the input models used for the composition.
                  Table I. The Class-level Modularity Metrics

    Metric                              Description
    Number of Attributes in a Class     Counts the number of attributes in a
    (NATC)                              class.
    Number of Operations in a Class     Counts the number of operations in a
    (NOPC)                              class.
    Number of Associations between      Counts the number of associations per
    Classes (NASC)                      class; the new language produced by a
                                        model composition may not be consistent
                                        with the previously defined domain.
    Coupling between Classes (CBC)      Counts the number of dependencies of a
                                        class on other classes in the system.
    Number of Subclasses of a Class     Counts the number of children of a
    (NSUBC)                             class.
    Number of Superclasses of a Class   Counts the number of parents of a
    (NSUPC)                             class.
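For completeness, a sketch of how the simplest of these metrics could be computed over a toy class representation; the UmlClass structure and its field names are illustrative assumptions, not a real UML metamodel:

    from dataclasses import dataclass, field

    @dataclass
    class UmlClass:                     # illustrative stand-in for a UML class
        name: str
        attributes: list = field(default_factory=list)
        operations: list = field(default_factory=list)
        dependencies: set = field(default_factory=set)  # classes this one uses

    def natc(c): return len(c.attributes)    # Number of Attributes in a Class
    def nopc(c): return len(c.operations)    # Number of Operations in a Class
    def cbc(c):  return len(c.dependencies)  # Coupling between Classes

    calc = UmlClass("Calculator", attributes=["display"],
                    operations=["sum", "subtraction"],
                    dependencies={"Expression"})
    print(natc(calc), nopc(calc), cbc(calc))  # -> 1 2 1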


4 Results and Discussion

Quantitative assessment is an effective way to supply measures and evidence that may improve our understanding of model-driven engineering techniques, in our case, model composition. Although quantitative studies have some disadvantages, they are very useful because they boil a complex situation down to simple numbers that are easier to grasp and discuss. This section provides a general analysis and discussion of the data collected by applying the defined metrics to the model compositions derived in the two case studies (Section 2).
   Charts are used to present the data gathered in the measurement process. The Y-axis presents the absolute values gathered by the metrics; each pair of bars is annotated with an integer value, which represents the measure. The X-axis specifies the metric itself. These charts help in analyzing how the composition of the input models affects (or not) the output model with regard to a particular metric, and how a change of composition strategy affects (or not) the output model. The results shown in the charts were gathered from the model point of view; that is, they represent the total of the metric values over all the model elements of each output model considered.
   Figure 3 depicts the overall results of composing Packages A, B, and C of the Automated Highway Toll System following the override and merge strategies. Comparing the output models produced by the two strategies, we observe that no occurrences were detected for the NSUBC, NSUPC, NAbSC, and NSCC metrics. On the other hand, the NOPC metric has a higher value under the merge strategy than under the override strategy, which may indicate a negative point with respect to reusability. Although the output models show no differences with regard to the NASC, NSUBC, and NSUPC metrics, they present significant differences elsewhere: for example, the Package BAC produced by the merge strategy contains all the functionality defined in Packages A, B, and C, while the Package BAC produced by the override strategy contains only the functionality defined in Package B.
   According to the measures for the number of associations between classes (NASC), the number of abstract syntax conflicts (NAbSC), and the numbers of subclasses (NSUBC) and superclasses (NSUPC) of a class, no significant difference was detected in favor of a specific composition strategy in the two case studies. The NSUBC and NSUPC measures are easily explained: neither case study exhibits deep inheritance hierarchies. The NASC measure, in turn, supplies evidence that the number of associations a class contains is independent of the type of composition strategy.
   The Package BAC produced by the override strategy yielded higher results for two measures, NCME and NUME. When EntranceLightInterface is inserted into Package BAC, this class becomes unmeaning, because the class it is related to, PackageA.BoothController, no longer exists (PackageA.BoothController is overridden by PackageB.BoothController). The NASC and CBC measures have the same values, so the coupling in Package BAC is independent of the kind of composition strategy in this case.




       Figure 3. Comparison between output models produced following the
                        override and merge strategies.








   The measurements for the output Calculators are depicted in Figure 4, where we compare the output Calculators produced following the override and merge strategies. Although showing no differences with regard to the NASC, NSUBC, and NSUPC metrics, the output models present significant differences elsewhere. The Package AB produced by the merge strategy has higher values for several measures: NATC, NOPC, CBC, NSCC, NCME, and NBFC. On the other hand, the Package AB produced by the override strategy yielded a higher result for only one measure, NUME, because two enumerations, CalculatorType and ExpressionType, are unmeaning in the package.




      Figure 4. Comparison between calculators produced following the
                      override and merge strategies.
   According to the data gathered, the most useful metrics in this exploratory study were the following. First, the number of semantic clash conflicts (NSCC) indicated the presence of a significant number of negative semantic clashes. This measure served as a warning that output models produced by a particular composition strategy may not be helpful, by identifying the ambiguity and inconsistency arising from semantic clashes; Figure 4 provides evidence of the effectiveness of this metric. Second, the number of unmeaning model elements (NUME) supplied evidence that the override strategy is potentially harmful when used beyond its purpose of evolving or changing an existing model. The output models based on override-driven compositions had elements that neither refer to nor are referenced by other elements; that is, they are isolated (unmeaning in the package). Thus, with regard to this metric, the better strategy for the case studies was the merge strategy.
   Finally, considering all the conflict rate and modularity results, the metrics indicated that the merge strategy was the best strategy for our two case studies. This finding is mainly based on the NUME and NSCC measures (discussed above). Moreover, we should highlight that, as expected, it is particularly challenging for researchers to objectively assess the output models and identify the conflicts associated with metrics such as NUME, NAbSC, and NSCC. Therefore, improving automated support for measuring conflict rates should be a topic of future work.


5 Related Work

There is little related work focusing on the quantitative assessment either of models in general or of model compositions. To date, most approaches involving model composition rest on subjective assessment criteria. Even worse, they depend on experts who have built up an arsenal of mentally-held indicators to evaluate the growing complexity of design models [5]. As a consequence, modelers ultimately rely on feedback from experts to determine "how good" the input models and their compositions are. According to [5], the state of the practice in assessing model quality provides evidence that modeling is still in the craftsmanship era, and this problem is accentuated when we assess model compositions.
   To the best of our knowledge, the need for assessing models during a model composition process has neither been pointed out nor addressed by current model composition techniques [2, 3, 4, 8, 9]. For example, the UML built-in composition mechanism, package merge, does not define metrics or criteria to assess the merged UML models; moreover, it has been found to be incomplete, ambiguous, and inconsistent [6].
   The lack of quantitative indicators for model compositions hinders a better understanding of the side effects peculiar to certain model composition strategies. Many types of metrics have been developed over the past few decades for different UML models, and these metrics have certainly helped designers analyze their UML models to some extent. However, as researchers' focus has shifted to activities related to model management (such as model composition, evolution, and transformation), the shortcomings and limitations of UML model metrics have become more apparent. Some authors [1, 12, 13-18] have proposed sets of metrics that consider UML model properties, and these works have shown that their measures satisfy some properties expected of good measures of design models. However, these metrics cannot be employed to assess problems that may arise in a model composition process, such as semantic clashes.


6 Concluding Remarks and Future Work

If models are seen as primary development and transformation artifacts in model-driven engineering, then software designers naturally become concerned with how their quality is evaluated. In order to be considered for use in mainstream software development, model composition techniques should be supplemented with quality criteria and indicators. These elements are fundamental for developing and analyzing composition processes and output models. We presented an exploratory study and an initial metrics suite for assessing class model compositions generated by a selected set of model composition strategies. The metrics are applied to output models, and analyses are performed on the data gathered.
   Our initial evaluation has demonstrated the feasibility of our candidate set of metrics for quantifying modularity properties and conflict rates in composition processes. Obviously, more investigation of its applicability to large UML model compositions is required, and further empirical evaluation is fundamental to validate our quantitative indicators in real-world design settings involving UML model compositions. Thus, future work will concentrate on designing and carrying out a family of empirical studies to assess, for example, compositions of the most popular OMG UML profiles in realistic scenarios. Finally, we should point out that model composition assessment is at an initial stage, and there is very little experience that can be used to determine the feasibility of current approaches. Moreover, its empirically-driven improvement, supported by a comprehensive set of well-validated metrics, is necessary for the evolution of the model-driven engineering field. This work represents one of the first stepping stones towards this end.


           References
1. J. Aranda, N. Ernst, J. Horkoff, and S. Easterbrook. A Framework for Empirical Evaluation of Model Comprehensibility. In International Workshop on Modeling in Software Engineering (MiSE), pp. 20-26, May 2007.
2. G. Brunet, M. Chechik, S. Easterbrook, S. Nejati, N. Niu, and M. Sabetzadeh. A Manifesto for Model Merging. In International Workshop on Global Integrated Model Management (GaMMa'06), pp. 5-12, Shanghai, China, May 2006. ACM Press.
3. S. Clarke and R. Walker. Composition Patterns: An Approach to Designing Reusable Aspects. In 23rd International Conference on Software Engineering (ICSE), pp. 5-14, Toronto, Canada, 2001.
4. T. Cottenier, A. van den Berg, and T. Elrad. Modeling Aspect-Oriented Composition. In 7th International Workshop on Aspect-Oriented Modeling, co-located with MODELS'05, Montego Bay, Jamaica, October 2005.
5. R. France and B. Rumpe. Model-Driven Development of Complex Software: A Research Roadmap. In Future of Software Engineering (FOSE'07), co-located with ICSE'07, pp. 37-54, Minnesota, USA, May 2007.
6. OMG. Unified Modeling Language: Infrastructure, version 2.1. Object Management Group, February 2007.
7. E. Rahm and P. Bernstein. A Survey of Approaches to Automatic Schema Matching. VLDB Journal, 10(4):334-350, 2001.
8. R. Reddy, R. France, S. Ghosh, F. Fleurey, and B. Baudry. Model Composition - A Signature-Based Approach. In Aspect-Oriented Modeling (AOM) Workshop, Montego Bay, Jamaica, October 2005.
9. Y. Reddy, R. France, G. Straw, J. Bieman, N. McEachen, E. Song, and G. Georg. Directives for Composing Aspect-Oriented Design Class Models. Transactions on Aspect-Oriented Software Development, 1(1):75-105, 2006.
10. B. Selic. The Pragmatics of Model-Driven Development. IEEE Software, 20(5):19-25, 2003.
11. E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading, 1995.
12. S. Chidamber and C. Kemerer. A Metrics Suite for Object Oriented Design. IEEE Transactions on Software Engineering, 20(6):476-493, June 1994.
13. A. Baroni. Quantitative Assessment of UML Dynamic Models. SIGSOFT Software Engineering Notes, 30(5):366-369, 2005.
14. A. Baroni, F. Abreu, and P. Guerreiro. The State of the Art of UML Design Metrics. Technical Report, Universidade Nova de Lisboa, Monte da Caparica, 2005.
15. S. Chidamber and C. Kemerer. A Metrics Suite for Object Oriented Design. IEEE Transactions on Software Engineering, 20(6):476-493, June 1994.
16. A. Baroni. Quantitative Assessment of UML Dynamic Models. SIGSOFT Software Engineering Notes, 30(5):366-369, 2005.
17. A. Baroni, F. Abreu, and P. Guerreiro. The State of the Art of UML Design Metrics. Technical Report, Universidade Nova de Lisboa, Monte da Caparica, 2005.
18. M. Genero, M. Piattini-Velthuis, J. Lemus, and L. Reynoso. Metrics for UML Models. UPGRADE, 5(2), April 2002.



