Evaluation of Model Comparison for Delta-Compression in Model Persistence

Markus Scheidgen
Humboldt-Universität zu Berlin, Germany
scheidge@informatik.hu-berlin.de

Abstract. Model-based software engineering is applied to more and more complex software systems. As a result, larger and larger models with longer and longer histories have to be maintained and persisted. A lot of research effort has already gone into model versioning, comparison, and repositories. Existing strategies either record and persist changes (change-based repositories, e.g. EMF-Store) or rely on existing text-based version control systems to persist whole model revisions (state-based repositories). Both approaches have advantages and disadvantages. We suggest a hybrid approach that infers changes via comparison to persist delta-compressed model states. Our hypothesis is that delta-compression requires a trade-off between comparison quality and execution time. Existing model comparison frameworks are tailored for comparison quality and not necessarily for execution time performance. Therefore, we evaluate and compare traditional line-based comparison, an existing model comparison framework (EMF-Compare), and our own framework (EMF-Compress). We reverse engineered the Eclipse code-base and its history with MoDisco to create a large corpus of evolving example models for our experiments.

1 Introduction

In BigMDE, we assume that model-based software engineering (MBSE) is applied to increasingly complex software systems and that the size of the models we need to process keeps growing. But not only the size of models is increasing; they also evolve over longer periods of time and have longer revision histories. To cope with such evolving models, the MBSE community began to adapt the concepts of existing text-file-based version control systems for models.
Respective model repositories allow clients to navigate model history, compare revisions, and merge different branches of development [1]. We can distinguish two basic strategies for model repositories: state-based and change-based [8]. State-based systems store individual revisions of a model, and whenever a user needs to compare two revisions, the respective models are compared. In a change-based system, only the differences (or the operations causing these differences) are persisted, and when a user requires a particular revision, the respective model is re-created from prior revisions and the respective difference data.

Both strategies have advantages and disadvantages. Change-based repositories, e.g. EMF-Store [8], require recording editing operations to infer and persist changes. This requires a tight integration of editors and model repository. While this works well for certain editors, like MVC-based graphical editors (e.g. GMF), it does not for background-parsing-based textual editors (e.g. Xtext), mixed, or closed editing environments. For state-based repositories, we can use existing version control systems that were developed for managing text files [1]. Here, models are persisted in their serialized form, e.g. as XMI/XML files. While this is independent of the editing environment and works for all models that can be serialized without information loss, it imposes an unnatural organization scheme onto modeling. Clients have to organize large models in terms of directories and files. To be efficient, the respective files have to be reasonably sized.

We propose a third strategy that uses a change-based strategy internally, but exposes the characteristics of a state-based system to the editing environment and the user. Git, for example, allows us to access file revisions as a whole, but stores many revisions of the same file in a delta-compressed pack-file. We will persist model histories in terms of changes, but provide and take models as a whole.
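The pack-file idea — store one revision in full and later revisions only as differences against a predecessor, while clients still receive whole revisions — can be sketched in a few lines. The following is a minimal illustration (not Git's actual delta format), using Python's difflib to compute deltas over lists of lines:

```python
import difflib

def make_delta(original, revised):
    """Compute a compact delta: only the changed ranges of the original,
    together with the replacement items taken from the revised list."""
    sm = difflib.SequenceMatcher(a=original, b=revised, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            delta.append((i1, i2, revised[j1:j2]))
    return delta

def apply_delta(original, delta):
    """Re-create the revised list from the original and a delta (patching)."""
    revised, pos = [], 0
    for i1, i2, items in delta:
        revised.extend(original[pos:i1])  # unchanged range, copied from original
        revised.extend(items)             # inserted/replaced items from the delta
        pos = i2
    revised.extend(original[pos:])
    return revised

# A tiny history: store rev0 in full, later revisions only as deltas.
rev0 = ["a", "b", "c", "d"]
rev1 = ["a", "x", "c", "d", "e"]
delta01 = make_delta(rev0, rev1)
assert apply_delta(rev0, delta01) == rev1
```

As in Git, accessing a particular revision means chaining patches from the last full revision, which is why the compression step has to be fast enough to run for every commit.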
This requires us to create model differences through model comparison, since we cannot rely on difference information provided by the editing environment. Similar to version control systems, this strategy does not force us to use the same comparison algorithm for storing models tightly (also known as delta-compression) and for presenting changes and similarities to users. We can use algorithms with good comparison quality and good performance characteristics internally for frequent delta-compression, and different algorithms with excellent comparison quality (i.e. minimal editing distance) but lower performance to occasionally present changes to users.

In this paper, we want to evaluate comparison algorithms for delta-compression. We developed a framework called EMF-Compress [12]. We implemented signature-based matching strategies, far simpler than the similarity-based matching of existing frameworks like EMF-Compare. In contrast to existing comparison frameworks, we use a meta-model that allows us to persist differences (i.e. delta-models) between models without referencing the compared models directly. We conducted experiments with reverse engineered MoDisco [3] Java models that were taken from the Eclipse source-code history [13]. In these experiments, we compare compression quality and execution times for different matching algorithms and provide execution times for comparison (compression) and patching (de-compression) to demonstrate the feasibility of our approach.

The paper is organized as follows. Section 2 provides a brief introduction to model comparison. Section 3 introduces our comparison framework tailored for delta-compression. The evaluation in Section 4 describes and discusses our experiments in compressing the reverse-engineered Eclipse code-base. We present related work, future research questions, and conclusions in Sections 5 and 6.

2 About Matches, Differences, and Merges

In this paper, we only consider two-way comparison.
We always compare a left (original) model with a right (revised) model. Algorithms for comparing two text files (i.e. two lists of comparable items) have existed for a long time. These algorithms search for the minimal editing distance, that is, the smallest possible set of changes (adding lines, removing lines, changing lines) necessary to modify one file to get the other. One particular algorithm is Myers' [11] algorithm, which compares two lists in O(N * D), where N is the sum of both sizes and D is the size of the resulting edit script.

Models are not lists of items; they are graphs of items. But each model element contains lists of items, i.e. its attribute and reference values. Assuming that the order of values does not change arbitrarily between revisions, we could apply existing algorithms, like Myers', to each value-set in each model element. But in order to use this approach, we need to know which values (primitive values as well as other model elements) are supposed to be equal and which are not. Therefore, existing model comparison frameworks use two steps. First, model elements are matched. Frameworks use different matching algorithms to establish pairs of matching model elements from both compared models. In a second step, these matches are used to find differences, comparing element by element, feature by feature. In such comparison processes, the matching strategy influences the comparison quality: whether we yield a minimal or a large set of differences depends solely on how good the matches are.

We can distinguish different types of matching strategies, among others signature-based and similarity-based matching. Signature-based algorithms only consider local model element properties like its meta-class, name, parameters, and its position in the containment hierarchy (i.e. its parent). Similarity-based algorithms consider the larger neighborhood of an element to assess its shape in order to establish matching pairs.
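As an illustration of the list-comparison building block, here is a minimal sketch of the greedy forward search of Myers' algorithm. It computes only the editing distance D; the full algorithm additionally records the search trace to recover the edit script itself:

```python
def myers_distance(a, b):
    """Length D of the shortest edit script turning sequence a into b,
    following the greedy forward search of Myers' O(ND) algorithm."""
    n, m = len(a), len(b)
    v = {1: 0}  # v[k]: furthest x reached on diagonal k = x - y
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # extend from the better of the two neighbouring diagonals
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # move down: insert an item from b
            else:
                x = v[k - 1] + 1      # move right: delete an item from a
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1   # follow the "snake" of matching items
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # unreachable for valid inputs

# The example from Myers' paper: ABCABBA -> CBABAC needs 5 edits.
assert myers_distance(list("ABCABBA"), list("CBABAC")) == 5
```

Since the running time grows with D, the algorithm is fast exactly in the common case of delta-compression, where consecutive revisions differ only slightly.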
Finding and comparing signatures is a straightforward, linear process. Similarity-based algorithms, on the other hand, can be considerably different based on the used neighborhood, considered characteristics, used heuristics, etc.

Existing frameworks, like EMF-Compare, are tailored for presenting differences to users and for merging models. They use similarity-based matching for maximum comparison quality and represent matches and differences in a comparison model. These comparison models reference both compared models. This is fine when the goal of a merge is to create a new merged model from two existing compared models. But in delta-compression, we want to re-create one model from the other and a respective difference model. This is also known as patching, and the difference model is then called a patch.

3 Model Comparison for Compression

We developed the comparison framework EMF-Compress [12] with the goal to evaluate signature-based comparison for delta-compression. This means two things. First, we can perform matching and differencing in one single step. In a signature-based matching algorithm, we consider the position of an element (i.e. its parent) to be part of its signature. As a consequence, a moved model element (i.e. a model element that changes its parent) is not matched with its original.

Fig. 1. Combined meta-model for matches and differences.
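The signature-based matching idea can be sketched with a hypothetical minimal element structure (the names Element, signature, and match below are ours for illustration and are not the EMF-Compress API). Elements are matched by local signature — here just meta-class and name — and only among the children of an already matched pair, which is why a moved element is never matched with its original:

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    metaclass: str                  # e.g. "Package", "Class", "Method"
    name: str                       # local name, "" if the element is unnamed
    children: list = field(default_factory=list)

def signature(e):
    # Local properties only; the parent is implicit because we only
    # match within the children of an already-matched pair.
    return (e.metaclass, e.name)

def match(left, right, pairs=None):
    """Recursively pair up elements with equal signatures, starting from
    the roots, which are assumed to match."""
    if pairs is None:
        pairs = []
    pairs.append((left, right))
    by_sig = {}
    for r in right.children:
        by_sig.setdefault(signature(r), []).append(r)
    for l in left.children:
        candidates = by_sig.get(signature(l))
        if candidates:
            match(l, candidates.pop(0), pairs)
        # children left unmatched on either side become additions/removals
    return pairs

old = Element("Package", "p", [Element("Class", "A", [Element("Method", "m")])])
new = Element("Package", "p", [Element("Class", "A", [Element("Method", "m"),
                                                      Element("Method", "n")])])
assert len(match(old, new)) == 3  # p/p, A/A, m/m match; n has no counterpart
```

In one traversal this yields both the matches and, implicitly, the unmatched elements that a differencing step turns into deltas.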
This is obviously bad for comparison quality, but it means we only need to establish matches locally within individual value-sets. Therefore, we can match elements on the fly during differencing. We start by assuming that the root elements match. We then apply Myers' algorithm to all features. While Myers' algorithm is applied, we establish matches among the values of each individual value-set. We recursively proceed for all matches in all containment references throughout the containment hierarchy.

Secondly, EMF-Compress uses a comparison meta-model that represents differences under the assumption that the right model is not available and that it needs to be re-created from the left model. This means the comparison model (depicted in Fig. 1) represents all differences as changes (i.e. deltas). An ObjectDelta represents all changes between two matched elements without referencing these elements directly. A SettingDelta is a container for differences (and matches) in a feature (via featureID). A SettingDelta can contain matching elements (matchedObjects) identified via index (originalIndex) and ValueDeltas that describe changed ranges of values within the feature. Ranges are also identified via indices. There are different classes used to represent differences depending on the value type (primitive, contained object, cross-referenced object) and whether the value is part of the left (original) model or is a new value that is added to the right (revised) model. All object values that are added have to be contained in the comparison model, since it is supposed to be independent of the right (revised) model. All object values that are removed or moved within the model are referenced via the ObjectDelta instances that represent them in the comparison model. Therefore, the comparison model does not reference any of the compared models directly. The information in a comparison model is sufficient to recreate (i.e.
patch) a right (revised) model from a left (original) model.

4 Evaluation

4.1 Setup

The subject of our experiments is a corpus that comprises a subset of the Eclipse Foundation's source code repositories. The Eclipse Foundation maintains code (and other artifacts, like documentation, web-pages, etc.) in over 600 Git repositories. We took the largest 200 of those repositories that actually contained Java code. These 200 repositories contain about 600 thousand revisions with a total of over 3 million compilation unit (CU) revisions that contain about 400 million SLOC (lines of code without empty lines and comments). The Git repositories take 6.6 GB of disk space. The generated MoDisco model representation of these repositories comprises over 4 billion objects and is persisted in EMF-Fragments [14] and a binary serialization format with about 230 GB of disk space.

We ran all experiments on a server computer with four 6-core Intel Xeon X74600 2.6 GHz processors and 128 GB of main memory. However, EMF-Compress operates mostly in a single thread, with the exception of JVM and Eclipse maintenance tasks. EMF-Compress runs on Eclipse Mars' versions of EMF in a Java 8 virtual machine. All operations were run with a maximum of 12 GB of heap memory. The system runs on GNU/Linux with a 3.16 kernel. EMF-Compress operations are long-running computations over thousands of revisions and CUs, and the respective algorithms are invoked over and over again. As usual for such macro-benchmark measures, we ignore JIT warm-ups and other micro-benchmark related issues [16].

The use-case behind these experiments is a repository-wide application of reverse-engineering (with MoDisco) for the purpose of analysis [13]. The goal of this model-based mining of software repositories (MSR) [4] approach is an AST-level model of a large-scale software repository (e.g.
Eclipse) that allows clients to derive metrics, dependencies, design structure matrices, and other data for statistical analysis over the history of many software projects. Not only would we benefit from the lower space requirements of the resulting models, delta-compression and comparison models would also allow us to omit unnecessary processing of unchanged model parts during model analysis.

4.2 Signature- vs. Similarity- vs. Line-based Comparison

As a first experiment, we want to measure and compare signature-based, similarity-based, and line-based comparison, both by comparison quality and execution time. Please note, line-based comparison always refers to comparing the original source files and not any serialized representation of the models. The resulting line-based numbers are not really comparable, but provide a frame of reference. Due to the low runtime performance of similarity-based comparison, we had to limit our measurements to the first 1000 revisions of the 100 biggest (by Git-size) Eclipse projects. For signature-based comparison, we used our own framework EMF-Compress. The used signatures comprise parent, meta-class, and, where applicable, names. For similarity-based comparison, we used EMF-Compare in its default configuration. For line-based comparison, we used the data provided by Git.

Fig. 2. Signature-based comparison with EMF-Compress, similarity-based comparison with EMF-Compare, and line-based comparison on the original source-code files. The charts show the number of matches, the comparison model size, and average execution times (log scale, in ms per compilation unit).

Fig. 2 shows the results. Different colors indicate data that is not directly comparable and that has to be read very carefully. The left chart depicts the comparison quality by means of matched elements. For models, this is the number of actually matched elements.
For lines, this is the number of matched lines, i.e. lines that were neither added, removed, nor changed. The chart shows the fraction of matched elements relative to the overall number of elements, without the uncompressed initial revisions. The data is accumulated for each analyzed repository. As expected, similarity-based comparison produces good comparison quality, since it uses far more data to establish matches than signature-based matching. The signature-based matching quality is worse, but still of a similar magnitude. We cannot draw any sensible conclusions from comparing these model-based numbers with the number of matched lines, because lines span different numbers of model elements and shorter lines match more likely than longer lines.

The middle chart compares the size of the resulting comparison models in relation to the size of the original models. This data also does not include the uncompressed initial revisions. For models, we used the approximated size of a binary serialized representation of all models. For lines, we used the average model size per line for added lines and two bytes for removed lines to approximate sizes for a corresponding comparison model. Measuring the size of comparison models is not very informative, since the comparison models of EMF-Compare and EMF-Compress comprise different data. EMF-Compare expresses changes by referencing the differing model elements, while EMF-Compress stores changed features and indices, and includes all added values. Our line comparison model approximation does not include any overhead for organizing the differences.

Fig. 3.
Comparison execution time depending on model size for similarity- and signature-based matching. The plots show data measured for 300k compilation units taken from different Eclipse projects. The top row depicts all measurements, the lower row magnifies a smaller range.

The right chart of Fig. 2 shows the average execution time spent on comparison in all repositories, compared to the time that was necessary to create models from the original Java code. Similarity-based comparison is magnitudes slower than signature-based comparison. Fig. 3 provides a closer look at the execution times. This chart plots execution times spent on individual CUs in relation to their sizes in number of model elements. While signature-based comparison scales linearly for most CUs, similarity-based comparison scales super-linearly. Additionally, the comparison times seem to depend on CU structure and not only on CU size, since the measured times are particularly large for some CUs.

4.3 Top-Level vs. Deep Comparison

For a second experiment, we want to compare different levels of comparison. The hypothesis is that if we only try to match the first levels in a containment hierarchy and only use equality to compare the rest of the models, we can save execution time and still yield sufficient comparison quality. We devised two different matching algorithms. The first only uses top-level named elements and their signatures for matching (e.g. Java packages, classes, methods). All other elements (i.e. everything contained in method bodies) only match if they are equal to each other. The second algorithm tries to match all model elements based on their signature. We also provide line-based matching data for reference. This time, we measure comparison for all 200 Eclipse projects, all revisions, all CUs.

Fig. 4 shows the results. The top row of charts compares artifact sizes for some example Eclipse projects.
The lower left shows the average size of uncompressed initial revisions and the respective comparison model sizes relative to the overall size of the uncompressed models. The execution times are given as an average per revision for both compression algorithms, together with the times necessary to decompress (i.e. reconstruct a revision from its predecessor and the comparison model).

The comparison quality with only matching the named elements is significantly worse than with matching all elements. But more surprisingly, the execution time performance is also much worse. Even though we do not have to match all model elements, we still have to compare them for equality.

5 Related Work

EMF-Store is an example of a change-based model version control system (Koegel and Helming [8]). An example of a model repository based on existing source-code version control systems is presented by Altmanninger et al. [1].

Different frameworks for model comparison exist: for example, EMF-Compare by Brun and Pierantonio [2] or SiDiff by Kehrer et al. [5]. Stephan and Cordy present a feature-based comparison of existing approaches and frameworks [15]. Kolovos et al. did the same for matching algorithms [10]. In this paper, we only assumed meta-model agnostic matching algorithms. But it is assumed that comparison quality can be increased when matching is implemented depending on the meta-model of the compared models. Kolovos [9] and Kehrer et al. [5] present approaches for the customization of matching algorithms.

In [6,7], Kehrer et al. research the possibilities to infer high-level editing operations from low-level comparison models. This can be valuable for presenting differences to users in a better way and for representing model differences/changes more efficiently in smaller delta-models.
Fig. 4. Compression quality by percentage of full size and percentage of matched elements, compared to line-based compression. The charts cover named element matching, meta-class matching, and line matching for example projects (cdt, webtools, gmf-tooling, rap, jdt.core), as well as average execution times per revision for parsing, compression, and decompression.

6 Conclusions

We evaluated the potential use of model comparison for delta-compressing state-based model repositories. We could show that signature-based matching algorithms reach sufficient comparison quality with significantly shorter execution times than the existing similarity-based model comparison in frameworks like EMF-Compare. Furthermore, we presented a meta-model for comparison models that does not reference the compared models, can be persisted independently, and allows us to re-create one of the compared models from the other. Therefore, we could demonstrate the principal feasibility of time-efficient delta-compression for models.

However, there are many unanswered questions. We only evaluated a single algorithm for similarity-based matching. Different algorithms, e.g. meta-model specific algorithms, could yield better results. Further, we only considered model representations of source-code following a single meta-model. Different kinds of models could yield different results.
What factors influence the performance of model comparison, and in what way? Delta-compression allows us to decompress sequences of model revisions efficiently, but does not allow random access to particular revisions. In a practical application, we need to consider persisting some intermediate revisions as full models to counter that. What factors influence the right number of these intermediate models? The models that we used for evaluation have a convenient size. How can we combine delta-compression with existing approaches to fragmenting and persisting very large models [14]?

References

1. Altmanninger, K., Seidl, M., Wimmer, M.: A survey on model versioning approaches. International Journal of Web Information Systems 5(3), 271–304 (2009)
2. Brun, C., Pierantonio, A.: Model differences in the Eclipse Modeling Framework. UPGRADE, The European Journal for the Informatics Professional 9(2), 29–34 (2008)
3. Bruneliere, H., Cabot, J., Jouault, F., Madiot, F.: MoDisco: A generic and extensible framework for model driven reverse engineering. In: Proceedings of the IEEE/ACM International Conference on Automated Software Engineering. pp. 173–174. ASE '10, ACM (2010)
4. Kagdi, H., Collard, M.L., Maletic, J.I.: A survey and taxonomy of approaches for mining software repositories in the context of software evolution. Journal of Software Maintenance and Evolution: Research and Practice 19(2), 77–131 (2007)
5. Kehrer, T., Kelter, U., Pietsch, P., Schmidt, M.: Adaptability of model comparison tools. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, ASE 2012, p. 306 (2012)
6. Kehrer, T., Kelter, U., Taentzer, G.: A rule-based approach to the semantic lifting of model differences in the context of model versioning. In: 26th IEEE/ACM International Conference on Automated Software Engineering, ASE 2011, pp. 163–172 (2011)
7. Kehrer, T., Kelter, U., Taentzer, G.: Consistency-preserving edit scripts in model versioning.
In: 28th IEEE/ACM International Conference on Automated Software Engineering, ASE 2013, pp. 191–201 (2013)
8. Koegel, M., Helming, J.: EMFStore: a model repository for EMF models. In: 2010 ACM/IEEE 32nd International Conference on Software Engineering, vol. 2, pp. 307–308 (2010)
9. Kolovos, D.S.: Establishing correspondences between models with the Epsilon Comparison Language. Lecture Notes in Computer Science, vol. 5562, pp. 146–157 (2009)
10. Kolovos, D.S., Di Ruscio, D., Pierantonio, A., Paige, R.F.: Different models for model matching: An analysis of approaches to support model differencing. In: Proceedings of the 2009 ICSE Workshop on Comparison and Versioning of Software Models, pp. 1–6 (2009)
11. Myers, E.W.: An O(ND) difference algorithm and its variations. Algorithmica 1(1-4), 251–266 (1986)
12. Scheidgen, M.: EMF-Compress. https://github.com/markus1978/emf-compress (2016)
13. Scheidgen, M., Fischer, J.: Model-based mining of source code repositories. In: Amyot, D., Fonseca i Casas, P., Mussbacher, G. (eds.) System Analysis and Modeling: Models and Reusability, Lecture Notes in Computer Science, vol. 8769, pp. 239–254. Springer International Publishing (2014)
14. Scheidgen, M., Zubow, A., Fischer, J., Kolbe, T.H.: Automated and transparent model fragmentation for persisting large models. In: Proceedings of the 15th International Conference on Model Driven Engineering Languages and Systems (MODELS). LNCS, vol. 7590, pp. 102–118. Springer, Innsbruck, Austria (2012)
15. Stephan, M., Cordy, J.R.: A survey of methods and applications of model comparison (2012)
16. Wilson, S., Kesselman, J.: Java Platform Performance: Strategies and Tactics. Addison-Wesley, Boston, MA (2000)