Train Benchmark Case: an EMF-I NC Q UERY Solution⇤ Gábor Szárnyas Márton Búr István Ráth Budapest University of Technology and Economics Department of Measurement and Information Systems H-1117 Magyar tudósok krt. 2, Budapest, Hungary szarnyas@mit.bme.hu, marton.bur@inf.mit.bme.hu, rath@mit.bme.hu This paper presents a solution for the Train Benchmark Case of the 2015 Transformation Tool Con- test, using EMF-I NC Q UERY. 1 Introduction This paper describes a solution for the TTC 2015 Train Benchmark Case [6]. The source code of the solution is available as an open-source project.1 There is also a SHARE image available.2 2 EMF-I NC Q UERY Automated model transformations are frequently integrated with modeling environments, requiring both high performance and a concise programming interface to support software engineers. The objective of the EMF-I NC Q UERY [2] framework is to provide a declarative way to define queries over EMF models. EMF-I NC Q UERY extended the pattern language of V IATRA 2 with new features (including transitive closure, role navigation, match count) and tailored it to EMF models [4]. EMF-I NC Q UERY is developed with a focus on incremental query evaluation, however, the most recent version is also capable of evaluating queries with a local search-based algorithm. 2.1 Incremental Pattern Matching EMF-I NC Q UERY uses the Rete algorithm [1] to perform incremental pattern matching. The Rete algo- rithm uses tuples to represent the model objects, attributes, references and partial matches in the model. The algorithm defines an asynchronous network of communicating nodes. The network consists of three types of nodes. Input nodes are responsible for indexing the model by type, i.e. they store the appropriate tuples for the objects and references. They are also responsible for producing the update messages and propagating them to the worker nodes. Worker nodes perform a transformation on the output of their parent node(s) and propagate the results. Partial query results are represented in tuples and stored in the memory of the worker node, thus allowing for incremental query reevaluation. Production nodes are ter- minators that provide an interface for fetching the results of the query and the changes introduced by the latest transformation. Moreover, parallelization possibilities of the algorithm were already investigated in [3]. ⇤ This work was partially supported by the MONDO (EU ICT-611125) project and Red Hat Inc. 1 https://github.com/FTSRG/trainbenchmark-ttc 2 http://is.ieis.tue.nl/staff/pvgorp/share/?page=ConfigureNewSession&vdi=ArchLinux64_ TrainBenchmark-EIQ.vdi c G. Szárnyas et al. Submitted to: This work is licensed under the TTC 2015 Creative Commons Attribution License. 2 Train Benchmark Case: an EMF-I NC Q UERY Solution The incremental pattern matcher provides quick reevaluation for complex queries. However, it does so at the expense of high memory consumption as the partial results are stored in the Rete network. 2.2 Local Search-Based Pattern Matching Local search-based pattern matching (LS) is commonly used in graph transformation tools. Along with the incremental query engine, EMF-I NC Q UERY also provides a local search-based pattern matcher. The matching process consists of four steps. (1) At first, in a preprocessing step the patterns are normalized: the constraint set is minimized, variables that are always equal are unified and positive pattern calls are flattened. These normalized patterns are evaluated by (2) the query planner, using a specified cost estimation function to provide search plans: totally ordered lists of search operations used to ensure that the constraints from the pattern definition hold. From a single pattern specification multiple search plans can be derived, thus pattern matching includes (3) plan selection based on the input parameter binding and model-specific metrics. Finally, (4) the search plan is executed by a plan interpreter evaluating the different operations of the plans. If an operation fails, the interpreter backtracks; if all operations are executed successfully, a match is found. Compared to the incremental query engine, the search-based algorithm requires less memory [7] and is therefore capable of performing queries on larger models if there is not enough memory available for the incremental engine. 2.3 Defining the Pattern Matching Strategy Currently, the pattern matching strategy has to be determined by the developer by specifying the query backend of EMF-I NC Q UERY. Developing a hybrid pattern matching engine is subject to future work. This will allow the user to use annotations to define the evaluation strategy for each pattern. There are also plans to develop an adaptive query engine (Section 5). 2.4 Pattern Match Representation For each query, EMF-I NC Q UERY generates a set of utility classes. These classes store the model objects in the match and provide a convenient interface for reading and transforming the matches. These classes are used for implementing the transformation operations (Section A.1). 3 Solution The case defines a well-formedness validation scenario set in the domain of railway systems [6]. The case provides a synthetic instance model generator which is capable of generating models of various sizes. For the solution, we used the metamodel defined in the case description without any modifications or extensions. The solution was developed in the Eclipse IDE. For setting up the development environment, please refer to the readme file. The projects are not tied to the Eclipse environment and can be compiled with the Apache Maven build automation tool. This offers a number of benefits, including portability and the possibility of continuous integration. The solution is written in Java 7. The patterns are defined in I NC Q UERY Pattern Language (IQPL) [4]. G. Szárnyas et al. 3 3.1 Example Query: RouteSensor follows We describe the implementation of the RouteSensor route: Route definedBy query in detail. The other queries and transforma- swP: SwitchPosition NEG switch «new» sensor: Sensor sensor sw: Switch tions are implemented in a similar manner. The imple- mented application uses the Java classes generated by EMF-I NC Q UERY and the hand-coded transformation logic introduced below. First it finds the matches of the queries, then the corresponding transformation step is applied for each match. The code of patterns and the transformation definitions are listed in Section A.1. The RouteSensor query looks for sensors 1 pattern routeSensor(route, sensor, switchPosition, sw) that are connected to a switch, but the sensor 2 { 3 Route.follows(route, switchPosition); and the switch are not connected to the same 4 SwitchPosition.^switch(switchPosition, sw); route. The query in IQPL is listed in Listing 1. 5 TrackElement.sensor(sw, sensor); 6 neg find definedBy(route, sensor); The positive conditions are defined by using 7 } the appropriate classes and references, while the 8 negative application condition (NAC) is defined 9 pattern definedBy(route, sensor) 10 { as a negative find operation (neg find) for a 11 Route.definedBy(route, sensor); separate query. 12 } During the repair operation, for the selected Listing 1: Pattern of the RouteSensor query. matches, the missing definedBy edge is inserted by connecting the route to the sensor. The Java transformation code implementing the transformation is listed in Listing 2. The transformation uses the match object returned by EMF-I NC Q UERY. 1 public void transform(final Collection matches) { 2 for (final Object match : matches) { 3 final RouteSensorMatch rsm = (RouteSensorMatch) match; 4 rsm.getRoute().getDefinedBy().add(rsm.getSensor()); 5 } 6 } Listing 2: Transformation of the RouteSensor query. 3.2 Query Evaluation Strategies for the RouteSensor Pattern We use the RouteSensor query to provide an overview of the various query evaluation strategies used in EMF-I NC Q UERY. 3.2.1 Incremental Evaluation The Rete network derived from the RouteSensor query is shown in Figure 7. For the sake of clarity, we simplified the Rete network by removing some implementation-specific details. The evaluation in the Rete network starts with the input nodes (switch, follows, sensor, definedBy), which are indexing the model by collecting the appropriate tuples. The worker nodes are responsible for performing the relational operations, join and antijoin in this case. The join nodes have a pair of tuple masks (e.g. h2, 3i and h0, 1i) to determine the attributes used in the join operation. The match set of the pattern is stored in the production node. 4 Train Benchmark Case: an EMF-I NC Q UERY Solution 3.2.2 Local Search-Based Evaluation The search plan generated for evaluating the RouteSensor query is shown in Figure 8. This screenshot is taken from the Local Search Debugger view of EMF-I NC Q UERY. The search plan is presented in the upper-left part, while the found matches with the variable substitutions are shown below the search plan in a table viewer. The Zest-based graph viewer in the right visualises a match based on the selection of the table viewer. 4 Evaluation In this section, we present the benchmark environment and evaluate the results. 4.1 Benchmark Environment The benchmarks were performed on a 64-bit Ubuntu Server 14.04 virtual machine deployed on a private cloud. The machine used a quad-core 2.50 GHz Xeon L5420 processor and 16 GB of memory. We used Oracle JDK 8 and set the available heap memory to 15 GB. 4.2 Benchmark Results To present the results, we use the reporting framework of the Train Benchmark. The framework generates plots to visualise the execution time of the phases defined in the benchmark. The plots showing each query are included in Section A.2. On each plot, the x axis shows the problem size, i.e. the size of the instance model, while the y axis shows the aggregated execution time of a certain phases, measured in milliseconds. Both axes use logarithmic scale. 4.2.1 Benchmark Results for the RouteSensor Query For the sake of conciseness, we only discuss the results for the RouteSensor query in detail. The results for the batch validation are shown in Figure 1. The results suggest that—given enough memory—both the incremental and the local search-based (LS) strategies are able to run the query and the transformation for the largest model. The first validation takes consistently longer for the incremental strategy as for the LS strategy. This is caused by the fact that the incremental strategy builds the Rete algorithm during the read phase. However, the difference is small as the first validation time largely consists of deserializing the EMF model. The execution times of the revalidation are shown in Figure 2. The execution time of the incremental strategy linearly correlates with the size of the change set. This implies that for a fixed change set, the incremental strategy is able to perform the transformation in constant time, while execution time for the LS strategy correlates with the model size. For the proportional change set, the revalidation time is a low-degree polynomial of the model size for both strategies, however, it is an order of magnitude faster for the incremental strategy than for the LS. 4.3 Comparison of the Query Evaluation Strategies Section A.2 shows the detailed results for all queries and both evaluation strategies. In the first validation (Figure 3 and Figure 5), the evaluation strategies show similar performance characteristics as both have G. Szárnyas et al. 5 to compute the complete result set of the query. The execution times for both strategies show that the most complex query is SemaphoreNeighbor, while the simplest one is SwitchSensor. As expected, the execution times of the revalidation are different for the two strategies. Figure 4 shows that for the incremental strategy the execution time correlates with the size of the match set (instead of the model size). This can be observed when comparing the execution times for the fixed and the proportional change sets. Figure 6 shows that the execution time for the LS strategy is determined by the model size and is not affected by the size of the change set. These results imply that the optimal evaluation strategy depends on the specific workload profile. If there is enough memory available and the transformations operate on a small amount of model elements, it is recommended to use the incremental strategy. If the transformations often change a large proportion of the model elements, the LS strategy is recommended. 5 Summary and Future Work The paper presented a solution for the Train Benchmark case of the 2015 Transformation Tool Contest. There is ongoing work to develop a hybrid query engine [5] for EMF-I NC Q UERY. This will allow the user to use annotations on the patterns for specifying the desired query evaluation strategy. There are also plans to develop an adaptive query engine which will use query optimisation heuristics to determine the appropriate strategy based on the query, the model and the available resources. Acknowledgements. The authors would like to thank Zoltán Ujhelyi for providing valuable insights into EMF-I NC Q UERY. References [1] Gábor Bergmann (2013): Incremental Model Queries in Model-Driven Design. Ph.D. dissertation, Budapest University of Technology and Economics, Budapest. Available at http://home.mit.bme.hu/~bergmann/ download/phd-thesis-bergmann.pdf. [2] Gábor Bergmann, Ákos Horváth, István Ráth, Dániel Varró, András Balogh, Zoltán Balogh & András Ökrös (2010): Incremental Evaluation of Model Queries over EMF Models. In: MODELS, Springer, Springer, doi:http://dx.doi.org/10.1007/978-3-642-16145-2_6. [3] Gábor Bergmann, István Ráth & Dániel Varró (2009): Parallelization of Graph Transformation Based on Incremental Pattern Matching. Electronic Communications of the EASST, Proceedings of the Eighth In- ternational Workshop on Graph Transformation and Visual Modeling Techniques 18. Available at http: //eceasst.cs.tu-berlin.de/index.php/eceasst/article/view/265/249. [4] Gábor Bergmann, Zoltán Ujhelyi, István Ráth & Dániel Varró (2011): A Graph Query Language for EMF Models. In: Theory and Practice of Model Transformations, Fourth Intl. Conf., LNCS 6707, Springer. [5] Ákos Horváth, Gábor Bergmann, István Ráth & Dániel Varró (2010): Experimental Assessment of Combining Pattern Matching Strategies with VIATRA2. International Journal on Software Tools for Technology Trans- fer 12(3-4), pp. 211–230, doi:10.1007/s10009-010-0149-7. Available at http://dx.doi.org/10.1007/ s10009-010-0149-7. [6] Gábor Szárnyas, Oszkár Semeráth, István Ráth & Dániel Varró (2015): The TTC 2015 Train Benchmark Case for Incremental Model Validation. In: 8th Transformation Tool Contest (TTC 2015). [7] Zoltán Ujhelyi, Gábor Szőke, Ákos Horváth, Norbert István Csiszár, László Vidács, Dániel Varró & Rudolf Ferenc (2015): Performance Comparison of Query-based Techniques for Anti-Pattern Detection. Information and Software Technology, doi:10.1016/j.infsof.2015.01.003. In press. 6 Train Benchmark Case: an EMF-I NC Q UERY Solution A Appendix A.1 Patterns and Transformations A.1.1 PosLength 1 pattern posLength(segment) 2 { 3 Segment.length(segment, length); 4 check(length <= 0); 5 } Listing 3: Pattern of the PosLength query. 1 public void transform(final Collection matches) { 2 for (final Object match : matches) { 3 final RouteSensorMatch rsm = (RouteSensorMatch) match; 4 rsm.getRoute().getDefinedBy().add(rsm.getSensor()); 5 } 6 } Listing 4: Transformation of the PosLength query. A.1.2 SwitchSensor 1 pattern switchSensor(sw) 2 { 3 Switch(sw); 4 neg find hasSensor(sw); 5 } 6 7 pattern hasSensor(sw) 8 { 9 TrackElement.sensor(sw, _); 10 } Listing 5: Pattern of the SwitchSensor query. 1 public void transform(final Collection matches) { 2 for (final Object match : matches) { 3 final SwitchSensorMatch ssm = (SwitchSensorMatch) match; 4 final Sensor sensor = RailwayFactory.eINSTANCE.createSensor(); 5 ssm.getSw().setSensor(sensor); 6 } 7 } Listing 6: Transformation of the SwitchSensor query. A.1.3 SwitchSet 1 pattern switchSet(semaphore, route, switchPosition, sw) 2 { 3 Route.entry(route, semaphore); 4 Route.follows(route, switchPosition); 5 SwitchPosition.^switch(switchPosition, sw); 6 7 Semaphore.signal(semaphore, ::GO); 8 SwitchPosition.position(switchPosition, swPP); 9 Switch.currentPosition(sw, swCP); 10 G. Szárnyas et al. 7 11 swPP != swCP; 12 } Listing 7: Pattern of the SwitchSet query. 1 public void transform(final Collection matches) { 2 for (final Object match : matches) { 3 final SwitchSetMatch ssm = (SwitchSetMatch) match; 4 ssm.getSw().setCurrentPosition(ssm.getSwitchPosition().getPosition()); 5 } 6 } Listing 8: Transformation of the SwitchSet query. A.1.4 RouteSensor The RouteSensor query is discussed in detail in Section 3.1. A.1.5 SemaphoreNeighbor 1 pattern semaphoreNeighbor(semaphore, route1, route2, sensor1, sensor2, te1, te2) 2 { 3 Route.exit(route1, semaphore); 4 Route.definedBy(route1, sensor1); 5 TrackElement.sensor(te1, sensor1); 6 TrackElement.connectsTo(te1, te2); 7 TrackElement.sensor(te2, sensor2); 8 Route.definedBy(route2, sensor2); 9 neg find entrySemaphore(route2, semaphore); 10 11 route1 != route2; 12 } 13 14 pattern entrySemaphore(route, semaphore) 15 { 16 Route.entry(route, semaphore); 17 } Listing 9: Pattern of the SemaphoreNeighbor query. 1 public void transform(final Collection matches) { 2 for (final Object match : matches) { 3 final SemaphoreNeighborMatch snm = (SemaphoreNeighborMatch) match; 4 snm.getRoute2().setEntry(snm.getSemaphore()); 5 } 6 } Listing 10: Transformation of the SemaphoreNeighbor query. 8 Train Benchmark Case: an EMF-I NC Q UERY Solution A.2 Detailed Benchmark Results fixed, RouteSensor, Function: read+check (Y: Log2) (X: Log2) proportional, RouteSensor, Function: read+check (Y: Log2) (X: Log2) 248543.5 ● 245898.56 ● ● ● 88179.4 87821.38 ● ● 31284.69 ● 31364.95 ● Time (ms) Time (ms) 11099.33 ● 11201.83 ● ● ● 3937.87 ● 4000.67 ● 1397.1 ● 1428.82 ● ● ● 495.67 ● ● ● 510.3 ● ● ● ● ● ● ● 175.86 182.25 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 Size Size Tool Tool ● EMFIncQuery−Incremental ● EMFIncQuery−Incremental EMFIncQuery−LocalSearch EMFIncQuery−LocalSearch (a) Fixed change set (b) Proportional change set Figure 1: First validation times for the RouteSensor query. fixed, RouteSensor, Function: repair+recheck (Y: Log2) (X: Log2) proportional, RouteSensor, Function: repair+recheck (Y: Log2) (X: Log2) 10458.14 14897.01 ● 3487.25 2339.85 ● ● 1162.82 367.52 ● Time (ms) Time (ms) ● ● 387.74 57.72 ● ● ● 129.29 9.07 ● ● ● 43.11 ● ● ● 1.42 ● ● ● ● ● ● 14.38 ● ● ● 0.22 ● 4.79 ● 0.04 ● ● 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 Size Size Tool Tool ● EMFIncQuery−Incremental ● EMFIncQuery−Incremental EMFIncQuery−LocalSearch EMFIncQuery−LocalSearch (a) Fixed change set (b) Proportional change set Figure 2: Revalidation times for the RouteSensor query. EMFIncQuery−Incremental, fixed, Function: read+check (Y: Log2) (X: Log2) EMFIncQuery−Incremental, proportional, Function: read+check (Y: Log2) (X: Log2) 248543.5 ● 245898.56 ● 90149.88 ● 89175.22 ● ● ● 32698.5 32339.44 Time (ms) Time (ms) ● ● 11860.16 ● 11727.91 ● ● ● 4301.83 ● 4253.13 ● 1560.33 ● 1542.4 ● ● ● 565.95 ● ● ● 559.35 ● ● ● 205.28 ● ● 202.85 ● ● 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 Size Size Query Query ● PosLength ● PosLength RouteSensor RouteSensor SemaphoreNeighbor SemaphoreNeighbor SwitchSensor SwitchSensor SwitchSet SwitchSet (a) Fixed change set (b) Proportional change set Figure 3: First validation times for the incremental query evaluation strategy. G. Szárnyas et al. 9 EMFIncQuery−Incremental, fixed, Function: repair+recheck (Y: Log2) (X: Log2) EMFIncQuery−Incremental, proportional, Function: repair+recheck (Y: Log2) (X: Log2) 44.7 5427.67 ● 21.22 969.12 ● ● 10.07 173.04 ● Time (ms) Time (ms) ● ● ● ● ● ● ● 4.78 ● ● ● ● ● ● ● ● ● 30.9 ● ● ● ● ● ● 2.27 5.52 ● ● 1.08 0.98 0.51 0.18 0.24 0.03 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 Size Size Query Query ● PosLength ● PosLength RouteSensor RouteSensor SemaphoreNeighbor SemaphoreNeighbor SwitchSensor SwitchSensor SwitchSet SwitchSet (a) Fixed change set (b) Proportional change set Figure 4: Revalidation times for the incremental query evaluation strategy. EMFIncQuery−LocalSearch, fixed, Function: read+check (Y: Log2) (X: Log2) EMFIncQuery−LocalSearch, proportional, Function: read+check (Y: Log2) (X: Log2) 270211.18 ● 274313.06 ● 94728.84 ● 95807.13 ● ● ● 33209.41 33461.79 Time (ms) Time (ms) ● ● 11642.33 ● 11686.93 ● ● ● 4081.49 ● 4081.8 ● 1430.86 ● 1425.62 ● ● ● 501.62 ● 497.91 ● ● 175.86 ● ● ● ● 173.9 ● ● ● 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 Size Size Query Query ● PosLength ● PosLength RouteSensor RouteSensor SemaphoreNeighbor SemaphoreNeighbor SwitchSensor SwitchSensor SwitchSet SwitchSet (a) Fixed change set (b) Proportional change set Figure 5: First validation times for the local search-based query evaluation strategy. EMFIncQuery−LocalSearch, fixed, Function: repair+recheck (Y: Log2) (X: Log2) EMFIncQuery−LocalSearch, proportional, Function: repair+recheck (Y: Log2) (X: Log2) 158821.74 ● 152640.78 ● 43392.68 ● 42155.18 ● ● ● 11855.59 11642.1 Time (ms) Time (ms) ● ● ● ● 3239.14 ● 3215.23 ● ● ● 884.99 ● 887.96 ● ● ● 241.79 ● 245.23 ● ● ● 66.06 ● 67.73 ● ● ● 18.05 ● 18.7 ● 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 Size Size Query Query ● PosLength ● PosLength RouteSensor RouteSensor SemaphoreNeighbor SemaphoreNeighbor SwitchSensor SwitchSensor SwitchSet SwitchSet (a) Fixed change set (b) Proportional change set Figure 6: Revalidation times for the local search-based query evaluation strategy. 10 Train Benchmark Case: an EMF-I NC Q UERY Solution A.3 Rete Network switch follows sensor definedBy swP, sw route, swP trackElement, sensor route, sensor 0 1 Join swP, sw, route 1 0 Join swP, sw, route, sensor 2, 3 0, 1 Antijoin swP, sw, route, sensor Production Figure 7: The Rete network for the RouteSensor query. A.4 Local Search Plan Figure 8: The search plan and the matches for the RouteSensor query.