=Paper=
{{Paper
|id=Vol-3089/ttc21_paper_ocl2sql_GarciaDominguez_solution
|storemode=property
|title=The Epsilon Solution to the OCL2PSQL Case
|pdfUrl=https://ceur-ws.org/Vol-3089/ttc21_paper12_ocl2sql_GarciaDominguez_solution.pdf
|volume=Vol-3089
|authors=Antonio García-Domíguez
|dblpUrl=https://dblp.org/rec/conf/ttc/Garcia-Domiguez21
}}
==The Epsilon Solution to the OCL2PSQL Case==
The Epsilon Solution to the OCL2PSQL Case Antonio Garcia-Dominguez1 1 Aston University, Birmingham, UK Abstract There have been several attempts to map Object Constraint Language queries to SQL: one of these is the OCL2PSQL mapping proposed by Nguyen and Clavel. In this paper, I describe an implementation of OCL2PSQL using two languages from the Eclipse Epsilon project: 744 lines of ETL code for the model-to-model transformation itself, and 97 lines of EGL code for a model-to-text transformation that produces more readable SQL than the reference version. The solution passes all correctness tests set out in the original framework: the transformation has a median time of 0.93s across all scenarios. Keywords OCL, SQL, model transformation, abstract syntax graphs 1. Introduction model-to-text transformation (the Epsilon Generation Language), model validation, pattern matching, model The OMG Object Constraint Language is a well known, migration, unit testing and other tasks. The solution standardized language for specifying constraints and passes all correctness checks, though some minor refine- queries in models: while typically associated with UML, ments of the proposed mappings were required. it has been widely used for MOF-based modelling lan- The rest of the paper is structured as follows: Section 2 guages, and particularly those based on the Ecore imple- explains the overall structure of the solution. Section 3 mentation of EMOF. Given the increasingly large size of describes the key features of the implemented model- the models used by Model-Driven Engineering practition- to-model transformation from OCL to SQL. Section 4 ers, one common solution is to persist them in databases. describes the alternative model-to-text transformation This has motivated attempts to map the OCL queries that has been developed from the SQL models to textual (written in terms of the abstract syntax of the model- SQL queries. Finally, Section 5 presents the preliminary ing language) to SQL queries that run directly on the performance results obtained by the solution author. underlying relational database used to store the models. Some of these attempts have used imperative features such as loops and cursors to deal with iterators, possibly 2. Overall structure reducing their compatibility across vendors (due to the limited standardization of these features). OCL2PSQL The Epsilon solution to the OCL2PSQL case is a Java ap- (“OCL to pure SQL”) is an approach that provides a map- plication, using Apache Maven for its dependency man- ping of nested iterators while staying entirely within the agement. Epsilon is usable as a standalone Java library, broadly standardized and declarative parts of SQL [1]. with stable versions available through Maven Central and The OCL2PSQL TTC case has selected a core subset of snapshot versions available through the OSS Sonatype this mapping (with some erratum added since the origi- repository. The solution should work in Epsilon 2.3.0, but nal release of the mapping), and has invited tool authors uses the latest 2.4.0 interim versions to avoid a warning to demonstrate the usability, conciseness and ease of message when using the Epsilon Generation Language. understanding of their model transformation languages The solution (now merged into the TTC’21 OCL2PSQL through alternative implementations of this subset. solutions repository1 ) reuses the basic scaffolding of the This paper presents an outline of a solution based on reference solution, including the generated code for the the Eclipse Epsilon family of model management lan- OCL and SQL metamodels, and the classes responsible guages. Since the original release in 2006 [2], Eclipse Ep- for interpreting the environment variables, communi- silon has grown to include languages for model-to-model cating with the MySQL database, and performing the transformation (the Epsilon Transformation Language), correctness tests. The solution adds the following Java classes: TTC’21: Transformation Tool Contest, Part of the Software Technologies: Applications and Foundations (STAF) federated • SampleLauncher, which transforms all OCL conferences, Eds. A. Boronat, A. García-Domínguez, and G. Hinkel, queries without using the environment variables 25 June 2021, Bergen, Norway (online). of the benchmark framework. This is mostly for " a.garcia-dominguez@aston.ac.uk (A. Garcia-Dominguez) internal development. 0000-0002-4744-9150 (A. Garcia-Dominguez) © 2021 Copyright for this paper by its authors. Use permitted under Creative 1 Commons License Attribution 4.0 International (CC BY 4.0). https://github.com/TransformationToolContest/ CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) ttc2021-ocl2psql/tree/master/solutions/Epsilon • OCL2SQL, which encapsulates a model-to-model Listing 1: Main ETL script transformation written in the Epsilon Transfor- mation Language into an easy-to-use Java class. import ’s0_literals.etl’; 1 More information on the ETL transformation is import ’s1_equals_and.etl’; 2 available in Section 3. import ’s2_allInstances.etl’; 3 • SQL2Text, which replaces the model-to-text import ’s3_size.etl’; 4 transformation in the reference solution with one import ’s4_collect_variable.etl’; 5 based on the Epsilon Generation Language. More import ’s5_attributes.etl’; 6 details on the EGL transformation are given in import ’s6_associationEnds.etl’; 7 Section 4. import ’s7_exists.etl’; 8 import ’s8_existsWithFree.etl’; 9 Finally, the solution changed the code of the Solution 10 class in the reference solution to use the OCL2SQL and post { 11 SQL2Text classes from above. var firstRootSelect = SQL!PlainSelect.all.selectOne( 12 ps|ps.eContainer.isUndefined()); 13 3. Model-to-model var stmt = new SQL!SelectStatement; 14 transformation with ETL stmt.selectBody = firstRootSelect; 15 } 16 The Epsilon project includes several languages for per- forming model transformations: • The Epsilon Object Language (EOL) is an OCL- subgraph produced from the OCL e subtree, allowing inspired imperative language which is well suited the different SQL subtrees to be linked together. The for small in-place (endogenous) model transfor- OCL2PSQL case did not require any manual rule schedul- mations, though it can be used for purely imper- ing or the use of greedy/lazy rules: the automated rule ative exogenous transformations as well. scheduling done by ETL based on source element types • The Epsilon Transformation Language (ETL) and guards was sufficient. builds on top of EOL by adding the concept of a ETL allows for breaking up the transformation across rule, which transforms a certain type of source several modules. This made it relatively easy to itera- model element (possibly with some guards lim- tively implement the various stages in OCL2PSQL and iting its applicability) to a target model element. test out how it behaved for the various challenges. List- The rule scheduling can be kept entirely declara- ing 1 shows the main script of the ETL transformation: tive, or can be controlled to some extent through it is composed of a number of import statements that the use of “greedy” or “lazy” rules. ETL is well- bring in the rules needed for each stage, and a post rule suited for exogenous transformations, where a which places the one PlainSelect without a container new model is produced from the source model. into a SelectStatement element as mandated by the • Flock can be thought of as ETL with an automated SQL metamodel. “copy-unless-otherwise-stated” default strategy. Some of the OCL2PSQL mappings had to produce sig- It is well suited for model migration tasks, where nificantly large SQL subtrees: to keep the code concise, a model has to undergo small changes from one a library of EOL operations (utilities.eol) was cre- version of a metamodel to the next. ated and reused from the ETL rules. This library largely • The Epsilon Wizard Language (EWL) is a variant contained a set of functions for simple creation of SQL of EOL which allows users to define “wizards” model elements, an implementation of the OCL2PSQL that users can manually trigger on specific model functions for listing the free variables in an OCL expres- elements, performing small in-place transforma- sion 𝑒 (FVars(𝑒)) and for listing the source variables that tions (perhaps with some simple user interaction). the value of a subexpression 𝑒′ of the OCL expression 𝑒 depends upon (SVars𝑒 (𝑒′ )), and several other miscella- Out of these languages, ETL was chosen since the neous functions. As a simple example, Listing 2 shows original formulation of OCL2PSQL as a collection of re- the code needed to transform OCL integer literals to the cursive mappings by source type was a close match to target SQL metamodel. the rules used by ETL. For the most part, each of those The ETL rules are for the most part a direct one-to- mappings was translated into an ETL rule, and calls to one translation from the descriptions at the end of the map𝑒 (𝑒) were translated into e.equivalent() calls in ETL. OCL2PSQL case, except for two changes. The e.equivalent() built-in operation retrieves the SQL The first change was considering one special case listed Listing 2: Excerpt of ETL for stage 0 (integer literals) 1 import ’utilities.eol’; 2 3 /∗ 4 ∗ All these boil down to: 5 ∗ mape(l) = SELECT l as res, 1 as val 6 ∗/ 7 8 rule IntLiteral 9 transform e:OCL!IntegerLiteralExp 10 to ps:SQL!PlainSelect { 11 ps.selItems.add(longSelectItem(’res’, e.integerValue.asLong())); 12 ps.selItems.add(longSelectItem(’val’, 1l)); 13 } in the original OCL2PSQL paper [1] but not in the case Listing 3: Fix for “subtree stealing” in ETL paper. The mapping of collect and exists in the original OCL2PSQL paper covered the case when 𝑣 ∈ / 𝐹 𝑉 𝑎𝑟𝑠(𝑏), operation copyIfContained(value) { 1 but this mapping had been omitted from the OCL2PSQL var emfTool = new Native( 2 case description. It turned out that this special case was "org.eclipse.epsilon.emc.emf.tools.EmfTool"); 3 needed for some of the queries, e.g. challenge 0 in stage if (value.eContainer.isDefined()) { 4 4 (Car.allInstances()−>collect(c|5)). return emfTool.ecoreUtil.copy(value); 5 The second change was due to an unexpected inter- } 6 action between the recursive approach used to define return value; 7 OCL2PSQL, the definition of SubSelect.selectBody as a } 8 containment reference in the SQL metamodel, and how the ETL e.equivalent() operation works. ETL will only apply a certain rule once to each matching source model element, and from them on e.equivalent() will always and if so it performed a deep cloning of the SQL subtree. return the same counterpart in the target model (e.g. the A better fix (which unfortunately would have required exact same object). This resulted in some SQL queries a rewriting of the input files for this case) would be to “losing” the body of their SubSelect objects to other sub- change the SQL metamodel so that SubSelect.selectBody is trees of the SQL model, as they also needed the mapping no longer a containment reference, and the same Plain- for that part of the OCL expression. Select can be reused from multiple SubSelect model For instance, consider the final challenge: elements. Even further, this suggests that the OCL2PSQL map- Car.allInstances()−>exists(c| ping really produces a SQL expression graph (where some c.owners−>exists(p| subexpressions are reused) rather than a SQL abstract p.name = ’Peter’)) syntax tree. Instead of running the same subquery from several places, it may be advisable to redefine OCL2PSQL In this query, the SQL mapping of c.owners−>exists(...) so it produces a sequence of SQL queries rather than a and the SQL mapping of p.name both require using the single large SQL query: it would run these reused sub- mapping of c.owners as a subquery. ETL successfully queries first, and then provide their results to the higher- maps c.owners to SQL, but EMF will not allow a single level queries. Otherwise, there may be a risk that the model element to be contained from more than one place. SQL query could grow exponentially if sufficiently large Since the mapping of c.owners−>exists(...) will complete subqueries have to be duplicated across several locations. last, it will effectively “steal” the subquery representing Overall, the transformation required writing 14 rules c.owners from the mapping of p.name. across 531 lines of ETL code, with a support library of The fix for this issue turned out to be simple, as shown EOL operations that was 213 lines long. These line counts in Listing 3. All uses of the e.equivalent() operation were included whitespace and comments: if these are excluded, wrapped into a new EOL operation: the operation tested the line counts are reduced to 305 lines of ETL code and if this “stealing” was about to take place (i.e. if the Plain- Select was already contained in another SubSelect), Listing 4: Excerpt of EGL to generate SQL query text Listing 5: SQL-specific LanguageFormatter used to in- 1 [%= SelectStatement.all.first.generate() %][% dent the SQL query text 2 private static class SQLFormatter 1 3 @template extends LanguageFormatter 2 4 operation SelectStatement generate() { %] { 3 5 [%=self.selectBody.generate()%]; private static final String increasePattern = "\\(\\s∗$"; 4 6 [% } private static final String decreasePattern = "^\\)"; 5 7 6 8 @template public SQLFormatter() { 7 9 operation PlainSelect generate() { %] super(Pattern.compile(increasePattern, 8 10 SELECT Pattern.MULTILINE), 9 11 [% for (si in self.selItems) { %] Pattern.compile(decreasePattern, 10 12 [%=si.generate() + (hasMore ? "," : "")%] Pattern.MULTILINE)); 11 13 [% } } 12 14 if (self.fromItem.isDefined()) { %] } 13 15 FROM [%=self.fromItem.generate() %] 16 [% } 17 for (join in self.joins) {%] 18 [%=join.generate()%] Listing 6: SQL query for challenge 0 in stage 1, as gener- 19 [% } ated by EGL 20 if (self.whereExp.isDefined()) { %] SELECT 1 21 WHERE [%=self.whereExp.generate() %] TEMP_left.res = TEMP_right.res AS res, 2 22 [% } 1 AS val 3 23 if (self.groupBy.isDefined()) { %] FROM ( 4 24 [%= self.groupBy.generate() %] SELECT 5 25 [% 2 AS res, 6 26 } 1 AS val 7 ) AS TEMP_left 8 JOIN ( 9 SELECT 10 154 lines of EOL code2 . 3 AS res, 11 1 AS val 12 ) AS TEMP_right; 13 4. Model-to-text transformation to SQL with EGL The reference solution included a model-to-text transfor- operations, which allow EGL templates in their body and mation that produced the SQL query to be run in MySQL return strings which can be used within expressions (e.g. from the SQL model. During the development of this so- for concatenating separators, as in line 12). lution, it was found that the generated SQL was difficult The script also uses some of the built-in Epsilon vari- to read in the presence of multiple levels of subqueries, ables: hasMore is a built-in Epsilon variable available in as it was entirely on one line. loops which is true if and only if there are more values In order to improve the readability of the SQL queries after the current one. and help with the debugging, an alternative implementa- One useful feature in EGL is its ability to integrate for- tion was written in 96 lines of EGL. The EGL template tra- matters that postprocess the generated text. In particular, verses the SQL model recursively from the root Select- the EGL LanguageFormatter was customised for SQL Statement, breaking up SELECT statements, CASE ex- (as shown in Listing 5) to automatically indent the lines pressions, joins, and subqueries across multiple lines. of the SQL script to improve readability, while keeping Listing 4 shows an excerpt of the EGL script: the first the EGL script as simple as possible. This class only re- line is the entry point of the entire script, kicking off quires the regular expressions that should increase and the recursive descent of the SQL model from the Select- decrease the indentation level after a match. Using this Statement. The EGL script makes heavy use of template script, queries are generated in the more readable form 2 shown in Listing 6. These counts were obtained using the count-etl-lines.sh script included in the solution folder. 1. Transform (JVM warmup) 2. Transform (avg over 10 runs) 3. Query test 1.2 1.1 1.0 0.9 Seconds 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0 1 2 0 1 2 0 0 1 0 1 2 0 1 0 1 0 1 2 3 0 C C C C C C C C C C C C C C C C C C C C C S0 S0 S0 S1 S1 S1 S2 S3 S3 S4 S4 S4 S5 S5 S6 S6 S7 S7 S7 S7 S8 Figure 1: Execution times in seconds per stage and challenge, for the first warmup execution of the transformation, the average of 10 additional runs of the transformation, and the test of the generated SQL statement. 5. Results the transformation was then run 10 more times within the same JVM, and the average execution time was recorded. After implementing the ETL model-to-model transfor- This was followed by a single execution of the generated mation and the EGL model-to-text transformation, Java SQL query, to test if the expected results were produced. code to encapsulate these transformations and integrate Whereas the execution times for the “warmup” runs them with the TTC benchmark framework was added. are between 0.9s and 1.2s, the average execution times are The transformations passed all correctness cases for all much smaller and comparable to other solutions of the scenarios across all stages and challenges. contest, ranging between 0.1s and 0.25s. Test execution In terms of execution time, the transformations were times ranged between 0.3s and 0.4s. run on a Lenovo X1 laptop with an i7-6600U CPU run- ning at 2.60GHz with 16GiB of physical RAM, running Ubuntu Linux 20.04.2 LTS with Linux 5.4.0-74-generic References and the Oracle JDK 11.0.8. The default Java memory allocation settings were used (no -Xmx or other JVM op- [1] H. Nguyen Phuoc Bao, M. Clavel, OCL2PSQL: An tions were given). The Docker image provided by the OCL-to-SQL code-generator for model-driven engi- OCL2PSQL case authors was used to run MySQL, using neering, in: T. K. Dang, J. Küng, M. Takizawa, S. H. Docker Engine 20.10.7. Bui (Eds.), Future Data and Security Engineering, The transformation and test times are shown in Fig- Springer International Publishing, Cham, 2019, pp. ure 1: the transformation times include both the ETL 185–203. doi:10.1007/978-3-030-35653-8_13. model-to-model transformation and the EGL model-to- [2] D. S. Kolovos, R. F. Paige, F. Polack, The Epsilon text transformation, in order to mimic the two trans- Object Language (EOL), in: Model Driven Architec- formations done by the reference implementation (the ture - Foundations and Applications, Second Euro- proper OCL2PSQL transformation, and a model-to-model pean Conference, ECMDA-FA 2006, Bilbao, Spain, transformation between a 3rd-party JSqlParser meta- July 10-13, 2006, Proceedings, 2006, pp. 128–142. model and the EMF-based metamodel). It was noted doi:10.1007/11787044_11. during the open peer review stage of the contest that the first execution of the transformation was considerably slower than later executions, due to Java class-loading and just-in-time recompiler warmup times. In order to obtain more representative results from a typical user (who would have a long-running Java process running the transformation repeatedly for different OCL queries),