=Paper= {{Paper |id=Vol-2513/paper1 |storemode=property |title=Mapping OCL into SQL: Challenges and Opportunities Ahead |pdfUrl=https://ceur-ws.org/Vol-2513/paper1.pdf |volume=Vol-2513 |authors=Manuel Clavel,Hoàng Nguyễn Phước Bảo |dblpUrl=https://dblp.org/rec/conf/models/ClavelB19 }} ==Mapping OCL into SQL: Challenges and Opportunities Ahead== https://ceur-ws.org/Vol-2513/paper1.pdf
        Mapping OCL into SQL: Challenges and
               Opportunities Ahead ?

                Manuel Clavel1( ) and Hoàng Nguyễn Phước Bảo2
               1
                 Vietnamese-German University, Bình Dương, Vietnam
                            manuel.clavel@vgu.edu.vn
               2
                 Vietnamese-German University, Bình Dương, Vietnam
                            ngpbhoang1406@gmail.com



        Abstract. In this paper, we discuss some of the challenges and oppor-
        tunities that arise when mapping OCL into SQL. For the challenges,
        we review the past proposals, evaluating their key design decisions and
        limitations. For the opportunities, we highlight the key role that OCL-
        to-SQL code-generators can play in model-driven software development.
        By no means we pretend to exhaust the subject: other challenges and
        opportunities can be identified and brought up. Our present goal is to
        stir up again the discussion on this subject among the OCL community.

        Keywords: OCL · SQL · Code generation


1     Introduction

Model-driven engineering (MDE) aims to develop software systems using models
as the driving force. Models are artifacts that specify the different aspects of the
intended software system. Appropriate code-generators should then bridge the
gap between models and executable code.
    The Unified Modeling Language [12] is the facto standard modeling language
for MDE. Originally, UML was conceived as a graphical language: models were
defined using diagrammatic notation. However, it soon became clear that UML
diagrams were not expressive enough to specify certain aspects of software sys-
tems. To address this limitation, the Object Constraint Language (OCL) [11]
was added to the UML standard. OCL is a textual language, with a semi-formal
semantics. It can be used to specify in a precise, unambiguous way complex con-
straints and queries over models. For example, to define integrity constraints and
authorization constraints in the context of secure database-centric applications
model-driven development [4].
    Several mappings from OCL to SQL have been proposed in the past [5, 7,
8, 10, 13]. The limitations of these mappings, when used as OCL-to-SQL code-
generators, reveal some of the non-trivial challenges ahead. We can organize
these challenges into two groups. The first group contains the challenges related
?
    Copyright ©2019 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).




                                          3
to language coverage, i.e., how much of the OCL language a mapping can cover.
The second group contains the challenges related to execution time efficiency,
i.e., how long it takes to execute a query generated by a mapping.
     Interestingly, the limitations of the past mappings from OCL to SQL also
show that the opportunities ahead are not insignificant. In a nutshell, correctly
implementing in SQL complex queries is not an easy task, and, arguably, a more
difficult task than specifying them in OCL. Furthermore, implementing complex
queries in SQL, which execute efficiently on a large database, is a task often
reserved for SQL connoisseurs. Hence, the opportunities ahead for an OCL-to-
SQL code generator are plentiful.

Organization The rest of the paper is organized as follows. In Section 2 we
introduce the main example that we use throughout the paper. Next, in Section 3
we review the current mappings from OCL to SQL, evaluating their key design
decisions and limitations. Then, in Section 4 we discuss some of the challenges
that arise when mapping OCL into SQL, especially from the point of view of for
the execution time efficiency of the queries generated by the mapping. Finally,
in Section 5 we conclude with some remarks and propose future work.


2   Our Example
To evaluate the limitations of the current mappings from OCL to SQL, and to
illustrate the challenges and opportunities ahead, we use throughout this paper
the following example. Consider the diagram CarOwnership shown in Figure 1. It
models a simple domain, where there are only cars and persons. The persons can
own cars (they are their owners), and, logically, the cars can be owned by persons
(they are their ownedCars). No restriction is imposed regarding ownership: a
person can own many different cars (or none), and a car can be owned by many
different persons (or by none). Finally, each car can have a color, and each person
can have a name.
    The CarOwnership model can be implemented in SQL as a database CarDB
containing:
 – A table Car to store the cars, with a column color to store the color of each
   car, and a column Car_id to store the (primary) key of each car.
 – A table Person to store the persons, with a column name to store the name
   of each person, and a column Person_id to store the (primary) key of each
   person.
 – A table Ownership to store the many-to-many relationship of ownership
   between cars and persons, with columns ownedCars and owners storing the
   (foreign) keys of the corresponding cars and persons.
In the sections that follow, we will use this implementation, shown in Fig-
ure 2. Notice that we have not added other indexes to the tables Car, Person,
and Ownership excepts those created by declaring the columns Car_id and
Person_id as primary keys, and the columns ownedCars and owners as foreign




                                        4
keys. When executing queries on the database CarDB we will consider different
scenarios. In particular, CarDB(n) will denote an instance of the database CarDB
containing 10n cars and 10(n−1) persons, where each car is owned by one per-
son and each person owns 10 different cars, and each car has a color different
from ’no-color’, and each person has a name different from ’no-name’. Fi-
nally, when executing queries on the database CarDB, we use a server machine,
with Intel(R) Xeon(R) CPU E5-2620 v3 at 2.40GHz with 16 GB RAM, using
MySQL 5.7.25.3 The execution times reported in the following sections corre-
spond to the arithmetic mean of 50 executions. The figures reported are given
in seconds, unless otherwise stated.




                         Fig. 1. The CarOwnership model




3     Mappings from OCL to SQL

In this section we review the current mappings from OCL to SQL, evaluating
their key design decisions and their limitations.4 Unfortunately, in the majority
of cases, we have not been able to access the tools implementing the mappings,
and we must base our analysis in the published reports.


3.1   OCL2SQL

To the best of our knowledge, OCL2SQL [5, 6, 10] was the first attempt of map-
ping OCL into SQL. Based on a set of transformation templates, OCL2SQL
3
  Although we have used MySQL for running our examples, we believe that our overall
  results should apply likewise to the other SQL engines.
4
  Logically, the first key design decision is how to map the OCL contextual models
  to SQL schemata (the so-called OR mapping). To the best of our knowledge, the
  mappings reviewed here use essentially the same underlying OR-mapping (classes
  are mapped to tables, attributes to columns, and many-to-many associations to
  tables with appropriate foreign-keys). An interesting discussion, already introduced
  in [6], is how to make the OCL-to-SQL mappings independent of the underlying
  OR-mappings. This discussion is, however, outside the scope of this paper.




                                         5
CREATE TABLE Car (
  Car_id int(11) NOT NULL AUTO_INCREMENT,
  color varchar(255) DEFAULT NULL,
  PRIMARY KEY (Car_id)
) ENGINE=InnoDB

CREATE TABLE Person (
  Person_id int(11) NOT NULL AUTO_INCREMENT,
  name varchar(255) DEFAULT NULL,
  PRIMARY KEY (Person_id)
) ENGINE=InnoDB

CREATE TABLE Ownership (
  ownedCars int(11) DEFAULT NULL,
  owners int(11) DEFAULT NULL,
  KEY fk_ownership_ownedCars (ownedCars),
  KEY fk_ownership_owners (owners),
  CONSTRAINT ownership_ibfk_1
     FOREIGN KEY (ownedCars) REFERENCES Car (Car_id),
  CONSTRAINT ownership_ibfk_2
     FOREIGN KEY (owners) REFERENCES Person (Person_id)
) ENGINE=InnoDB


         Fig. 2. CarDB: An SQL implementation of the CarOwnership model


automatically generates SQL queries from OCL expressions. An interesting ap-
plication of OCL2SQL is described in [14]. OCL2SQL only covers (a subset of)
OCL boolean expressions. Moreover, the high execution time for the queries gen-
erated by OCL2SQL makes it impractical, as an OCL-to-SQL code-generator, for
large scenarios. For example, [3] reported that the query generated by OCL2SQL
for the expression:
Writer.allInstances−>forAll(a|a.books−>forAll(b|b.page >300))
takes more than 45 minutes to execute on a scenario consisting of 102 writers
and 105 books, each writer being the author of 103 books and each book having
exactly 150 pages.5

3.2   MySQL4OCL
MySQL4OCL [8] is defined recursively over the structure of OCL expressions.
For each OCL expression, MySQL4OCL generates a stored procedure 6 that, when
5
  The experiment was carried out on a machine with Intel Pentium M 2.00GHz
  600MHz, and 1GB of RAM.
6
  Stored procedures are routines (like a subprogram in a regular computing language)
  that are stored in the database. Stored procedures provide a special syntax for local
  variables, error handling, loop control, if-conditions, and cursors, which allow the
  definition of iterative structures.




                                          6
called, creates a temporary table containing the values corresponding to the eval-
uation of the given expression. More concretely, for the case of iterator expres-
sions, the stored procedure generated by MySQL4OCL repeats, using a loop,
the following process: i) it fetches from the iterator’s source collection a new
element, using a cursor ; ii) it calls the stored procedure corresponding to the it-
erator’s body with the newly fetched element as a parameter ; iii) it processes the
resulting temporary table according to the semantics of the iterator’s operator.
    Although cursors and loops (inside stored procedures) allow MySQL4OCL to
cover a large subclass of the OCL language (including nested iterators), they also
bring about a fundamental limitation to the use of MySQL4OCL as an OCL-to-
SQL code-generator: they often impede the highly-optimized execution strategies
implemented by SQL engines. Still, the interested reader can find in [8] a prelim-
inary discussion about the efficiency of the code produced by MySQL4OCL, as
well as a comparison with previous known results on evaluating OCL expressions
on medium-large scenarios. Unfortunately, the writer-book example mentioned
before for the case of OCL2SQL is not included in this comparison.


3.3   Incremental OCL constraints checking

An interesting method for evaluating OCL constraints consists of checking if the
SQL query characterizing the tuples that violate the given constraint returns
the empty set. This method was first introduced in [6] and then exploited in [13]
when incrementally checking OCL constraints. As in the case of OCL2SQL, this
method is limited to OCL boolean expressions. With regards to execution time
efficiency, the figures provided in [13] are not easily comparable with normal ex-
ecution times, since the generated SQL queries are computed in an incremental
way. More specifically, “whenever a change in the data occurs, only the con-
straints that may be violated because of such change are checked and only the
relevant values given by the change are taken into account.”


3.4   SQL-PL4OCL

SQL-PL4OCL [7] closely follows the design of MySQL4OCL and, consequently,
bears the same fundamental limitation regarding execution time efficiency, as we
illustrate with an example below. Still, concerning its predecessor, SQL-PL4OCL
simplifies the definition of the mapping, improves the execution time of the
generated queries (by reducing the number of temporary tables), and implements
some of the features that were left in [8] as future work: namely, handling the
null value and supporting (not parametrized) sequences.

Example 1. To illustrate the costly consequences, in terms of execution time
efficiency, of using cursors and loops to implement OCL iterator expressions,
consider the following OCL query:

Car.allInstances()−>select(c|c.owners−>exists(p|p.name =’no−name’))−>size().




                                        7
   The stored procedure generated by SQL-PL4OCL for this query is given in [7]
(Example 11). Now, if we call this stored procedure on the scenarios CarDB(3),
CarDB(4), CarDB(5), CarDB(6), and CarDB(7), we obtain the following execution
times:7

                      CarDB(3) CarDB(4) CarDB(5)    CarDB(6) CarDB(7)
      SQL-PL4OCL          0.76     6.17 1min 3.02 10min 24.00 > 90min

   We will go back to this example at the end of Section 4. Here we simply note
that the above query, when implemented in SQL in the expected way (without
cursors and loops), takes less than 1 second to execute on the scenario CarDB(7),
while the stored procedure generated by SQL-PL4OCL (using cursors and loops)
did not finish its execution after 90 minutes.


3.5     OCL2PSQL

As part of our on-going research, we are implementing a new mapping from
OCL to SQL called OCL2PSQL. The interested reader can experiment with our
current prototype at:

http://researcher-paper.ap-southeast-1.elasticbeanstalk.com/

    As in the case MySQL4OCL/SQL-PL4OCL, our mapping is defined recur-
sively over the structure of OCL expressions. However, OCL2PSQL diverts com-
pletely from MySQL4OCL/SQL-PL4OCL in that it does not rely on the use of
cursors and loops for implementing iterator expressions, neither does it creates
temporary tables for storing intermediate results. Instead, i) for intermediate
results, it uses standard subqueries and ii) for iterator expressions, it adds to the
subquery corresponding to the iterators’ body an extra column corresponding to
the iterator’s variable. Intuitively, this column stores the element in the iterator’s
source that is “responsible” for the result that is stored in the corresponding row.
    Although OCL2PSQL seems to avoid the limitation of MySQL4OCL/SQL-
PL4OCL in terms of execution time efficiency, its implementation is still un-
dergoing and it cannot yet successfully address some of the other challenges
discussed below.


4     Challenges

After reviewing the key design decisions and limitations of the existing mappings
from OCL to SQL, we highlight now some of the challenges ahead. We organize
these challenges into two groups. The first group contains the challenges related
to language coverage, i.e., how much part of the OCL language a mapping covers.
The second group contains the challenges related to execution time efficiency,
i.e., how much time it takes to execute a query generated by a mapping.
7
    The definition of the scenarios CarDB(n) is given in Section 2.




                                            8
4.1    Language coverage

OCL [11] is a language for specifying constraints and queries using a textual
notation. Every OCL expression is written in the context of a model, called
the contextual model. OCL is strongly typed. Expressions either have a primi-
tive type, a class type, a tuple type, a collection type, or a special type. OCL
provides a dot-operator to access the values of attributes and association-ends.
OCL also provides standard operators on primitive data, tuples, and collections,
and special operators to iterate over collections, such as forAll, exists, select,
and collect. Collections can be sets, bags, ordered sets and sequences, and can
be parametrized by any type, including other collection types. To represent un-
definedness, OCL provides two constants, namely, null and invalid, of a special
type. Intuitively, null represents an unknown or undefined value, whereas invalid
represents an error or exception.
    The challenges regarding language coverage are many, including mapping
complex iterator expressions such as the transitive closure expressions. Here
we only highlight two challenges, which, in our opinion, are among the most
“pressing” ones, in the sense that they deal with features of the OCL language
that are commonly used.
    Since SQL does not natively support (parametrized) structured collections,
mappings from OCL to SQL would need to explicitly encode this “structure” in
the generated queries, as proposed in [8]. This is the approach followed in [7] to
support (not parametrized) sequences. Currently, none of the existing mappings
can support parametrized collections.
    Although SQL supports the null -value, it is not obvious how (or if) it can be
used to implement the null constant in OCL, and it is even less obvious how the
constant invalid can be implemented in SQL. Currently, among the published
mappings, only [7] covers OCL expressions dealing with OCL null -undefinedness.
None of the existing mappings can support OCL invalid.


4.2    Execution time efficiency

SQL engines have highly optimized strategies for executing queries over large
databases. We discuss below some of the optimization “tips” that OCL-to-SQL
code-generators should be aware of, to generate queries that can execute effi-
ciently on large databases.8
    To illustrate our discussion, we include below examples of SQL queries ex-
ecuted in CarDB(6) and CarDB(7). As explained before, MySQL4OCL/SQL-
PL4OCL cannot efficiently handle (nested) iterators over large collections. Thus,
in our comparisons, we can only include the execution times corresponding to
queries generated by OCL2PSQL.
8
    Nevertheless, we should be aware that “development is ongoing, so no optimiza-
    tion tip is reliable for the long term.” (MySQL 8.0 Reference Manual, (13.2.11.11
    Optimizing Subqueries).




                                          9
Indexes Indexes are used by SQL engines to improving the speed of operations
in a table. Indexes can be created using one or more columns. The following
example illustrates well the difference between using indexes or not, in terms of
execution time efficiency.

Example Suppose that we want to know the number of cars whose color is ‘no-
color’. In SQL we can use the following query:
SELECT COUNT(*)
   FROM (SELECT * FROM Car WHERE color = ’no-color’) AS TEMP;
     In OCL we can specify the same query using the following expression:
Car.allInstances()−>select(c|c.color =’no−color’)−>size()

     Currently, OCL2PSQL translates this expression as follows:
SELECT COUNT(TEMP_select_body.ref_c) AS res, TRUE AS val
FROM
  (SELECT TEMP_LEFT.res = TEMP_RIGHT.res AS res,
    TEMP_LEFT.ref_c AS ref_c, TEMP_LEFT.val_c AS val_c
   FROM
     (SELECT color AS res, Car_id AS ref_c,
      TRUE AS val_c FROM Car) AS TEMP_LEFT
   JOIN
     (SELECT ’no-color’ AS res, TRUE AS val) AS TEMP_RIGHT
  ) AS TEMP_select_body
WHERE TEMP_select_body.res = TRUE;

    If we execute the above queries on CarDB(6) and CarDB(7), with and without
indexing the column color (columns CarDB(n)[color] and CarDB(n), respectively)
in the table Car, we obtain the following results.9

              CarDB(6) CarDB(6)[color] CarDB(7) CarDB(7)[color]
     SQL          0.22           0.00      2.48           0.00
     OCL2PSQL     0.24           0.24      3.00           2.60

     As expected, the execution performance for the SQL-query dramatically im-
proves when indexing the column color. However, the query generated by OCL-
2PSQL does not prompt the SQL engine to use the index color, and therefore its
performance does not improve. Advanced/smart OCL-to-SQL code-generators
should automatically i) add indexes (for one or more columns) to the tables, and
ii) generate queries that benefit from the added indexes. The challenge here is to
add to the tables only the indexes that will increase the execution performance
for the given SQL-queries, since the insert and update-statements will naturally
take more time on tables having indexes.
9
    As reported in [7] (Query Q8, Figure 6: “Evaluation times”), the query generated
    by SQL-PL4OCL for a similar OCL expression takes 50.02 seconds to execute on a
    scenario like CarDB(6), on an Intel Core m7, 1.3 GHz, 8 GB RAM, using MySQL
    5.7.




                                         10
EXISTS function When using the EXISTS function, SQL engines are optimized
to stop processing when a row is returned. The following example illustrates well
this optimization.

Example Suppose that we want to know if there is at least one car whose color
is different from ‘no-color’. In SQL we can use the following query, which uses
the COUNT function:
SELECT COUNT(*) > 0
FROM (SELECT * FROM Car WHERE color <> ’no-color’) AS TEMP;

Alternatively, we can use the following query, which uses the EXISTS function:
SELECT EXISTS (SELECT * FROM Car WHERE color <> ’no-color’);

    On the other hand, we can specify in OCL the original query using the
following expression:

Car.allInstances()−>exists(c|c.color <>’no−color’)

      Currently, OCL2PSQL translates this expression as follows:
SELECT COUNT(*) > 0 AS res, TRUE AS val
FROM
  (SELECT TEMP_LEFT.res <> TEMP_RIGHT.res AS res,
   TEMP_LEFT.ref_c AS ref_c, TEMP_LEFT.val_c AS val_c
   FROM
     (SELECT color AS res, Car_id AS ref_c,
      TRUE as val_c FROM Car) AS TEMP_LEFT
     JOIN
     (SELECT ’no-color’ AS res, TRUE as val) AS TEMP_RIGHT
  ) AS TEMP_exists_body
WHERE TEMP_exists_body.res = TRUE;

   If we execute the above queries on CarDB(6) and CarDB(7), without indexing
the column color in the table Car, we obtain the following results.10
10
     Interestingly, [7] (Query Q10, Figure 6: “Evaluation times”) reports that the query
     generated by SQL-PL4OCL for the same OCL expression only takes 0.05 seconds
     to execute on a scenario like CarDB(6), on an Intel Core m7, 1.3 GHz, 8 GB RAM,
     using MySQL 5.7. However, this low execution time can be misleading. In a sense,
     it corresponds to the “best-case scenario”. MySQL4OCL/SQL-PL4OCL treats the
     forAll and exists iterators differently from other iterators. First of all, (the stored
     procedure generated by) MySQL4OCL/SQL-PL4OCL does not create a temporary
     table of the size of the source collection to store the intermediate results, but a
     temporary table with a single row. Secondly, (the stored procedure generated by)
     MySQL4OCL/SQL-PL4OCL stops the execution of the internal loop as soon as it
     finds an element in the source collection that makes the execution of the iterator’s
     body returns FALSE or TRUE, depending on whether the iterator is a forAll or an
     exists. Since all the cars in CarDB(6) have a color different from ’no-color’, (the stored
     procedure generated by) MySQL4OCL/SQL-PL4OCL stops the internal loop after
     just one iteration.




                                              11
                        CarDB(6) CarDB(7)
   SQL(using COUNT)         0.22     2.72
   SQL(using EXISTS)        0.00     0.00
   OCL2PSQL                 0.24     3.10

    As expected, the execution performance for the SQL-query dramatically im-
proves when using the EXISTS function instead of the COUNT function. OCL-
2PSQL currently uses the COUNT function to implement the exists iterator.
Thus, its performance does not benefit from the short-circuiting provided by
the EXISTS function. Advanced/smart OCL-to-SQL code generators should au-
tomatically translate OCL expressions using operators capable of short-circuiting
the execution, whenever possible. The challenges here are (i) to find both the
short-circuiting functions in SQL and the operators in OCL that are semanti-
cally “compatible” with each other, and (ii) to properly handle the case when
the former effectively short-circuits the execution of the query.


Joins versus correlated subqueries Mappings from OCL to SQL may try
to use correlated subqueries to implement OCL iterators. A correlated subquery
is a subquery that contains a reference to a table that also appears in the outer
query. However, only for certain cases, SQL engines are optimized for executing
correlated subqueries. The following example illustrates well this problem.

Example Suppose that we want to know the number of cars that have at least
one owner whose name is ’no-name’. In SQL we can use the following query,
which uses a correlated subquery:

SELECT COUNT(*) FROM Car AS TEMP
WHERE EXISTS
  (SELECT 1 FROM Ownership
   JOIN Person
   ON Person.Person_id = owners
   WHERE Person.name = ’no-name’
      AND TEMP.Car_id = ownedCars);

    Alternatively, we can also use the following query, which uses joins instead
of correlated subqueries:

SELECT COUNT(*)
FROM (SELECT COUNT(*) > 0 FROM Car
      JOIN Ownership
      on Car_id = ownedCars
      JOIN Person
      ON Person.Person_id = owners
      WHERE Person.name = ’no-name’
      GROUP BY Car_id) AS TEMP;




                                      12
    On the other hand, we can specify in OCL the original query using the
following expression:


Car.allInstances()−>select(c|c.owners−>exists(p|p.name =’no−name’))−>size()


      Currently, OCL2PSQL translates this expression as follows:


SELECT COUNT(TEMP_select_body.ref_c) AS res, TRUE as val
FROM
   (SELECT COUNT(*) > 0 AS res,
    TEMP_exists_body_bool_exp.ref_c AS ref_c
    FROM
       (SELECT TEMP_LEFT.res = TEMP_RIGHT.res AS res,
        TEMP_LEFT.ref_c AS ref_c, TEMP_LEFT.val_c AS val_c,
        TEMP_LEFT.ref_p AS ref_p, TEMP_LEFT.val_p AS val_p
        FROM
           (SELECT name AS res,
            TEMP_p.ref_c AS ref_c, TEMP_p.val_c AS val_c,
            TEMP_p.ref_p AS ref_p, Person_id IS NOT NULL as val_p
            FROM
               (SELECT owners AS res,
                TEMP_c.ref_c AS ref_c, TEMP_c.val_c AS val_c,
                ownedCars AS ref_p, ownedCars IS NOT NULL AS val_p
                FROM
                 (SELECT Car_id AS res, Car_id AS ref_c,
                  TRUE AS val_c FROM Car) AS TEMP_c
                LEFT JOIN Ownership
                ON ownedCars = TEMP_c.ref_c
                AND TEMP_c.val_c = TRUE) AS TEMP_p
             LEFT JOIN Person
             ON Person.Person_id = TEMP_p.ref_p
             AND TEMP_p.val_p = TRUE) AS TEMP_LEFT
          JOIN (SELECT ’no-name’ AS res,
          TRUE AS val) AS TEMP_RIGHT)
     AS TEMP_exists_body_bool_exp
     WHERE TEMP_exists_body_bool_exp.res = TRUE
     AND TEMP_exists_body_bool_exp.val_p = TRUE
     GROUP BY ref_c) AS TEMP_select_body
WHERE TEMP_select_body.res = TRUE;


   If we execute the above queries on CarDB(6) and CarDB(7), without indexing
the column name in the table Person, we obtain the following results.11

11
     As reported in Section 3, for the scenario CarDB(6), it takes over 10 minutes to
     execute the stored procedure generated by SQL-PL4OCL for this query; for the
     scenario CarDB(7), the execution did not finish after 90 minutes.




                                          13
                                 CarDB(6) CarDB(7)
      SQL (using correlation)       14.38 2min 59.52
      SQL (using joins)              0.04       0.28
      OCL2PSQL                       0.04       0.30

    As expected, the execution performance for the SQL-query improves dramat-
ically when using joins instead of correlated subqueries. OCL2PSQL uses joins
to implement iterators. The performance of the query generated by OCL2PSQL
is on par with the performance of the SQL-query implemented using joins. The
challenge here is to properly handle the case when the elements to be joined do
not match with each other.


5      Concluding Remarks and Future Work
The Object Constraint Language (OCL) plays a key role in adding precision to
UML models, and therefore it is called to be an important actor in model-driven
engineering (MDE). However, to fulfill this role, smart/advanced code-generators
must bridge the gap between UML/OCL models and executable code. This is
certainly the case for database-centric applications.
    In this paper we have reviewed the existing mappings from OCL to SQL,
evaluating their key design decisions and their limitations.12 We recognize that,
up to now, the main efforts have been placed in covering as much as possible
of the OCL language. Without underestimating that challenge, we want to em-
phasize the need to look at the other key challenge, namely, to generate, from
OCL expressions, SQL queries that can perform on par with SQL queries imple-
mented by professionals. In this regard, we have provided examples that show
that none of the existing OCL-to-SQL mapping is really up to this task, although
OCL2PSQL shows significant progress in that direction. We have also identified
some of the optimization “tips” supported by SQL engines that smart/advanced
OCL-to-SQL code-generators should be aware of, in order to generate queries
that can execute efficiently on large databases. A related challenge, which we
have not discussed here, is the readability of the SQL queries generated by the
OCL-to-SQL mappings. Ideally, the generated queries should be easy to un-
derstand and to modify, if needed. Arguably, this has not been the case up to
now, and therefore we consider it as an additional challenge for smart/advanced
OCL-to-SQL code-generators.
    We also recognize that implementing in SQL complex queries is not an easy
task; if fact, we can argue that it is a more difficult task than specifying them
in OCL. Suppose, for example, that we are interested in querying our database
12
     There have been also different proposals [1–3,9] in the past for what we may call OCL
     evaluators. These are tools that load first the scenario on which an OCL expression
     is to be evaluated and then evaluate this expression using an OCL interpreter. As
     reported in [3], the main problem with OCL evaluators is the time required for
     loading large scenarios.




                                            14
CarDB (without assuming that every car has at least one owner) about: i) if it
exists a car whose owners all have the name ’no-name’, and ii) how many cars
have at least one owner with no name declared yet. We can specify i) in OCL as
follows:
Car.allInstances()−>exists(c|c.owners−>forAll(p|p.name=’no−name’))

Similarly, we can specify ii) in OCL as follows:
Car.allInstances()−>select(c|c.owners−>exists(p|p.name.oclIsUndefined()))−>size()

We invite the reader to implement i) and ii) in SQL, and draw his/her own
conclusions.
   In our opinion, this state of affairs offers exciting opportunities for smart/ad-
vanced OCL-to-SQL code-generators. To prove our point, we want to propose
the following case study for the OCL (and database) community:

 – Take a group of students who have just taken a Database course, and ask
   them to write in SQL some queries (given in English). The students will
   be provided with the underlying data model (a class diagram or an ER
   diagram).
 – Take another group of students who have taken the same (or similar) Data-
   base course, plus a short-course on OCL. Ask them to write in OCL the same
   queries as before. The students will be provided with the same underlying
   data model as the other group.
 – Then, evaluate and compare the results, taking into consideration:
     • The correctness (from the semantic point of view) of the SQL queries
       (how many students wrote them correctly in SQL) versus the correctness
       of the OCL queries (how many students wrote them correctly in OCL).
     • The efficiency of the SQL queries (how long it takes to execute them
       on large scenarios) versus the efficiency of the SQL queries generated by
       the OCL-to-SQL code-generator of choice (how long it takes to execute
       them on the same large scenarios).

    Finally, we leave as another task for the OCL community to carry out a more
systematic, in-depth comparison of the different OCL-to-SQL code-generators,
including not only language-coverage and execution time efficiency, but also un-
derlying OR mappings, SQL constructs used (queries, views, stored procedures,
SQL dialects), and other implementations issues (RDBMS, software architecture,
and so on).


References
 1. T. Baar and S. Markovic. The RoclET tool. http://www.roclet.org/index.php,
    2007.
 2. D. Chiorean, M. Bortes, D. Corutiu, C. Botiza, and A. Carcu. An OCL environment
    (OCLE) 2.0.4. http://lci.cs.ubbcluj.ro/ocle/, 2005. Laboratorul de Cercetare
    in Informatica, University of BABES-BOLYAI.




                                        15
 3. M. Clavel, M. Egea, and M. A. G. de Dios. Building an efficient component for
    OCL evaluation. ECEASST, 15, 2008.
 4. M. A. G. de Dios, C. Dania, D. A. Basin, and M. Clavel. Model-driven devel-
    opment of a secure ehealth application. In M. Heisel, W. Joosen, J. Lopez, and
    F. Martinelli, editors, Engineering Secure Future Internet Services and Systems -
    Current Research, volume 8431 of LNCS, pages 97–118. Springer, 2014.
 5. B. Demuth and H. Hußmann. Using UML/OCL constraints for relational database
    design. In R. B. France and B. Rumpe, editors, UML, volume 1723 of LNCS, pages
    598–613. Springer, 1999.
 6. B. Demuth, H. Hußmann, and S. Loecher. OCL as a specification language for
    business rules in database applications. In M. Gogolla and C. Kobryn, editors,
    UML, volume 2185 of LNCS, pages 104–117. Springer, 2001.
 7. M. Egea and C. Dania. SQL-PL4OCL: an automatic code generator from OCL to
    SQL procedural language. Software & Systems Modeling, 2017.
 8. M. Egea, C. Dania, and M. Clavel. MySQL4OCL: A stored procedure-based
    MySQL code generator for OCL. ECEASST, 36, 2010.
 9. M. Gogolla, F. Büttner, and M. Richters. USE: A UML-based specification environ-
    ment for validating UML and OCL. Science of Computer Programming, 69:27–34,
    2007.
10. F. Heidenreich, C. Wende, and B. Demuth. A framework for generating query
    language code from OCL invariants. ECEASST, 9, 2008.
11. Object Management Group. Object constraint language specification version
    2.4. Technical report, OMG, February 2014. https://www.omg.org/spec/OCL/
    About-OCL/.
12. Object Management Group. Unified Modeling Language. Technical report, OMG,
    December 2017. https://www.omg.org/spec/UML/About-UML/.
13. X. Oriol and E. Teniente. Incremental checking of OCL constraints through
    SQL queries. In A. D. Brucker, C. Dania, G. Georg, and M. Gogolla, edi-
    tors, OCL@MoDELS, volume 1285 of CEUR Workshop Proceedings, pages 23–32.
    CEUR-WS.org, 2014.
14. H. Sneed, B. Demuth, and B. Freitag. A process for assessing data quality. In
    Proceedings - IEEE 6th International Conference on Software Testing, Verification
    and Validation Workshops, ICSTW 2013, 03 2013.




                                        16