Solving the Movie Database Case with QVTo

                Christopher Gerking                                Christian Heinzemann
               Software Engineering Group,                              Fraunhofer IPT,
                 Heinz Nixdorf Institute,                  Project Group Mechatronic Systems Design,
             University of Paderborn, Germany               Software Engineering, Paderborn, Germany
        christopher.gerking@uni-paderborn.de              christian.heinzemann@ipt.fraunhofer.de


       This paper proposes a solution to the IMDb movie database case of the Transformation Tool Contest
       2014. Our solution is based on the Eclipse implementation of the OMG standard QVTo. We imple-
       mented all of the tasks including all of the extension tasks. Our benchmark results show that QVTo
       is able to handle models with a few thousand objects.


1     Introduction

This paper proposes a solution to the movie database case [3] of the Transformation Tool Contest 2014.
The objective of the movie database case is to derive a set of performance results that indicate the ability
of model transformation languages to process large models with millions of objects. The case study is
based on the IMDb movie database that stores information about movies, actors, actresses, and ratings.
    We use QVT Operational Mappings (QVTo, [4]) for solving the different tasks of the movie database
case. QVTo is a textual, imperative model transformation language based on OCL [5] that is standardized
by the OMG. It natively supports metamodels specified in EMF [6] such as the provided IMDb meta-
model. In this paper, we use the QVTo implementation of the Eclipse Model to Model Transformation
(MMT) project1 .
    The Eclipse implementation of the QVTo standard is open source and already widely used in other
open-source and academical projects. It is used, for example, within the Graphical Modeling Framework
(GMF2 ) and in the Papyrus project3 . Recently, it has been used for translating software design models to
verification models [1] and for generating operational behavior specifications out of declarative ones [2].
    In our implementation, we created seven transformations for solving the different tasks of the movie
database case. We implemented the three main tasks and all of the extension tasks. Our implementation
demonstrates that QVTo enables a concise specification of the solutions. Four out of seven tasks require
less than 30 lines of code. Our benchmark results show that the Eclipse implementation of QVTo is
currently able to handle input models with a few thousand objects in a reasonable amount of time.
    The paper is structured as follows. We first briefly review the movie database case in Section 2 and
QVTo in Section 3. Thereafter, Section 4 describes our solution that we implemented in QVTo. We
provide benchmark results concerning runtime of our transformations in Section 5 before concluding the
paper in Section 6.

    1 http://projects.eclipse.org/projects/modeling.mmt.qvt-oml
    2 http://eclipse.org/gmf-tooling/
    3 https://www.eclipse.org/papyrus/


                                                                           c C. Gerking & C. Heinzemann
Submitted to:
                                                                           This work is licensed under the
TTC 2014
                                                                           Creative Commons Attribution License.
2                                                          Solving the Movie Database Case with QVTo


2     The Movie Database Case
The movie database case is based on a simple metamodel for storing movies and the actors who played in
these movies [3]. Therefore, it provides the classes Movie, Actor, and Actress. In addition, the metamodel
comprises classes to represent a Couple or Clique. A clique consists of n persons that played together in
at least 3 movies while n ≥ 2. A couple is a clique with n = 2, i.e., two persons who played together in
at least 3 movies.


3     QVT Operational Mappings
QVT Operational Mappings (QVTo, [4]) is a textual, imperative language for defining unidirectional
model-to-model transformations. The current Eclipse implementation of QVTo natively supports the
specification of model transformations based on EMF metamodels. Since QVTo is an imperative ex-
tension of OCL [5], the Eclipse implementation also provides access to numerous OCL operations that
enable to build collections (e.g., sets) of objects.
    A QVTo transformation refers to one or more input metamodels and one or more output metamod-
els. Then, a transformation run transforms instances of the input metamodels to instances of the output
metamodels. QVTo also enables inplace transformations where one and the same model instance acts
as both input and output, enabling model modifications as required for the movie database case. Each
transformation has a name and a unique entry point denoted by main(). Using so called configuration
properties, QVTo supports the parametrization of transformations by means of primitive data types.
    In our implementation, we use mappings, helpers, and constructors. A mapping translates an object
of an input model to an object of an output model. A helper may be used to perform auxiliary com-
putations but also for creating additional objects in the output model. Finally, constructors enable the
parametrized instantiation of classes which are part of the output metamodel.


4     Solution
In the following, we present our solutions to the tasks that were given as part of the IMDb movie database
case. For each of the tasks, we provide a QVTo model transformation operating on concrete instances of
the IMDb metamodel. Our overall design goal is to keep the solutions concise with respect to the trans-
formation size. Thus, we prefer using high-level native language features provided by QVTo, avoiding
more complex manual implementations by means of low-level constructs whenever possible.


4.1   Generating Test Data
For the generation of test data, we developed a QVTo transformation with a single IMDb output model.
In addition, we declare a transformation parameter N using a QVTo configuration property such that the
resulting amount of test data may be configured along with the invocation of the transformation.
    The implementation of our transformation reflects the structure of the given Henshin specification [3]
in terms of imperative operation calls. Thus, the given Henshin units correspond to dedicated helper
operations parametrized by means of integer values. The actual instantiation takes places inside dedicated
constructor operations for the types Movie, Actor, and Actress.
C. Gerking & C. Heinzemann                                                                                                       3


4.2     Finding Couples
The solution for this task is based on an inplace transformation that accepts an existing IMDb input
model and will output the same model after manipulating it. The required manipulation for this task is
the addition of Couple elements as defined in Section 2. In order to detect the set of couples, our approach
is to traverse all pairs of persons. We achieve this by iterating over the set of persons using an imperative
forEach loop with two iterator variables. During the iteration, we create a Couple element for every
unique pair that is a valid couple:
p e r s o n s −>forEach ( p1 , p2 ) {
        S e t {p1 , p2}−>map c r e a t e C o u p l e ( ) ;
};

    In order to achieve uniqueness, we must not create a new Couple if an existing one already refers to
the same two persons (regardless of their order). Therefore, we use an operational mapping operation to
generate a Couple for every valid and unique input pair. The mapping is declared as follows:
mapping S e t ( P e r s o n ) : : c r e a t e C o u p l e ( ) : C o u p l e when { s e l f −>i s V a l i d C o u p l e ( ) }

    An operational mapping is an imperative operation that behaves according to a partial mathematical
function, i.e., maps each input to at most one output. Thus, the first invocation of a mapping with
a certain input will potentially create the appropriate result. However, reinvoking the mapping with an
equal input will not produce another result, but return the cached result of the prior invocation. To exploit
this mapping behavior for the creation of unique couples, we represent input pairs using the OCL Set
type. The equality behavior of this built-in collection type ensures that two pairs compare equal if they
refer to the same persons regardless of their ordering.
    In order to not generate any invalid couples, we check the validity inside a when clause of the
createCouple mapping. This causes a mapping invocation to be skipped whenever the input Set is
not a pair or has less than three common movies.

4.3     Computing Average Rankings
Task 3 and Extension Task 3 of the IMDb movie database case require to compute the average rankings
for couples or cliques. Our solution to this challenge is based on a QVTo inplace transformation. We
traverse the set of groups inside the given IMDb model by means of an imperative forEach loop. During
each iteration, we compute the average rating for one of the detected groups. To obtain the sum of
ratings for the common movies, we use the sum operation defined for OCL collections. We compute the
arithmetic mean by simply dividing the sum of ratings by the number of movies:
c o u p l e . a v g R a t i n g : = c o u p l e . commonMovies . r a t i n g −>sum ( ) / c o u p l e . commonMovies−>s i z e ( ) ;


4.4     Computing the Top-15 Groups
Extension Task 1 and 4 require to query information from the given IMDb model. In particular, the
challenge is to query the top-15 couples/cliques according to their average ratings and their number of
common movies. Hence, our QVTo-based solutions declare only input and no output models.
    In order to obtain the top-15, we sort all existing groups as required. Whereas this approach consti-
tutes a computational overhead since only the top-15 is of interest, it allows for a concise solution. By
using the predefined sortedBy operation for OCL collections, we avoid more complex manual imple-
mentations. The listing below illustrates the sorting of couples by average rating. Based on the sorted
4                                                                            Solving the Movie Database Case with QVTo


sequence of couples or cliques, we iterate over the first 15 elements and print out the desired information
about each group using QVTo’s log operation.
var s o r t e d = c o u p l e s −>s o r t e d B y (− a v g R a t i n g ) ;


4.5     Finding Cliques
Our solution to Extension Task 2 comprises a QVTo configuration property that represents the desired
size n of the cliques to be obtained. The major challenge in comparison to Task 2 is to retrieve all
candidate sets consisting of n persons in order to check each of these sets for being a valid clique. Since
n is not fixed to a certain value (such as n=2 for Task 2), it is not possible to solve this problem using
a fixed number of iterator variables as described in Section 4.2. Instead, we construct the candidate sets
explicitly inside a helper operation. The signature below illustrates that the operation accepts a set of
persons and returns all candidate subsets for cliques:
helper Set ( Person ) : : c a n d i d a t e s ( ) : Set ( Set ( Person ) )

    The implementation of the candidates operation is based on an incremental approach. Starting
with an empty set of persons, we iterate over every given person and create new sets by adding the
current person to each of the sets already created before.
    In order to save runtime, we evaluate the validity of any constructed set on the fly. This means that we
discard a constructed set if the number of common movies goes below three, because no valid extension
to a clique with three or more common movies exists. In addition, our solution does not construct sets
with more than n persons, which would be an evitable overhead. After constructing all clique candidates,
we map each valid candidate set to an appropriate Clique instance similar to our solution for Task 2.


5     Evaluation and Benchmarks
In our evaluation, we particularly focus on the relationship between code conciseness and runtime per-
formance for QVTo. Table 1 summarizes the measured runtime for the transformations from invocation
to termination, as well as the transformation size in terms of the underlying source lines of code (SLOC).
The performance testing was carried out on a quad-core 2,2 GHz machine with 8 GB of main memory
running the 3.4.0 release version of Eclipse QVTo. Our measurements are based on the parameter values
N ∈ {50, 100, 150, 200, 250, 300, 350, 400} for the size of the synthetic test data. For each N, Table 1 also
shows the resulting number of model elements generated in Task 1. Our measurements are based on the
size n = 3 for the cliques to be detected in Extension Task 2.

                                   Table 1: Evaluation of Conciseness and Performance
                                                                     Runtime
              N                50        100         150         200      250     300       350      400     SLOC
         # Elements          1000       2000        3000        4000     5000     6000     7000     8000
           Task 1             2.2s       0.1s        0.2s       0.2s     0.2s     0.3s     0.3s      0.4s     59
           Task 2             4.3s      15.2s       36.6s       63.3s   93.9s    134.1s   179.2s   234.7s     24
           Task 3             0.3s       0.1s        0.2s       0.3s     0.4s     0.6s     0.7s      0.9s     8
         Ext. Task 1          0.3s       0.1s        0.2s       0.3s     0.4s     0.6s     0.7s      1.0s     28
         Ext. Task 2         14.1s      53.6s      122.6s      225.6s 345.5s     487.3s   654.3s   1371.9s    47
         Ext. Task 3          0.2s       0.1s        0.3s       0.5s     0.8s     0.9s     1.2s      3.5s     8
         Ext. Task 4          0.2s       0.2s        0.3s       0.5s     0.8s     1.0s     1.2s      3.4s     37
C. Gerking & C. Heinzemann                                                                                 5


    Table 1 indicates a superlinear increase for the runtime of complex challenges such as Task 2 or
Extension Task 2. Consequently, QVTo is not able to provide an acceptable transformation runtime for
realistic models based on the IMDb movie database. Thus, the conciseness enabled by QVTo (reflected
by the small number of source code lines) is obviously out of proportion to the measured runtime.
    The detected performance limitations are traceable to QVTo’s missing native support for the con-
struction of powersets (which is required to generate all candidate sets for couples or cliques). In con-
trast to QVTo as an imperative language, declarative approaches might achieve considerable runtime
improvements by obtaining all possible subsets using nondeterministic matching techniques. Further-
more, QVTo as a dedicated model transformation language does not provide a broader scope of actions
when it comes to performance tweaks. In contrast, using general-purpose languages (such as Java) gives
rise to specific implementational variations that could drastically improve the performance.
    Nevertheless, focusing on the small number of source code lines illustrated in Table 1, our evaluation
shows that QVTo enables a concise specification of the transformations for all tasks. Thus, despite the
detected performance drawbacks, we regard our major design goal as reached.


6    Conclusions
This paper presents a solution to the movie database case of the Transformation Tool Contest 2014 based
on the Eclipse implementation of QVTo. Our results show that QVTo enables for a concise specification
of transformations. However, QVTo is only able to handle synthetic test models with a few thousand
objects in a reasonable amount of time but not realistic models based on the IMDb movie database.
     Our benchmark results indicate two promising directions for future works. First, realizing parts of a
QVTo transformation inside a Java blackbox [4] is an option to integrate more efficient implementations.
We excluded blackboxes in order to keep the focus on plain model-to-model transformations. Second,
the QVT/OCL specifications [4, 5] could be extended by missing operations such as computing a power
set for Extension Task 2. A promising approach in that direction is the implementation of an extensible
standard library for OCL [7]. However, such library extensions are only useful if the additional operations
are equipped with an efficient implementation or further improve the code conciseness.


References
[1] Christopher Gerking (2013): Transparent UPPAAL-based Verifcation of MechatronicUML Models. Master’s
    thesis, University of Paderborn.
[2] Christian Heinzemann & Steffen Becker (2013): Executing reconfigurations in hierarchical component archi-
    tectures. In Philippe Kruchten, Dimitra Giannakopoulou & Massimo Tivoli, editors: CBSE’13, Proceedings
    of the 16th ACM SIGSOFT Symposium on Component Based Software Engineering, ACM, pp. 3–12.
[3] Tassilo Horn, Christian Krause & Matthias Tichy (2014): The TTC 2014 Movie Database Case.
[4] Object Management Group (2011): Meta Object Facility (MOF) 2.0 Query/View/Transformation. Available
    at http://www.omg.org/spec/QVT/1.1/. Document formal/2011-01-01.
[5] Object Management Group (2012): Object Constraint Language (OCL) 2.3.1. Available at http://www.
    omg.org/spec/OCL/2.3.1/. Document formal/2012-01-01.
[6] David Steinberg, Frank Budinsky, Marcelo Paternostro & Ed Merks (2008): EMF: Eclipse Modeling Frame-
    work, 2nd edition. The Eclipse Series, Addison-Wesley.
[7] Edward D. Willink (2011): Modeling the OCL Standard Library. Electronic Communications of the EASST
    44. Available at http://journal.ub.tu-berlin.de/eceasst/article/view/663.