<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Solving the TTC 2014 Movie Database Case with UML-RSDS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>K. Lano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Yassipour-Tehrani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept of Informatics, King's College London</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes a solution to the Movie Database case using UML-RSDS. The solution specification is declarative and logically clear, whilst the implementation (in Java) is of practical efficiency. UML-RSDS [1] is a hybrid MT language which uses UML notations to specify transformations: source and target metamodels of a transformation are defined as UML class diagrams, transformations are expressed as use cases, whose effect is specified by a sequence of postconditions written in OCL. This provides an expressiveness similar to other hybrid languages such as GrGen or ETL. The UML-RSDS tools automatically synthesise executable implementations of transformations from the UML specifications. For the case study specification, we define separate use cases for each task of the case study. Each use case defines a sub-transformation of the problem. Task 1: Create synthetic datasets We implement this task by a use case task1 which has parameter n : Integer and a single postcondition Integer.subrange(0,n-1)-&gt;forAll( x | Movie.createPositive(x) &amp; Movie.createNegative(x) ) where createPositive is a static operation of Movie which creates the 5 movies, 3 actors and 2 actresses of each positive case, and createNegative is a static operation of Movie which creates the 5 movies, 2 actors and 3 actresses of each negative case. createPositive is: createPositive(n : Integer) pre: n &gt;= 0 post: Movie-&gt;exists( m1 | m1.rating = 10*n &amp; Movie-&gt;exists( m2 | m2.rating = 10*n + 1 &amp; Movie-&gt;exists( m3 | m3.rating = 10*n + 2 &amp; Movie-&gt;exists( m4 | m4.rating = 10*n + 3 &amp; Movie-&gt;exists( m5 | m5.rating = 10*n + 4 &amp; Movie.createPositiveActors(n,m1,m2,m3,m4,m5) &amp; Movie.createPositiveActresses(n,m1,m2,m3,m4,m5) ) ) ) ) ) where createPositiveActors creates the actors a, b and c and links them to the movies as required, and likewise for createPositiveActresses. The definition of createNegative is similar. Task 2: Find couples We implement this task by a use case task2 which has a single postcondition: p : Person &amp; q : p.movies.persons &amp; p.name &lt; q.name &amp; comm = p.movies /\ q.movies &amp; comm.size &gt; 2 =&gt; Couple-&gt;exists( c | p : c.p1 &amp; q : c.p2 &amp; c.commonMovies = comm ) To appear in EPTCS.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>This constraint is implicitly 8-quantified over persons p and q. It creates a couple c for each distinct pair
p and q of persons whose set of common movies comm has size at least 3. =n denotes intersection, also
written as \. Only one couple is created for each pair because of the restriction that p1 always holds the
person with the lexicographically smallest name.</p>
      <p>
        The quantifier q : Person can be restricted to q : p:movies:persons because the conditions comm =
p:movies \ q:movies &amp; comm:size &gt; 2 imply that q 2 p:movies:persons (a case of the Restricting Input
Ranges transformation design pattern [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]).
      </p>
      <p>The implementation is a linear iteration through Person and its execution time should therefore be of
order Person:size C where C is the maximum size of movies:persons. However, efficient computation
of set intersections is needed for situations where the sets of common movies become large.
Task 3: Calculate average scores for couples This is implemented by a use case task3 with a single
postcondition operating on context Couple:</p>
      <p>avgRating = ( commonMovies-&gt;collect(rating)-&gt;sum() ) / commonMovies.size
This iterates over objects self of Couple, and sets the average rating of each couple equal to the average
of the rating of each of their common movies (if two or more movies have the same rating, these ratings
are all counted separately in the sum).</p>
      <p>Extension task 1: List best 15 couples The set of existing couples can be sorted in different orders
using the sortedBy operator. For example:</p>
      <p>Couple!sortedBy( avgRating)
is the sequence of couples in order of decreasing avgRating.</p>
      <p>However this would be very inefficient in this situation, where only the best 15 elements with respect
to a given measure are needed, out of possibly millions of elements.</p>
      <p>In UML-RSDS it is possible to extend the system library with new functions, which are provided
with an implementation by the developer. Here we need a version of sortedBy which takes a bound on
the number of elements to return: SortLib:sortByN(s; s!collect(e); n) returns the best n elements of s
according to e, sorted in ascending e-value order. Semantically it is the same as s!sortedBy(e):subrange(1; n).</p>
      <p>We define an external module SortLib with sortByN as a static operation, and provide (hand-written)
Java code for this operation, making use of the existing UML-RSDS merge sort algorithm. The use case
then has the postcondition:
bestcouples = SortLib.sortByN(Couple.allInstances,</p>
      <p>Couple-&gt;collect(-avgRating), 15) =&gt;</p>
      <p>bestcouples-&gt;forAll( c | c-&gt;display() )
A toString() : String operation is added to Couple which returns a display string consisting of the average
score, number of movies and persons of each couple. This string is printed to the console by c!display().</p>
      <p>An example of the output is:
Couple avgRating 9992.5, 4 movies (a9993; a9994)
Couple avgRating 9992.0, 3 movies (a9990; a9992)
Couple avgRating 9992.0, 3 movies (a9990; a9993)
Couple avgRating 9992.0, 3 movies (a9990; a9994)
Couple avgRating 9992.0, 3 movies (a9991; a9992)
Couple avgRating 9992.0, 3 movies (a9991; a9993)
Couple avgRating 9992.0, 3 movies (a9991; a9994)
Couple avgRating 9992.0, 3 movies (a9992; a9993)
Couple avgRating 9992.0, 3 movies (a9992; a9994)
Couple avgRating 9991.5, 4 movies (a9990; a9991)
Couple avgRating 9982.5, 4 movies (a9983; a9984)
Couple avgRating 9982.0, 3 movies (a9980; a9982)
Couple avgRating 9982.0, 3 movies (a9980; a9983)
Couple avgRating 9982.0, 3 movies (a9980; a9984)
Couple avgRating 9982.0, 3 movies (a9981; a9982)
for the test case with N = 1000.</p>
      <p>Similarly, couples can be displayed in decreasing order of the number of common movies:
bestcouples2 = SortLib.sortByN(Couple.allInstances,</p>
      <p>Couple-&gt;collect(-commonMovies.size), 15) =&gt;</p>
      <p>bestcouples2-&gt;forAll( c | c-&gt;display() )
Extension task 2: Generate cliques This use case assumes that task 2 has been completed. A use case
couples2cliques creates a 2-clique for each couple:</p>
      <p>Clique-&gt;exists( c | c.persons = p1 \/ p2 &amp; c.commonMovies = commonMovies )
This constraint has context Couple and is applied to each instance self of Couple.</p>
      <p>A use case nextcliques generates cliques of size n + 1 from those of size n:
persons@pre.size = n &amp; p : commonMovies@pre.persons &amp;
p.name &gt; persons@pre.name-&gt;max() &amp;
comm = p.movies /\ commonMovies@pre &amp; comm.size &gt; 2 =&gt;</p>
      <p>Clique-&gt;exists( c | c.persons = cl.persons@pre-&gt;including( p ) &amp;</p>
      <p>c.commonMovies = comm )
This iterates over Clique@pre, so that only pre-existing cliques are considered as input to the rule, not
cliques generated by the rule. The nextcliques implementation is therefore a linear iteration over Clique,
rather than a fixed-point iteration.</p>
      <p>Extension task 3: Calculate average score for cliques This task is implemented by a use case
exttask3 with a single postcondition operating on context Clique:</p>
      <p>avgRating = ( commonMovies-&gt;collect(rating)-&gt;sum() ) / commonMovies.size
Extension Task 4: List best 15 cliques As with extension task 1, this task can be achieved using a
specialised sorting operator that returns the best 15 cliques according to a valuation expression. Only
cliques of a given size n are of interest:
ncliques = Clique-&gt;select( persons.size = n ) &amp;
bestcliques = SortLib.sortByN(ncliques, ncliques-&gt;collect(-avgRating), 15) =&gt;
bestcliques-&gt;forAll( c | c-&gt;display() )</p>
      <p>Similarly for the display of cliques by the number of common movies:
ncliques2 = Clique-&gt;select( persons.size = n ) &amp;
bestcliques2 = SortLib.sortByN(ncliques2, ncliques2-&gt;collect(-commonMovies.size), 15) =&gt;
bestcliques2-&gt;forAll( c | c-&gt;display() )</p>
      <p>Results</p>
    </sec>
    <sec id="sec-2">
      <title>To run the use cases for couples from the command line, type</title>
      <p>java Controller couples N
where N is the synthetic data set required (1000, 2000, etc).</p>
      <p>Table 1 shows the execution times of the tasks on SHARE for the synthesised data sets, using an
unoptimised Java 4 implementation (in which sets are represented as Vectors).
where N is the synthetic data set required (1000, 2000, etc).</p>
      <p>Table 2 shows the execution time for extension task 2 for clique sizes from 3 to 5.
and likewise for in2.txt, in3.txt. Table 3 shows the results.</p>
      <p>Data set task2
in1.txt 1864ms
in2.txt 5816ms
in3.txt More than 120s</p>
      <p>To run the use cases for cliques for the IMDb files from the command line, type
java Controller mcliques in1.txt
This runs task2, couples2cliques, nextcliques (for parameter 2 to generate the cliques of size 3), extension
task 3 and extension task 4 (for cliques of size 3). Table 4 shows the results for clique generation.</p>
    </sec>
    <sec id="sec-3">
      <title>The implemented transformation may be obtained at:</title>
      <p>It has also been uploaded to the umlrsds TTC14 workspace on SHARE, in the Public=rsync
directory (remoteUbuntu12LTS TTC14 umlrsds new). The execution times in the SHARE environment are
slightly lower than those given above. A version using a pre-filter can also be executed, using FController
instead of Controller in the above commands. The filter can substantially reduce the size of the input
models by discarding people and movies which cannot contribute to the sets of couples or cliques. This
makes the computation of couples for the dataset in3.txt feasible (execution time 4296ms) although the
filter takes 45 seconds to execute. Similarly for clique calculation for in3.txt.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lano</surname>
          </string-name>
          ,
          <article-title>The UML-RSDS manual, www</article-title>
          .dcs.kcl.ac.uk/staff/kcl/umlrsds.pdf,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kolahdouz-Rahimi</surname>
          </string-name>
          ,
          <article-title>Model transformation design patterns</article-title>
          ,
          <source>IEEE Transactions in Software Engineering</source>
          , vol.
          <volume>40</volume>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tichy</surname>
          </string-name>
          ,
          <source>The TTC 2014 Movie Database Case</source>
          ,
          <string-name>
            <surname>TTC</surname>
          </string-name>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>