Solving the TTC Movie Database Case with FunnyQT Tassilo Horn Institute for Software Technology, University Koblenz-Landau, Germany horn@uni-koblenz.de FunnyQT is a model querying and model transformation library for the functional Lisp-dialect Clo- jure providing a rich and efficient querying and transformation API. This paper describes the Fun- nyQT solution to the TTC 2014 Movie Database transformation case. All core tasks and all extension tasks have been solved. 1 Introduction This paper describes a solution of the TTC 2014 Movie Database Case [3]. All core and extension tasks have been solved. The solution project is available on Github1 , and it is set up for easy reproduction on the SHARE2 image. The solution is implemented using FunnyQT [2] which is a model querying and transformation li- brary for the functional Lisp dialect Clojure3 . Queries and transformations are plain Clojure programs using the features provided by the FunnyQT API. This API is structured into several task-specific sub- APIs/namespaces, e.g., there is a namespace funnyqt.in-place containing constructs for writing in-place transformations, a namespace funnyqt.model2model containing constructs for model-to-model transfor- mations, a namespace funnyqt.bidi containing constructs for bidirectional transformations, and so forth. As a Lisp, Clojure provides strong metaprogramming capabilities that are exploited by FunnyQT in order to define several embedded domain-specific languages (DSL, [1]) for different tasks. For example, the pattern matching constructs used in this solution is provided in terms of a task-oriented DSL. 2 Solution Description In this section, the transformation and query specification for all core and extension tasks are going to be explained. In the listings given in the following, all function calls are shown in a namespace-qualified form to make it explicit in which Clojure or FunnyQT namespace those functions are defined. Clojure allows to define short aliases for used namespaces in order to allow qualification while still being concise, e.g., (emf/eget o :prop) where emf is an alias for the namespace funnyqt.emf and eget is the function name. All functions with namespace aliases emf, ip, poly, and u are FunnyQT functions, all others are either core Clojure or Clojure standard library functions, or functions defined in the transformation namespace itself. 1 https://github.com/tsdh/ttc14-movie-couples 2 http://is.ieis.tue.nl/staff/pvgorp/share/?page=ConfigureNewSession&vdi=Ubuntu12LTS_TTC14_ 64bit_FunnyQT4.vdi 3 http://clojure.org Submitted to: TTC 2014 2 Solving the TTC Movie Database Case with FunnyQT 2.1 Task 1: Generating Test Data The first task is generating test data. The case description [3] illustrates the task with Henshin rules. Since the rules actually don’t match anything but simply create new elements in the model, we have implemented them as plain functions receiving the model and an integer parameter i from which the movie ratings and actor names are derived. The function create-positive! creates 5 movies, 3 actors, and 2 actresses, sets their attributes, and links them as requested. 1 (defn ^:private create-positive! [model i] 2 (let [m1 (emf/ecreate! model ’Movie {:rating (+ 0.0 (* 10 i))}) 3 m2 (emf/ecreate! model ’Movie {:rating (+ 1.0 (* 10 i))}) 4 m3 (emf/ecreate! model ’Movie {:rating (+ 2.0 (* 10 i))}) 5 m4 (emf/ecreate! model ’Movie {:rating (+ 3.0 (* 10 i))}) 6 m5 (emf/ecreate! model ’Movie {:rating (+ 4.0 (* 10 i))})] 7 (emf/ecreate! model ’Actor {:name (str "a" (* 10 i)) :movies [m1 m2 m3 m4]}) 8 (emf/ecreate! model ’Actor {:name (str "a" (+ 1 (* 10 i))) :movies [m1 m2 m3 m4]}) 9 (emf/ecreate! model ’Actor {:name (str "a" (+ 2 (* 10 i))) :movies [m2 m3 m4]}) 10 (emf/ecreate! model ’Actress {:name (str "a" (+ 3 (* 10 i))) :movies [m2 m3 m4 m5]}) 11 (emf/ecreate! model ’Actress {:name (str "a" (+ 4 (* 10 i))) :movies [m2 m3 m4 m5]}))) The create-negative! function is defined similarly, so it is skipped here for brevity. 2.2 Task 2/3 and Extension Task 2/3: Finding Couples/Cliques and Compute Average Rankings For finding cliques of arbitrary sizes n ≥ 3, a higher-order transformation should be defined that generates a transformation rule for that n. The FunnyQT solution also allows for n = 2 and deals with the fact that in this case, Couple elements should be created rather than Clique elements. Also, the computation of the average rankings of a couple’s or clique’s common movies is done while creating the Couple or Clique element instead of doing it separately in a further step. Before discussing this higher-order transformation, a few helper functions are going to be intro- duced which will be used as constraints in the patterns of the generated rules. First, there’s a function movie-count that gets some person and returns the number of movies that this person has acted in. Sec- ondly, the person-count function returns the number of persons that acted in a given movie. Thirdly, movie-set gets a person element and returns the movies that person acted in as a set. And lastly, a function avg-rating gets a collection of movies and returns their average rating. The function n-common-movies? printed in the next listing gets an integer n, a person element p, and a sequence of more person elements 4 . loop and recur implement a local tail-recursion. Initially, the set of common movies common is bound to the set of p’s movies, and the remaining persons are bound to more shadowing the function’s parameter of the same name. If there are more persons, the recur in line 5 jumps back to the loop in line 2 where common is rebound to the intersection of common and the movies of the first person in more. Likewise, mode is rebound to the remainder of more. Thus, if all given persons act together in at least n movies, the set of common movies is returned. Otherwise, nil is returned. Since in Clojure the values nil and false are falsy while every other value is truthy, this function can act as a predicate and still return more information, i.e., the common movies, in the positive case. 1 (defn n-common-movies? [n p & more] 2 (loop [common (movie-set p), more more] 3 (when (>= (count common) n) 4 (if (seq more) 5 (recur (set/intersection common (movie-set (first more))) (rest more)) 6 common)))) 4 The Clojure varargs syntax & els is similar to Java’s Type... els syntax. T. Horn 3 The higher-order transformation generating a FunnyQT in-place transformation rule for a given n ≥ 2 is a Clojure macro. A macro is a function which is executed at compile-time by the Clojure compiler. It receives code passed to it as arguments, processes it, and returns new code that takes the place of it’s call. This new code is called the macro’s expansion. Because like all Lisps, Clojure is homoiconic, i.e., Clojure code is represented using Clojure datastructures (literals, symbols, lists, vectors), a macro is essentially a transformation on the abstract syntax tree of the Clojure code that’s passed to the macro. Listing 1 in the appendix on page 6 shows the define-groups-rule macro which is the higher-order transformation solving the tasks. It receives an parameter n and, as its name suggests, expands into a rule for finding couples if n equals 2 or cliques of size n for larger values of n. We’re not going to discuss the macro in details, however the central idea of the Clojure (or Lisp) macrosystem is that one defines the basic structure of the macro’s expansion using a quasi-quoted (back- ticked) form as a kind of template. In this quasi-quoted form, values computed at compile-time can be inserted using the unquote (~) and unquote-splicing (~@) operators to fill in the template’s variable parts. The last part of the implementation of the tasks 2 and 3 and the extension tasks 2 and 3 is to actually invoke the macro to create the transformation rules for couples and cliques of 3, 4, and 5 persons. 1 (define-group-rule 2) ;; make-groups-of-2!: The Couples rule 2 (define-group-rule 3) ;; make-groups-of-3!: The Cliques of Three rule 3 (define-group-rule 4) ;; make-groups-of-4!: The Cliques of Four rule 4 (define-group-rule 5) ;; make-groups-of-5!: The Cliques of Five rule Instead of discussing the rule generation macro in details, it makes more sense to have an in-depth look at one of its expansion like the one for n being 3 shown below. A FunnyQT in-place transformation rule is defined whose name is make-groups-of-3!, and it gets as arguments the model on which it should be applied, and an integer c which determines how many common movies a clique of three persons needs to have. The case description fixes c to 3, but with this parameter, we allow for a bit more generality. 1 (ip/defrule make-groups-of-3! 2 {:forall true} 3 [model c] 4 [m :when (>= (person-count m) 3) 5 m --> p0 :when (>= (movie-count p0) c) 6 m --> p1 :when (>= (movie-count p1) c) 7 :when (neg? (compare (emf/eget-raw p0 :name) (emf/eget-raw p1 :name))) 8 :when (n-common-movies? c p0 p1) 9 m --> p2 :when (>= (movie-count p2) c) 10 :when (neg? (compare (emf/eget-raw p1 :name) (emf/eget-raw p2 :name))) 11 :when-let [cms (n-common-movies? c p0 p1 p2)] 12 :as [cms p0 p1 p2] :distinct] 13 (emf/ecreate! model ’Clique {:avgRating (avg-rating cms), :persons [p0 p1 p2], :commonMovies cms})) Lines 4 to 12 define the rule’s pattern. The structural part defines that it matches a Movie element m which references three Person elements p0, p1, and p2 using its persons reference. Additionally, the pattern defines several constraints using the :when keyword. The movie m needs to have at least three acting persons (line 4), and all persons need to act in at least c movies (lines 5, 6, and 9). To avoid duplicate matches where only the order of the three person elements differs, the constraints in line 7 and 10 enforce a lexicographical order of the names of the persons p0, p1, and p2. Line 8 ensures that p0 and p1 have at least c common movies. The same for the complete clique of three persons is also asserted in line 11, where the common movies are also bound to the variable cms. The first constraint is there only for performance reasons. Clearly, if p0 and p1 already have less than c common movies, then p0, p1, and p2 cannot have more. This test ensures that the pattern matching process stops for the combination of p0 and p1 as soon as possible. The last line of the pattern, line 12, defines that each match should be represented as a vector con- taining the set of common movies cms and the three persons. The keyword :distinct specifies that only distinct matches should be found. The reason is that if some clique of three acts in x common movies, 4 Solving the TTC Movie Database Case with FunnyQT there are exactly x matches that differ only in the movie m. By omitting the movie from the match repre- sentation and specifying that we are only interested in distinct matches, those duplicates are suppressed. The last two lines define the action that should be applied on matches. A new Clique element is created that gets assigned the found persons with their common movies and average rating. What has been skipped from explanation until now is the rule’s :forall option. It specifies that calling the rule finds all matches at once and then applys the action to each of them. FunnyQT performs the pattern matching process in parallel for such :forall-rules. 2.3 Extension Task 1/4: Compute Top-15 Couples/Cliques The case description demands for the Extension Tasks 1 and 4 the computation of the top-15 groups according to the criteria (a) average rating of common movies, and (b) number of common movies. If there’s a tie between two groups for the current criterium, the respective other criterium is used to cut it. If that doesn’t suffice, i.e., both groups have the same average rating and number of common movies, the names of the group’s members are compared as a fallback. Since the person names are unique in the models, there is no chance that no distinction can be made. The implementation is simple in that the sequence of all couples (or cliques of a given size) are sorted using a comparator. Like in Java, a Clojure comparator is a function that receives two objects and returns a negative integer if the first object should be sorted before the second, a positive integer if the first object should be sorted after the second item, and zero if both objects are equal with respect to order. The comparators for the average rating, number of common movies, and the group’s member names are shown in the next listing. 1 (defn rating-comparator [a b] 2 (compare (emf/eget b :avgRating) (emf/eget a :avgRating))) 3 (defn common-movies-comparator [a b] 4 (compare (.size ^java.util.Collection (emf/eget-raw b :commonMovies)) 5 (.size ^java.util.Collection (emf/eget-raw a :commonMovies)))) 6 (defn names-comparator [a b] 7 (compare (str/join ";" (map #(emf/eget % :name) (actors a))) 8 (str/join ";" (map #(emf/eget % :name) (actors b))))) They get two objects a and b (two couples or cliques) and compare them using Clojure’s standard compare function which works for objects of any class implementing the java.lang.Comparable interface. Until now, there are only three individual comparators, but sorting is always done with one single comparator. So the following listing defines a higher-order comparator, e.g., a function that receives arbitrary many comparators and returns a new comparator which compares using the given ones. 1 (defn comparator-combinator [& comparators] 2 (fn [a b] 3 (loop [cs comparators] 4 (if (seq cs) 5 (let [r ((first cs) a b)] 6 (if (zero? r) (recur (rest cs)) r)) 7 (u/errorf "%s and %s are incomparable!" a b))))) The function comparator-combinator returns an anonymous function with two arguments a and b. This function recurses5 over the given comparators applying one after the other until one returns a non-zero result. So finally, here are the two top-15 groups functions. 1 (defn groups-by-avg-rating [groups] 2 (sort (comparator-combinator rating-comparator common-movies-comparator names-comparator) groups)) 3 (defn groups-by-common-movies [groups] 4 (sort (comparator-combinator common-movies-comparator rating-comparator names-comparator) groups)) 5 Clojure’s (loop [] ... (recur )) is a local tail-recursion. loop establishes bindings just like let, and recur jumps back to the loop providing new values for the variables. T. Horn 5 groups-by-avg-rating gets a collection of groups groups and then sorts them by the combined com- parator first taking the average rating into account, then the number of common movies, and eventually the names of the groups’ actors if neither of the two former comparators could decide on the two groups order. group-by-common-movies is defined analoguously with the common-movies-comparator taking prece- dence over the rating-comparator. 3 Evaluation and Conclusion With respect to correctness, the FunnyQT solution computes the same numbers of couples and cliques of various sizes as printed in the case description. Also, the top-15 lists are identical for all models. The table below shows the execution times for the IMDb models. They include the pattern matching time, the time needed for creating the Couple and Clique elements, and the time needed for setting their attributes and references including the computation of the average ratings. The benchmarks were run on a GNU/Linux machine with eight 2.8GHz cores with 30GB of RAM dedicated to the JVM process. Model Couples (secs) 3-Cliques (secs) imdb-0005000-49930.movies.bin 1.677278992 6.957570992 imdb-0010000-98168.movies.bin 1.702668160 11.333623024 imdb-0045000-299504.movies.bin 3.947932624 15.714349728 imdb-0085000-499995.movies.bin 9.012942560 26.388751712 imdb-0200000-1004463.movies.bin 35.409998560 117.833811360 imdb-0495000-2000900.movies.bin 159.973018224 757.280006768 imdb-all-3257145.movies.bin 619.160156640 4295.030516512 The main bottleneck of the FunnyQT transformation is the required memory. The generated rules for finding groups of a given size first compute all matches and then generate one new couple or clique element for each match. This means that the original model, all matches, and also all new elements reside in memory at the same time. Another strong point of the solution is its conciseness. All in all, it consists of 152 lines of code (including boilerplace code like namespace definitions) in three source files, one for the generation of the synthetic test models (30 LOC), one for the couple and cliques rules (52 LOC), and one for the queries (70 LOC, most of which are concerned with pretty-printing the results into files). References [1] Martin Fowler (2010): Domain-Specific Languages. Addison-Wesley Professional. [2] Tassilo Horn (2013): Model Querying with FunnyQT - (Extended Abstract). In Keith Duddy & Gerti Kappel, editors: ICMT, Lecture Notes in Computer Science 7909, Springer, pp. 56–57. [3] Christian Krause, Tassilo Horn & Matthias Tichy (2014): The TTC 2014 Movie Database Case. In: Transfor- mation Tool Contest 2014. 6 Solving the TTC Movie Database Case with FunnyQT 1 (defmacro define-group-rule [n] 2 (let [psyms (map #(symbol (str "p" %)) (range n))] 3 ‘(ip/defrule ~(symbol (str "make-groups-of-" n "!")) 4 {:forall true} 5 [~’model ~’c] 6 [~’m :when (>= (person-count ~’m) ~n) 7 ~@(mapcat (fn [i] 8 (let [ps (nth psyms i)] 9 ‘[~’m --> ~ps 10 :when (>= (movie-count ~ps) ~’c) 11 ~@(when-not (zero? i) 12 ‘[:when (neg? (compare (emf/eget-raw ~(nth psyms (dec i)) :name) 13 (emf/eget-raw ~ps :name)))]) 14 ~@(when-not (or (zero? i) (= i (dec n))) 15 ‘[:when (n-common-movies? ~’c ~@(take (inc i) psyms))])])) 16 (range n)) 17 :when-let [~’cms (n-common-movies? ~’c ~@psyms)] 18 :as [~’cms ~@psyms] 19 :distinct] 20 (emf/ecreate! ~’model ~(if (= n 2) ‘’Couple ‘’Clique) 21 ~(if (= n 2) 22 ‘{:commonMovies ~’cms :avgRating (avg-rating ~’cms) 23 :p1 ~(first psyms) :p2 ~(second psyms)} 24 ‘{:commonMovies ~’cms :avgRating (avg-rating ~’cms) 25 :persons [~@psyms]}))))) Listing 1: The higher-order transformation generating couple and cliques rules