=Paper= {{Paper |id=None |storemode=property |title=PrefWork - A Framework for the User Preference Learning Methods Testing |pdfUrl=https://ceur-ws.org/Vol-584/paper2.pdf |volume=Vol-584 |dblpUrl=https://dblp.org/rec/conf/itat/Eckhardt09 }} ==PrefWork - A Framework for the User Preference Learning Methods Testing== https://ceur-ws.org/Vol-584/paper2.pdf
                  PrefWork - a framework for the user preference
                            learning methods testing?

                                                     Alan Eckhardt 1,2
                               1
                                    Department of Software Engineering, Charles University,
                               2
                                    Institute of Computer Science, Czech Academy of Science,
                                                   Prague, Czech Republic
                                                 eckhardt@ksi.mff.cuni.cz

Abstract. PrefWork is a framework for testing methods that induce user preferences; it is described thoroughly in this paper. A reader willing to use PrefWork will find here all the necessary information: sample code, configuration files and results of testing are presented. Related approaches to the testing of data mining methods are compared to ours. To the best of our knowledge, there is no software available specifically for testing preference learning methods.

? The work on this paper was supported by Czech projects MSM 0021620838, 1ET 100300517 and GACR 201/09/H057.

1   Introduction

User preference learning is a task that admits many different approaches. Some specific issues differentiate it from a usual data mining task. User preferences are different from measurements of a physical phenomenon or from demographic information about a country; they are much more focused on the objects of interest and involve psychology or economics.

When we want to choose the right method for user preference learning, e.g. for an e-shop, the best way is to evaluate all possible methods and choose the best one. The problems with testing methods for preference learning are:

 – how to evaluate these methods automatically,
 – how to cope with different sources of data and with different types of attributes,
 – how to measure the suitability of a method,
 – how to personalise the recommendation for every user individually.

2   Related work

The most popular tool related to PrefWork is the open source project Weka [1]. Weka has been in development for many years and has become the most widely used tool for data mining. It offers many classifiers, regression methods, clustering, data preprocessing, etc. However, this variability is also its weakness: it can be used for any given task, but it has to be customised, and the developer has to choose from a very wide range of possibilities. For our case, Weka is too strong.

RapidMiner [2] has a nice user interface and is in a way similar to Weka. It is also written in Java and has its source code available. However, its ease of use is not better than Weka's: the user interface is nicer, but the layout of Weka is more intuitive (allowing various components represented on a plane to be connected).

R [3] is statistical software based on its own programming language. This is its biggest inconvenience: a user willing to use R has to learn yet another programming language.

There are also commercial tools such as SAS Enterprise Miner [4], SPSS Clementine [5], etc. We do not consider these because of the need to buy a (very expensive) licence.

We must also mention the recently developed work of T. Horváth, Winston [6]. Winston might suit our needs, because it is light-weight and also has a nice user interface, but at its current stage it offers few methods and no support for method testing. It is more a tool for teaching data mining than for real-world method testing.

We are working with ratings that the user has associated with some items. This use case is well known and used across the internet. Many other approaches to user preference elicitation are an inspiration for extending our framework. An alternative to ratings has been proposed in [7, 8]: instead of ratings, the system requires direct feedback from the user about the attribute values. The user has to specify in which values the given recommendation can be improved. This approach is called critique based recommendation.

Among other approaches, we should also mention the work of Kiessling [9], which uses user behaviour as the source for preference learning.

We also need some publicly available implementations of user preference learning algorithms, in order to be able to compare the various methods among themselves. This is a strength of PrefWork: any existing method that works with ratings can be
integrated into PrefWork using a special adaptor for each tool (see Section 4.3). There is a somewhat old implementation of collaborative filtering, Cofi [10], and a brand new one (released on 7 April 2009), Mahout [11], developed by the Apache Lucene project. Cofi uses the Taste framework [12], which became a part of Mahout. The expectation is that Taste in Mahout will perform better than Cofi, so we will try to migrate our PrefWork adaptor from Cofi to Mahout. Finally, there is IGAP [13], a tool for learning fuzzy logic programs in the form of rules that correspond to user preferences. Unfortunately, IGAP is not yet publicly available for download.

We did not find any other mining algorithm specialised in user preferences available for free download, but we often use the already mentioned Weka. It is a powerful tool that can be more or less easily integrated into our framework and provides a reasonable comparison of a non-specialised data mining algorithm to the methods that are specialised for preference learning.

3   User model

To make this article self-contained, we briefly describe our user model, following [14]. The model is based on a scoring function that assigns a score to every object. A user rating of objects is a fuzzy subset of X (the set of all objects), i.e. a function R(o) : X → [0, 1], where 0 means the least preferred and 1 the most preferred object. Our scoring function is divided into two steps.

Local preferences In the first step, which we call local preferences, all attribute values of object o are normalised using fuzzy sets f_i : D_Ai → [0, 1]. These fuzzy sets are also called objectives or preferences over attributes. With this transformation, the original space of objects' attributes X = ∏_{i=1}^{N} D_Ai is transformed into X' = [0, 1]^N. Moreover, we know that the object o ∈ X' with transformed attribute values equal to [1, . . . , 1] is the most preferred object. It probably does not exist in the real world, though. On the other hand, the object with values [0, . . . , 0] is the least preferred, and is more likely to be found in reality.

Global preferences In the second step, called global preferences, the normalised attribute values are aggregated into the overall score of the object using an aggregation function @ : [0, 1]^N → [0, 1]. The aggregation function is also often called a utility function.

The aggregation function may have different forms; one of the most common is a weighted average, as in the following formula:

    @(o) = (2 ∗ f_Price(o) + 1 ∗ f_Display(o) + 3 ∗ f_HDD(o) + 1 ∗ f_RAM(o)) / 7,

where f_A is the fuzzy set for the normalisation of attribute A.

Another, totally different approach was proposed in [15]. It uses the training dataset as a partitioning of the normalised space X'. For example, if we have an object with normalised values [0.4, 0.2, 0.5] and rating 3, any object with better attribute values (e.g. [0.5, 0.4, 0.7]) is supposed to have a rating of at least 3. In this way, we can find the highest lower bound for any object with an unknown rating. In [15], a method was also proposed for interpolating ratings between objects with known ratings, even using the ideal (non-existent) virtual object with normalised values [1, ..., 1] and rating 6.

4   PrefWork

Our tool PrefWork was initially developed as the master thesis of Tomáš Dvořák [16], who implemented it in Python. In this initial implementation, only Id3 decision trees and collaborative filtering were implemented. For better ease of use, and to make it possible to integrate other methods, PrefWork was later rewritten in Java by the author. Many more possibilities have been added since then, up to its present state. In the following sections, the components of PrefWork are described.

Most of the components can be configured by XML configurations. Samples of these configurations and the Java interfaces will be provided for each component. We omit configuration methods from the Java interfaces, such as configTest(configuration, section), which configures a component from a section of an XML file. Data types of function arguments are also omitted for brevity.

4.1   The workflow

In this section, a sample workflow with PrefWork is described.

The structure of PrefWork is shown in Figure 1. There are four different configuration files: one for database access configuration (confDbs), one for datasources (confDatasources), one for methods (confMethods) and finally one for PrefWork runs (confRuns). A run consists of three components: a set of methods, a set of datasets and a set of ways to test the method. Every method is tested on every dataset using every way to test. For each case, the results of the testing are written into a csv file.

[Fig. 1. PrefWork structure. The diagram shows how the Test component divides the Datasource data (coming from a database or a CSV file) into training and testing sets, feeds them to the Inductive Method, collects the predicted ratings, and passes the results of the method testing to the Results Interpreter, which writes a CSV file. The components are configured by confDbs, confDatasources, confMethods and confRuns.]

A typical situation a researcher working with PrefWork finds himself in is: "I have a new idea X. I am really interested in how it performs on dataset Y."

The first thing to do is to create a corresponding Java class X that implements the interface InductiveMethod (see 4.3) and to add a section X to confMethods.xml. Then copy an existing entry defining a run (e.g. IFSA, see 4.5) and add method X to its methods section. Run ConfigurationParser and correct all the errors in the new class (and there will be some, for sure). After the run has finished correctly, process the csv file with the results to see how X performed in comparison with the other methods.

A similar case is the introduction of a new dataset into PrefWork: confDatasets.xml and confDBs.xml have to be edited if the data are in an SQL database or in a csv file. Otherwise, a new Java class (see 4.2) able to handle the new type of data has to be created. For example, we still have not implemented a class for handling arff files; these files carry the definition of the attributes themselves, so the configuration in confDatasets.xml would be much simpler (see Section 4.2 for an example of a configuration of a datasource with its attributes).

4.2   Datasource

A datasource is, as the name hints, the source of data for the inductive methods. Currently, we are working only with ratings of objects. Data are vectors whose first three attributes typically are the user id, the object id and the rating of the object. The attributes of the object follow. There is a special column that contains a random number associated with each rating; its purpose will be described later.

Every datasource has to implement the following methods:

interface BasicDataSource {
  boolean hasNextRecord();
  void setFixedUserId(value);
  List getRecord();
  Attribute[] getAttributes();
  Integer getUserId();
  void setLimit(from, to, recordsFromRange);
  void restart();
  void restartUserId();
}

There are two main attributes of a datasource: a list of all users and a list of ratings of the current user. getUserId returns the id of the current user. The most important function is getRecord, which returns a vector containing the rating of the object and its attributes. Subsequent calls of getRecord return all the objects rated by the current user. A typical sequence is:

int userId = data.getUserId();
data.setFixedUserId(userId);
data.restart();
while (data.hasNextRecord()) {
  List record = data.getRecord();
  // Work with the record
  ...
}
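A record obtained by this sequence can then be scored with the two-step user model of Section 3. The following sketch is illustrative only: the linear fuzzy sets, the attribute ranges and the weights mirror the weighted-average example of Section 3, and none of the class or method names below belong to PrefWork.

```java
// Illustrative two-step scoring (Section 3): local preferences normalise
// raw attribute values into [0,1] via fuzzy sets f_A, global preferences
// aggregate them with the weighted average @. All names are hypothetical.
import java.util.function.DoubleUnaryOperator;

public class TwoStepScore {
    // A linear fuzzy set on [min, max]; decreasing when 'increasing' is false.
    static DoubleUnaryOperator linear(double min, double max, boolean increasing) {
        return x -> {
            double t = Math.max(0.0, Math.min(1.0, (x - min) / (max - min)));
            return increasing ? t : 1.0 - t;
        };
    }

    public static void main(String[] args) {
        // Local preferences: fuzzy sets with illustrative attribute ranges.
        DoubleUnaryOperator fPrice   = linear(200, 2000, false); // cheaper is better
        DoubleUnaryOperator fDisplay = linear(10, 17, true);
        DoubleUnaryOperator fHDD     = linear(40, 500, true);
        DoubleUnaryOperator fRAM     = linear(256, 4096, true);

        // One notebook record: price 1100, display 15.4", HDD 270 GB, RAM 2176 MB.
        double p = fPrice.applyAsDouble(1100);
        double d = fDisplay.applyAsDouble(15.4);
        double h = fHDD.applyAsDouble(270);
        double r = fRAM.applyAsDouble(2176);

        // Global preferences: @(o) = (2*fPrice + 1*fDisplay + 3*fHDD + 1*fRAM) / 7.
        double score = (2 * p + 1 * d + 3 * h + 1 * r) / 7;
        System.out.println("score = " + score);
    }
}
```

The point of the two-step design is that the normalisers and the aggregation function can be swapped independently, which is what the normaliser sections of the Statistical configuration (Section 4.3) make configurable.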
Another important function is setLimit, which limits the data using the given boundaries from and to. The random number associated with each vector returned by getRecord has to fit into this interval. If recordsFromRange is false, the random number has to lie outside the given interval instead. This method is used when dividing the data into training and testing sets. For example, let us divide the data into an 80% training set and a 20% testing set. First, we call setLimit(0.0, 0.8, true) and let the method train on these data. Then, setLimit(0.0, 0.8, false) is executed and the vectors returned by the datasource are used for testing the method.

Let us show a sample configuration of a datasource that returns data about notebooks:

    userid        numerical
    notebookid    numerical
    rating        numerical
    price         numerical
    producer      nominal
    ram           numerical
    hdd           numerical

    note_ifsa
    randomColumn: randomize
    userid
    usersSelect:  select distinct userid from note_ifsa

First, a set of attributes is defined. Every attribute has a name and a type: numerical, nominal or list. An example of a list attribute is the actors of a film; this attribute can be found in the IMDb dataset [17].

Let us also note the select for obtaining the user ids (section usersSelect) and the name of the column that contains the random number used in setLimit (randomColumn).

Other types of user preferences. PrefWork currently supports only ratings of objects. There are many more types of data containing user preferences: user clickstream, user profile, filtering of the result set, etc.

PrefWork does not work with any information about the user, whether demographic (age, sex, place of birth, occupation, etc.) or behavioural. These types of information may bring a large improvement in prediction accuracy, but they are typically not available; users do not want to share personal information for the sole purpose of a better recommendation. Another issue is the complexity of such user information; semantic processing would have to be used.

4.3   Inductive method

InductiveMethod is the most important interface; it is what we want to evaluate. An inductive method has two main methods:

interface InductiveMethod {
  int buildModel(trainingDataset, userId);
  Double classifyRecord(record, targetAttribute);
}

buildModel uses the training dataset and the userId to construct a user preference model. After the model has been constructed, the method is tested: it is given records via the method classifyRecord and is supposed to evaluate them.

Various inductive methods have been implemented. Among the most interesting are our methods Statistical ([18, 15]) and Instances ([15]); WekaBridge, which allows the use of any method from Weka (such as a support vector machine); and ILPBridge, which transforms the data into a Prolog program and then uses Progol [19] to create the user model. CofiBridge allows the use of Cofi as a PrefWork InductiveMethod.

A sample configuration of the method Statistical is:

    Statistical
    WeightAverage
    VARIANCE
    AvgRepresentant
    Linear
    RepresentantNormalizer
    ListNormalizer

Every method requires a different configuration; only the name of the class is obligatory. Note that the methods based on our two-step user model (Statistical and Instances, for now) can easily be configured to test different heuristics for processing the different types of attributes. The configuration contains three sections, numericalNormalizer, nominalNormalizer and listNormalizer, for specifying the method for each particular type of attribute. Also see Section 4.5 for an example of this configuration.

4.4   Ways of testing the method

Several ways of testing a method can be defined; the division into training and testing sets is the most typically used. The method is trained on the training set (using buildModel) and then tested on the testing set (using classifyRecord). Another typical method is k-fold cross validation, which divides the data into k sets; in each of the k runs, one set is used as the testing set and the rest as the training set.

interface Test {
  void test(method, trainDataSource, testDataSource);
}

When the method is tested, the results, in the form userid, objectid, predictedRating, realUserRating, have to be processed. The interpretation is done by a TestResultsInterpreter. The most common is DataMiningStatistics, which computes measures such as correlation, RMSE, weighted RMSE, MAE, the Kendall rank tau coefficient, etc. Others, such as ROC curves or precision-recall statistics, are still waiting to be implemented.

abstract class TestInterpreter {
  abstract void writeTestResults(testResults);
}

4.5   Configuration parser

The main class is called ConfigurationParser. The definition of one test follows:

    Statistical
      Standard2CPNormalizer
    Statistical
    Mean
    SVM

    MySQL
    NotebooksIFSA

    TestTrain  0.05  resultsIFSA  DataMiningStatistics
    TestTrain  0.1   resultsIFSA  DataMiningStatistics

First, we specify which methods are to be tested; in our case, two variants of Statistical, then Mean and SVM. Note that some attributes of Statistical, which was defined in confMethods, can be "overridden" here. The basic configuration of Statistical is in Section 4.3. Then the datasource for testing the methods is specified; we are using a MySql database with the datasource NotebooksIFSA. Several datasources or databases can be specified here. Finally, the ways of testing and the interpretation are given in the section tests. TestTrain requires the ratio of the training and testing sets, the path where the results are to be written, and the interpretation of the test results.
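The interplay of the random-column split (Section 4.2), InductiveMethod (Section 4.3) and a DataMiningStatistics-style measure can be imitated end to end in a few lines. The sketch below is a self-contained toy, not PrefWork code: ToyRun and MeanMethod are hypothetical names, and the "method" simply predicts the mean training rating.

```java
// Toy imitation of a PrefWork run: split the ratings into training and
// testing sets by a per-record random number (as setLimit does), build a
// trivial model, classify the test records and report the MAE.
// All class names and the generated data are illustrative.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ToyRun {
    // A trivial "inductive method": predicts the user's mean training rating.
    static class MeanMethod {
        double mean;
        void buildModel(List<double[]> train) {
            mean = train.stream().mapToDouble(r -> r[0]).average().orElse(0.0);
        }
        double classifyRecord(double[] record) { return mean; } // ignores the record
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Records: [rating, randomColumn]; ratings on a 1..5 scale.
        List<double[]> all = new ArrayList<>();
        for (int i = 0; i < 200; i++)
            all.add(new double[]{1 + rnd.nextInt(5), rnd.nextDouble()});

        // setLimit(0.0, 0.8, true): records whose random number falls into
        // [0, 0.8] form the training set, the rest the testing set.
        List<double[]> train = new ArrayList<>(), test = new ArrayList<>();
        for (double[] r : all) (r[1] <= 0.8 ? train : test).add(r);

        MeanMethod method = new MeanMethod();
        method.buildModel(train);

        // DataMiningStatistics-style measure: mean absolute error on the test set.
        double mae = test.stream()
                .mapToDouble(r -> Math.abs(r[0] - method.classifyRecord(r)))
                .average().orElse(0.0);
        System.out.println("train=" + train.size() + " test=" + test.size() + " MAE=" + mae);
    }
}
```

Replacing MeanMethod with a real pair of buildModel and classifyRecord implementations is exactly what a new PrefWork method has to provide.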
date;Ratio;dataset;method;userId;mae;rmse;weightedRmse;monotonicity;tau;weightedTau;correlation;buildTime;testTime;countTrain;countTest;countUnableToPredict
28.4.2009 12:18;0,05;NotebooksIFSA;Statistical,StandardNorm2CP;1;0,855;0,081;1,323;1,442;0,443;0,358;0,535;94;47;10;188;0;
28.4.2009 12:18;0,05;NotebooksIFSA;Statistical,StandardNorm2CP;1;0,868;0,078;1,216;1,456;0,323;0,138;0,501;32;0;13;185;0;
28.4.2009 12:18;0,05;NotebooksIFSA;Statistical,StandardNorm2CP;1;0,934;0,083;1,058;1,873;0,067;0,404;0,128;31;16;12;186;0;
28.4.2009 12:31;0,025;NotebooksIFSA;Statistical,Peak;1;0,946;0,081;1,161;1,750;0,124;0,016;0,074;15;16;4;194;0
28.4.2009 12:31;0,025;NotebooksIFSA;Statistical,Peak;1;0,844;0,076;1,218;1,591;0,224;0,215;0,433;0;16;6;192;0
28.4.2009 12:31;0,025;NotebooksIFSA;Statistical,Peak;1;1,426;0,123;1,407;1,886;0,024;0,208;-0,063;16;0;4;194;0

                                      Fig. 2. A sample of results in a csv file.
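Lines in the format of Figure 2 can also be aggregated programmatically instead of with a spreadsheet pivot table. The sketch below is a hypothetical helper, not part of PrefWork: it averages the mae column per method, assuming semicolon-separated fields with a decimal comma and the column order of the Figure 2 header row.

```java
// Average the "mae" column per method from result lines in the Figure 2
// format: semicolon-separated fields, decimal comma, method in field 3
// and mae in field 5 (0-based), following the header row of Figure 2.
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResultsPivot {
    public static Map<String, Double> avgMae(List<String> lines) {
        Map<String, double[]> acc = new LinkedHashMap<>(); // method -> {sum, count}
        for (String line : lines) {
            String[] f = line.split(";");
            double mae = Double.parseDouble(f[5].replace(',', '.'));
            double[] a = acc.computeIfAbsent(f[3], k -> new double[2]);
            a[0] += mae;
            a[1] += 1;
        }
        Map<String, Double> out = new LinkedHashMap<>();
        acc.forEach((method, a) -> out.put(method, a[0] / a[1]));
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "28.4.2009 12:18;0,05;NotebooksIFSA;Statistical,StandardNorm2CP;1;0,855",
            "28.4.2009 12:18;0,05;NotebooksIFSA;Statistical,StandardNorm2CP;1;0,868",
            "28.4.2009 12:31;0,025;NotebooksIFSA;Statistical,Peak;1;0,946");
        System.out.println(avgMae(lines));
    }
}
```

The same pattern extends to any of the other columns (rmse, tau, buildTime, ...).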


    The definitions of runs are in confRuns.xml in sec-
tion runs. The specification of the run to be executed                                         Average of Tau coefficient
is in section run of the same file.                                                  0,7

                                                                                                            weka,SVM
                                                                                     0,6                    Mean
4.6   Results of testing                                                                                    Statistical, Linear regression
                                                                                                            Statistical,2CP-regression
                                                                                     0,5                    weka,MultilayerPerceptron
In Figure 2 is a sample of the resulting csv file. In
                                                                   Tau coefficient


our example, there are three runs with method Sta-
                                                                                     0,4
tistical with normaliser StandardNorm2CP and three
runs with normaliser Peak. Runs were performed on                                    0,3
different settings of the training and the testing sets,
so the results are different even for the same method.                               0,2
    The results contain all necessary information re-
quired for generation of a graph or a table with the                                 0,1
results. Csv format was chosen for its simplicity and
wide acceptance, so any other possible software can                                   0
handle it. We are currently using Microsoft Excel and                                          2        5            10            15         20    40    75
                                                                                                                   Training set size
its Pivot table that allows aggregation of results by
different criteria. Among other possibilities is also the
already mentioned R [3].                                                                               Fig. 3. Tau coefficient.

   Example figures of the output of PrefWork are in
Figures 3 and 4. The lines represent different meth-
ods, X axis represents the size of the training set and                                        Average of Weighted RMSE
the Y axis the value of the error function. In Fig-
                                                                              1,55
ure 3 the error function is Kendall rank tau coefficient
(the higher it is the better) and in Figure 4 is RMSE
weighted by the original rating (the lower the better).                       1,35

The error function can be chosen, as is described in
                                                              Weighted RMSE




Section 4.4.
   It is impossible to compare PrefWork to another
framework in general. A simple comparison to other
such systems is in Section 2. This can be done only
qualitatively; there is no attribute of frameworks that
can be quantified. Users themselves have to choose among
them the one that suits their needs the most.

   Fig. 4. Weighted RMSE. (Y axis: Weighted RMSE;
X axis: training set size; methods: weka,SVM; Mean;
Statistical, Linear regression; Statistical,2CP-regression;
weka,MultilayerPerceptron.)

4.7   External dependencies

PrefWork is dependent on some external libraries. Two
of them are sources of inductive methods - Weka [1]
and Cofi [10]. Cofi also requires taste.jar.
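Adapting such external learners typically means wrapping them behind a common interface. The following is a hypothetical sketch of that design; the interface and class names are illustrative only, not PrefWork's real API. The trivial Mean baseline from Figure 4 is used as the example implementation.

```java
import java.util.List;

// Hypothetical sketch of how an inductive method could plug into a
// framework like PrefWork; names are illustrative, not the real API.
// External learners (Weka, Cofi) would be adapted behind the same
// interface.
interface InductiveMethod {
    // Learn from a training set of objects and their ratings.
    void build(List<double[]> objects, List<Double> ratings);

    // Predict a rating for a new object.
    double predict(double[] object);
}

// The trivial "Mean" baseline: always predicts the average rating.
class MeanMethod implements InductiveMethod {
    private double mean;

    public void build(List<double[]> objects, List<Double> ratings) {
        double sum = 0;
        for (double r : ratings) sum += r;
        mean = ratings.isEmpty() ? 0 : sum / ratings.size();
    }

    public double predict(double[] object) {
        return mean;
    }
}
```

With such an interface, the testing loop only sees `build` and `predict`, so swapping a Weka classifier for a statistical method requires no change to the evaluation code.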

    PrefWork requires the following jars to function
correctly:

      Weka            weka.jar
      Cofi            cofi.jar
      Cofi            taste.jar
      Logging         log4j.jar
      CSV parsing     opencsv-1.8.jar
      Configuration   commons-configuration-1.5.jar
      Configuration   commons-lang-2.4.jar
      MySql           mysql-connector-java-5.1.5-bin.jar
      Oracle          ojdbc1410.2.0.3.jar

        Tab. 1. Libraries required by PrefWork.


5     Conclusion

PrefWork has been presented in this paper with a thor-
ough explanation and description of every component.
An interested reader should now be able to install Pref-
Work, run it, and implement a new inductive method
or a new datasource.
    The software can be downloaded at http://www.
ksi.mff.cuni.cz/~eckhardt/PrefWork.zip as an
Eclipse project containing all Java sources and all
required libraries, or as an SVN checkout at [20]. The
SVN archive contains Java sources and sample configu-
ration files.

5.1   Future work

We plan to introduce a time dimension to PrefWork.
The Netflix [21] dataset uses a timestamp for each rat-
ing. This will enable studying the evolution of prefer-
ences over time, which is a challenging problem. How-
ever, the integration of the time dimension into Pref-
Work can be done in several ways and the right one is
yet to be chosen.
    Allowing other sources of data apart from ratings
is a major issue. Clickthrough data can be collected
without any effort from the user and can be substan-
tially larger than the number of ratings. But its inte-
gration into PrefWork would require a large reorgani-
sation of existing methods.


References

 1. I.H. Witten, E. Frank: Data Mining: Practical Machine
    Learning Tools and Techniques, 2nd Edition. Morgan
    Kaufmann, San Francisco, 2005.
 2. I. Mierswa, M. Wurst, R. Klinkenberg, M. Scholz,
    T. Euler: Yale: Rapid prototyping for complex data
    mining tasks. In Ungar, L., Craven, M., Gunopulos, D.,
    Eliassi-Rad, T., eds.: KDD'06: Proceedings of the 12th
    ACM SIGKDD international conference on Knowledge
    discovery and data mining, New York, NY, USA, ACM,
    August 2006, 935–940.
 3. R-project. http://www.r-project.org/.
 4. SAS Enterprise Miner. http://www.sas.com/.
 5. SPSS Clementine. http://www.spss.com/software/
    modeling/modeler/.
 6. Š. Pero, T. Horváth: Winston: A data mining assistant.
    In: To appear in proceedings of RDM 2009, 2009.
 7. P. Viappiani, B. Faltings: Implementing example-based
    tools for preference-based search. In: ICWE'06: Pro-
    ceedings of the 6th international conference on Web
    engineering, New York, NY, USA, ACM, 2006, 89–90.
 8. P. Viappiani, P. Pu, B. Faltings: Preference-based
    search with adaptive recommendations. AI Commun.
    21, 2-3, 2008, 155–175.
 9. S. Holland, M. Ester, W. Kiessling: Preference min-
    ing: A novel approach on mining user preferences for
    personalized applications. In: Knowledge Discovery in
    Databases: PKDD 2003, Springer Berlin / Heidelberg,
    2003, 204–216.
10. Cofi: A Java-Based Collaborative Filtering Library.
    http://www.nongnu.org/cofi/.
11. Apache Mahout project. http://lucene.apache.org/
    mahout/.
12. Taste project. http://taste.sourceforge.net/old.
    html.
13. T. Horváth, P. Vojtáš: Induction of fuzzy and anno-
    tated logic programs. In Muggleton, S., Tamaddoni-
    Nezhad, A., Otero, R., eds.: ILP06 - Revised Selected
    Papers on Inductive Logic Programming. Number 4455
    in Lecture Notes in Computer Science, Springer Ver-
    lag, 2007, 260–274.
14. A. Eckhardt: Various aspects of user preference
    learning and recommender systems. In Richta, K.,
    Pokorný, J., Snášel, V., eds.: DATESO 2009. CEUR
    Workshop Proceedings, Česká technika - nakladatel-
    ství ČVUT, 2009, 56–67.
15. A. Eckhardt, P. Vojtáš: Considering data-mining tech-
    niques in user preference learning. In: 2008 Interna-
    tional Workshop on Web Information Retrieval Sup-
    port Systems, 2008, 33–36.
16. T. Dvořák: Induction of user preferences in seman-
    tic web, in Czech. Master Thesis, Charles University,
    Czech Republic, 2008.
17. The Internet Movie Database. http://www.imdb.com/.
18. A. Eckhardt: Inductive models of user preferences for
    semantic web. In Pokorný, J., Snášel, V., Richta, K.,
    eds.: DATESO 2007. Volume 235 of CEUR Workshop
    Proceedings, Matfyz Press, Praha, 2007, 108–119.
19. S. Muggleton: Learning from positive data. 1997,
    358–376.
20. PrefWork - a framework for testing methods for
    user preference learning. http://code.google.com/p/
    prefwork/.
21. Netflix dataset. http://www.netflixprize.com.