=Paper=
{{Paper
|id=None
|storemode=property
|title=PrefWork - A Framework for the User Preference Learning Methods Testing
|pdfUrl=https://ceur-ws.org/Vol-584/paper2.pdf
|volume=Vol-584
|dblpUrl=https://dblp.org/rec/conf/itat/Eckhardt09
}}
==PrefWork - A Framework for the User Preference Learning Methods Testing==
Alan Eckhardt (1,2)
(1) Department of Software Engineering, Charles University, Prague, Czech Republic
(2) Institute of Computer Science, Czech Academy of Science, Prague, Czech Republic
eckhardt@ksi.mff.cuni.cz
Abstract. PrefWork is a framework for testing methods of induction of user preferences. PrefWork is thoroughly described in this paper: a reader willing to use PrefWork finds here all the necessary information - sample code, configuration files and results of the testing are presented. Related approaches to data mining testing are compared to our approach. To the best of our knowledge, there is no software available specifically for testing methods of preference learning.

(The work on this paper was supported by Czech projects MSM 0021620838, 1ET 100300517 and GACR 201/09/H057.)

1 Introduction

User preference learning is a task that allows many different approaches. Some specific issues differentiate this task from a usual data mining task. User preferences differ from measurements of a physical phenomenon or demographic information about a country; they are much more focused on the objects of interest and involve psychology or economy.

When we want to choose the right method for user preference learning, e.g. for an e-shop, the best way is to evaluate all possible methods and choose the best one. The problems with testing methods for preference learning are:

– how to evaluate these methods automatically,
– how to cope with different sources of data and different types of attributes,
– how to measure the suitability of a method,
– how to personalise the recommendation for every user individually.

2 Related work

The most popular tool related to PrefWork is the open source project Weka [1]. Weka has been in development for many years and has become the most widely used tool for data mining. It offers many classifiers, regression methods, clustering, data preprocessing, etc. However, this variability is also its weakness - it can be used for any given task, but it has to be customised, and the developer has to choose from a very wide range of possibilities. For our case, Weka is too strong.

RapidMiner [2] has a nice user interface and is in a way similar to Weka. It is also written in Java and has its source code available. However, its ease of use is not better than that of Weka: the user interface is nicer than Weka's, but the layout of Weka is more intuitive (allowing the user to connect various components represented on a plane).

R [3] is statistical software based on its own programming language. This is its biggest inconvenience - a user willing to use R has to learn yet another programming language.

There are also commercial tools such as SAS Miner [4], SPSS Clementine [5], etc. We do not consider these because of the need to buy a (very expensive) licence.

We must also mention the work of T. Horváth - Winston [6], which was developed recently. Winston may suit our needs because it is light-weight and has a nice user interface, but in its current stage there are few methods and no support for method testing. It is more a tool for data mining lecturing than for real-world method testing.

We are working with ratings the user has associated with some items. This use-case is well known and used across the internet. An inspiration for extending our framework comes from the many other approaches to user preference elicitation. An alternative to ratings has been proposed in [7, 8] - instead of ratings, the system requires direct feedback from the user about the attribute values. The user has to specify in which values the given recommendation can be improved. This approach is called critique-based recommendation.

Among other approaches, we should also mention the work of Kiessling [9], which uses user behaviour as the source for preference learning.

We also need some publicly available implementations of user preference learning algorithms, so that various methods can be compared among themselves. This is a strength of PrefWork - any existing method, which works with ratings, can be
integrated into PrefWork using a special adaptor for each tool (see Section 4.3). There is a somewhat old implementation of collaborative filtering, Cofi [10], and a brand new one (released 7.4.2009), Mahout [11], developed by the Apache Lucene project. Cofi uses the Taste framework [12], which became a part of Mahout. The expectation is that Taste in Mahout will perform better than Cofi, so we will try to migrate our PrefWork adaptor from Cofi to Mahout. Finally, there is IGAP [13] - a tool for learning fuzzy logic programs in the form of rules, which correspond to user preferences. Unfortunately, IGAP is not yet publicly available for download.

We did not find any other mining algorithm specialised in user preferences available for free download, but we often use the already mentioned Weka. It is a powerful tool that can be more or less easily integrated into our framework and provides a reasonable comparison of a non-specialised data mining algorithm to methods that are specialised for preference learning.

3 User model

To make this article self-contained, we briefly describe our user model, as in [14]. This model is based on a scoring function that assigns a score to every object. The user's rating of an object is a fuzzy subset of X (the set of all objects), i.e. a function R : X → [0, 1], where 0 means the least preferred and 1 the most preferred object. Our scoring function is divided into two steps.

Local preferences. In the first step, which we call local preferences, all attribute values of object o are normalised using fuzzy sets f_i : D_Ai → [0, 1]. These fuzzy sets are also called objectives or preferences over attributes. With this transformation, the original space of objects' attributes X = D_A1 × ... × D_AN is transformed into X' = [0, 1]^N. Moreover, we know that the object o ∈ X' with transformed attribute values equal to [1, ..., 1] is the most preferred object; it probably does not exist in the real world, though. On the other side, the object with values [0, ..., 0] is the least preferred, which is more likely to be found in reality.

Global preferences. In the second step, called global preferences, the normalised attribute values are aggregated into the overall score of the object using an aggregation function @ : [0, 1]^N → [0, 1]. The aggregation function is also often called a utility function. The aggregation function may have different forms; one of the most common is a weighted average, as in the following formula:

@(o) = (2 * f_Price(o) + 1 * f_Display(o) + 3 * f_HDD(o) + 1 * f_RAM(o)) / 7,

where f_A is the fuzzy set for the normalisation of attribute A.

Another, totally different approach was proposed in [15]. It uses the training dataset as a partitioning of the normalised space X'. For example, if we have an object with normalised values [0.4, 0.2, 0.5] and rating 3, any object with better attribute values (e.g. [0.5, 0.4, 0.7]) is supposed to have a rating of at least 3. In this way, we can find the highest lower bound on the rating of any object with an unknown rating. In [15], a method was also proposed for interpolating ratings between objects with known ratings, even using the ideal (non-existent) virtual object with normalised values [1, ..., 1] and rating 6.

4 PrefWork

Our tool PrefWork was initially developed as the master thesis of Tomáš Dvořák [16], who implemented it in Python. In this initial implementation, only Id3 decision trees and collaborative filtering were implemented. For better ease of use, and also to make integrating other methods possible, PrefWork was later rewritten in Java by the author. Many more possibilities have been added since, up to its present state. In the following sections, the components of PrefWork are described.

Most of the components can be configured by XML configurations. Samples of these configurations and Java interfaces will be provided for each component. We omit configuration methods from the Java interfaces, such as configTest(configuration, section), which is configured using a configuration from a section in an XML file. Data types of function arguments are also omitted for brevity.

4.1 The workflow

In this section, a sample workflow with PrefWork is described.

The structure of PrefWork is shown in Figure 1. There are four different configuration files - one for database access configuration (confDbs), one for datasources (confDatasources), one for methods (confMethods) and finally one for PrefWork runs (confRuns).

[Figure 1 (PrefWork structure): a Datasource (database or CSV) provides data; a Test component decides how to divide data into training and testing sets; the Inductive Method receives train data and produces predicted ratings for the test data; a Results Interpreter writes the results of method testing into a CSV file. The components are configured by confDbs, confDatasources, confMethods and confRuns.]

Fig. 1. PrefWork structure.

A run consists of three components - a set of methods, a set of datasets and a set of ways to test the method. Every method is tested on every dataset using every way to
test. For each case, the results of the testing are written into a CSV file.

A typical situation a researcher working with PrefWork finds himself in is: "I have a new idea X. I am really interested in how it performs on that dataset Y." The first thing is to create a corresponding Java class X that implements the interface InductiveMethod (see 4.3) and add a section X to confMethods.xml. Then copy an existing entry defining a run (e.g. IFSA,

[...]

...tains a random number associated with each rating. Its purpose will be described later.

Every datasource has to implement the following methods:

interface BasicDataSource{
    boolean hasNextRecord();
    void setFixedUserId(value);
    List
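The two-step scoring model of Section 3 can be sketched in Java as follows. This is an illustrative sketch only, not PrefWork code: the linear fuzzy sets used for the local-preference normalisation are hypothetical, and only the weights (2, 1, 3, 1) and the weighted-average aggregation come from the formula in Section 3.

```java
// Sketch of the two-step user model: local preferences (normalisation by
// fuzzy sets into [0, 1]) followed by global preferences (aggregation).
public class UserModelSketch {

    // Local preferences: a hypothetical decreasing linear fuzzy set for
    // price (a cheaper object is more preferred).
    static double fPrice(double price, double maxPrice) {
        return Math.max(0.0, 1.0 - price / maxPrice);
    }

    // A hypothetical increasing linear fuzzy set for the other attributes
    // (a larger display/HDD/RAM is more preferred).
    static double fLinear(double value, double maxValue) {
        return Math.min(1.0, value / maxValue);
    }

    // Global preferences: the weighted-average aggregation function
    // @(o) = (2*f_Price(o) + 1*f_Display(o) + 3*f_HDD(o) + 1*f_RAM(o)) / 7
    static double score(double fPrice, double fDisplay, double fHDD, double fRAM) {
        return (2 * fPrice + 1 * fDisplay + 3 * fHDD + 1 * fRAM) / 7.0;
    }

    public static void main(String[] args) {
        // Normalise a sample object's attribute values, then aggregate.
        double p = fPrice(500.0, 1000.0);   // 0.5
        double d = fLinear(15.0, 20.0);     // 0.75
        double h = fLinear(250.0, 500.0);   // 0.5
        double r = fLinear(2.0, 4.0);       // 0.5
        System.out.println(score(p, d, h, r)); // approx. 0.536
    }
}
```

Note that the ideal virtual object [1, ..., 1] receives score 1 under this aggregation, matching the most preferred object of the model.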