=Paper=
{{Paper
|id=Vol-2203/146
|storemode=property
|title=Preference Learning by Matrix Factorization on Island Models
|pdfUrl=https://ceur-ws.org/Vol-2203/146.pdf
|volume=Vol-2203
|authors=Stepan Balcar
|dblpUrl=https://dblp.org/rec/conf/itat/Balcar18
}}
==Preference Learning by Matrix Factorization on Island Models==
S. Krajči (ed.): ITAT 2018 Proceedings, pp. 146–151
CEUR Workshop Proceedings Vol. 2203, ISSN 1613-0073, © 2018 Štěpán Balcar
Preference learning by matrix factorization on island models
Štěpán Balcar
Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic
Stepan.Balcar@mff.cuni.cz
Abstract: This paper presents island models, methods, implementation and experiments connecting stochastic optimization methods with the (collaborative) recommendation task. Our models and methods are based on matrix factorization. Parallel runs of the methods optimize the RMSE metric from an island model point of view.

We dealt with two research hypotheses. First, whether island models always bring improvement; we will show that they almost always do. Second, whether an evolutionary algorithm does or does not always find the best solution; this will be confirmed only on the smaller data. Experiments were carried out on MovieLens 100k and 1M data.

1 Introduction

This paper studies recommender systems, especially learning user/customer preferences. The main idea of this paper is to connect this study to evolutionary island models. Here, island models serve as a computational tool for matrix factorization.

Instances of one stochastic optimization method run in parallel on island models, searching the same state space. Such a method traverses the state space of solutions. The actual states the instances visit are influenced by (mutually independent) stochasticity. Hence, different instances can visit different parts of the state space. Island model parallelization is based on cooperation of the methods during the computation.

This resembles ensemble and/or hybrid learning, where different optimization methods are:

• run in parallel

• allowed to run in different state spaces

The solution (usually one of them) is chosen at the end by an additional aggregation/decision method.

Using evolutionary computing for recommendation is surveyed in the paper of Horvath [1]. This survey shows that the usage of evolutionary methods is quite frequent in recommender systems (see also [2]). One of the approaches is model based. Our approach uses matrix factorization, and hence the model is constituted by the factors. In [1] only one paper [3] is mentioned where the evolved individuals are matrix factors. The authors of [3] provide results on ML100k data on minimizing squared error depending on the size of the population and the probability of mutation (other parameters are fixed).

In this paper, the parameters of our methods were chosen after experiments on sample data. We will try to improve RMSE using parallelization. We will provide results on both ML100k and ML1M data.

This paper also comments on the architecture and some implementation decisions. The main contributions of this paper are:

• A multiple island model brings a statistically significant improvement of recommendation in comparison to single instance optimization.

• Our implementation is able to handle individuals (matrix factors) whose size is several orders of magnitude bigger than usual in evolutionary computation.

This paper is organized as follows: Chapter 2 "Related work" concerns recommender systems and matrix factorization, evolutionary algorithms and the island model. Chapter 3 "Methods, models and parameters" sketches our view of stochastic optimization methods and the parameters and settings of the island model used in the tests; its section "Data" describes the realization of the experiments, the design of the tests and the computing resources. Finally, Section 4 "Results" gives both a numeric and a graphic presentation of our results and discusses confirmation of our hypotheses. After the conclusions, the paper ends with future work.

2 Related work

This section gives basic notation for recommender systems and the evolutionary computation used, and overviews some relevant literature.

2.1 Recommender systems and matrix factorization

It was the Netflix Prize (announced in October 2006) which excited the public and changed the landscape of recommender systems research. The BellKor squad included Chris Volinsky and his AT&T colleagues Robert Bell and Yehuda Koren, along with four other engineers from the United States, Austria, Canada and Israel (BellKor's Pragmatic Chaos); on June 26, 2009, they finally crossed the 10% finish line. The main ingredient of the winning solution was matrix factorization (MF), see e.g. [4].

We focus on latent model based recommendation, which is a part of the collaborative filtering technique (Co),
asked for in the Netflix Prize competition. Co was introduced in [4].

Let us denote by U the set of user ids and by I the set of item ids. The input data can be viewed as a partial function r : U × I → P, where P is the preference scale (e.g. one to five stars, or real numbers). An alternative representation is an algebraic matrix R ∈ P^(|U|×|I|), with dimensions representing users and items respectively. The rating of user i ∈ U for an item j ∈ I is the content of the corresponding matrix cell r_ij ∈ P. Usually this matrix is very sparse, so in practice the input data are represented in the form of a database table R with schema R(uid = user id, iid = item id, rating). Although the implementation uses a database table, the formal model of [4] is easier to describe in the language of matrices. The matrix R is factorized into a lower dimensional representation

    R ≈ U × I^T = R̂    (1)

where U ∈ P^(|U|×d) and I ∈ P^(|I|×d) are the user and item latent factor matrices and × is the matrix product. d is much smaller than |U| and |I|. The approximation of R by R̂ is measured by

    e²_ij = (r_ij − r̂_ij)² = (r_ij − Σ_{k=1..d} u_ik i_jk)²    (2)

where r̂_ij serves as the approximation of the value r_ij.

Our main optimization method is SGD, the stochastic gradient descent method. For other approaches and dimensions of research in recommender systems we refer to [5].

2.2 Evolutionary algorithms

Evolutionary algorithms became popular after 1970, thanks to John Holland [6], as a means for solving optimization problems. A solution is represented as an individual; the set of individuals forms the population. An evolutionary algorithm is formed by the phases: initialization, selection (parent selection for the next generation offspring), crossover, mutation and evaluation of the whole population. The stochastic computation over individuals seeks convergence to the global optimum.

The contribution of evolutionary algorithms in the area of recommender systems was reviewed by Horváth and de Carvalho in [1]. This review analyzed more than 65 research papers using EC techniques for user recommendations. It analyzed different approaches according to five aspects: (i) recommendation technique used, (ii) datasets employed in the experiments, (iii) evaluation methods used in the experiments, (iv) baselines used to compare with the proposed methods and (v) reproducibility of the research.

Most of the nature-inspired algorithms reviewed in [1] find application in solving partial problems such as feature weighting extraction, model based approaches, clustering and function learning.

Computing latent factor models by means of evolutionary algorithms has emerged as a possible approach [3]. The available publications do not mention the parallelization of evolutionary algorithms in combination with the computation of latent factors.

Motivated by this, in our paper individuals are represented by the concatenation of U^T and I^T (like a worm d thick and |U| + |I| long):

    u_11 … u_1|U|   i_11 … i_1|I|
     ⋮   ⋱   ⋮       ⋮   ⋱   ⋮
    u_d1 … u_d|U|   i_d1 … i_d|I|

Figure 1: Individual represented by the latent vectors of users and the latent vectors of items

In our experiments we use the RMSE as the fitness function

    RMSE = sqrt( Σ_{i=1..|U|} Σ_{j=1..|I|} e²_ij / (|U| · |I|) )    (3)

2.3 Island model

Island models are one of the approaches to parallelizing optimization algorithms. Their characterization is that there is no central master and the populations are distributed. An island model consists of parallel algorithms enriched by sending and accepting migrants. Migrants are individuals from the populations of other islands. The hope is that this propagation of genetic material can speed up convergence and escape from the trap of local optima.

The parameters of an island model are the frequency of sending and receiving migrants and the way a migrant is chosen from the local population (remember that the parameters of the methods are fixed and were chosen after experiments on sample data). The optimal choice of these parameters depends on the type of optimization method and is a subject of study. These aspects of island models are studied in [7].

A specific type of island models, heterogeneous ones, was used in an application of multi-criteria optimization by Martin Pilát and Roman Neruda [8]. They designed a new algorithm, MOGASOLS (Multiobjective Genetic Algorithm with Single Objective Local Search), combining a multi-criteria genetic algorithm (MOGA) with a single-criteria genetic algorithm (SOGA). It is shown that parallelization of the time-consuming MOGA method alone achieves worse results than running the MOGA and SOGA methods in parallel. They were tested with the NSGA-II [9] and IBEA [10] algorithms. This was further developed in [11].

The first to come up with heterogeneous island models (HeIM), the way we understand them, was Martin Pilát ([11]). In these models, the islands may carry diverse computational methods, differing not only in parameters but also in the structure of the whole stochastic algorithm [11].
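The latent factor model and the RMSE fitness of Equations 1–3 can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation; the dictionary of known ratings stands in for the sparse table R(uid, iid, rating), and in practice the error sum runs only over the rated cells:

```python
import math

def predict(U, I, i, j):
    """Estimated rating r_hat_ij = sum_k u_ik * i_jk (Eqs. 1-2).

    U and I are lists of d-dimensional latent vectors,
    one per user and one per item respectively."""
    return sum(u * v for u, v in zip(U[i], I[j]))

def rmse(U, I, ratings):
    """RMSE fitness (Eq. 3), averaged over the known ratings only."""
    sq_err = sum((r - predict(U, I, i, j)) ** 2
                 for (i, j), r in ratings.items())
    return math.sqrt(sq_err / len(ratings))
```

For a perfect factorization the fitness is zero; e.g. with U = [[1, 0]], I = [[2, 0]] and the single known rating r_00 = 2, rmse returns 0.0.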
The central planner is responsible for choosing the optimal methods for the given problem; it replaces unsuccessful methods by more successful ones during the whole computation. In publication [11] the original Balcar's tool [12] was used.

In this paper this tool is used with homogeneous islands to find the optimal user and item latent factors. Different optimization techniques are then used to make the factors converge to optimal ones. For illustration we give the formulas for stochastic gradient descent. Recall e²_ij; the next generation values of the user and item factors then equal

    u'_ik = u_ik + α ∂e²_ij/∂u_ik ,    i'_jk = i_jk + α ∂e²_ij/∂i_jk    (4)

Note that in our implementation we do not consider a normalization factor; this is left for the future. Nevertheless, some normalization effect is obtained by the migration and synergy of islands. α is not fixed, just bounded from above. More on our system is in [12].

3 Methods, models and parameters

Island models were originally developed for the parallelization of evolutionary algorithms. In this paper we will also use other stochastic optimization methods.

3.1 Stochastic optimization methods

Evolutionary algorithms try to balance the trade-off between exploration and exploitation. This property should help an evolutionary algorithm to escape from local extremes and simultaneously converge to the optimal solution. For this ability they pay with a higher time complexity because of the parallel development of a bigger population. Because of this fact, on some types of problems, the winners are other stochastic optimization methods; an example is the TSP solution by hill climbing with 2-opt ([13]). This is the main reason we consider several additional stochastic optimization methods (see Table 1).

All these methods use stochastic gradient descent. A general reference for the description of stochastic optimization methods is [14].

Our implementation of the stochastic optimization methods was changed in two ways. First, we had to create operators enabling them to work with latent factor based individuals. Second, the methods were modified for parallel runs in island models with migration. They were enriched with the ability to cooperate: they receive individuals and enrich their local population. Please notice that only evolution and differential evolution have a bigger population; the remaining methods have a one-individual population, and in this case enrichment means replacement. They provide their solution from the local population to the neighboring islands. The methods manipulate incoming individuals in concordance with the basic algorithm idea, e.g. the tabu search method inserts a newcomer into the tabu set.

An individual equals a solution. A solution is represented by the pair of latent factors of users and items (Figure 1). Multiplying these latent factors we get a matrix which represents our estimation of the ratings. The length of the first vector is the number of users and the length of the second vector equals the number of items. The size of the individual depends on the size of the input problem. The width of the latent vector is an optional parameter.

3.2 Parameters and settings of the island model used in tests

Now we describe the island model, the environment in which the above methods will be parallelized. It is the environment which ensures the dissemination of genetic material. The main tool for this is migration. This migration does not mean only the exchange of the best solutions. We would like to spread across the system (between islands) genetic material which has the potential to contribute to further improvement of population quality and to speed up convergence. Migration is inspired by processes in nature ([15], [16]). Most island models exchange migrants in a synchronized way. In our system the exchange of migrants is not synchronized, so in fact our islands are more general.

The key parameters of island models, as nature shows ([17]), are the frequency of migration and the size of the local populations. The bigger the frequency of migration, the bigger the chance that the islands will help each other. On the other side, each hardware and software is limited by data throughput.

Matrix factorization needs migration of much bigger individuals than e.g. continuous optimization, where an individual is a point in an n-dimensional space. In this paper we had to change the architecture of the model used in [11]. In [11] the input problems were TSP (traveling salesman problem of size cca. 1000 cities), BP (bin packing, one dimensional with 1000 items), CO (continuous optimization, 10-dimensional function) and vertex cover (1000 vertexes). A solution of preference learning by matrix factorization on island models is represented by much bigger individuals; a rough estimation of our state space is > #TSP^20. Evolutionary algorithms use incoming individuals in groups and after a while; hence it is advantageous to store them in a front. As the individuals are big and memory is limited, we had to limit the size of the fronts (see Table 2).

3.3 Data

We used data from the movie recommendation domain in the experiments. The effectiveness of parallelization has been verified on the datasets ml-100k¹ and ml-1m². The datasets are formed by movie evaluations (1–5 stars); for the trials we use the triple (user, movie, rating).

The training set consists of four fifths of the dataset, the rest is the test set. Computing derivatives for SGD leverages the training set; the test set is used to evaluate individuals.

¹ http://grouplens.org/datasets/movielens/100k/
² https://grouplens.org/datasets/movielens/1m/

Table 1: Methods and parameters

Algorithm      | Tool                              | Parameters
HillClimbing   | SGD, random rating from each line | numberOfNeighbors=10
RandomSearch   | generation of latent vectors      |
Evolution      | uniform crossover + SGD           | popSize=10, mutationRate=0.9, crossRate=0.1, parentSelector=CompareTwoRandomSelectors
BruteForce     | SGD according to the i-th rating  |
TabuSearch     | SGD, random rating from each line | tabuModelSize=50
Sim.Annealing  | SGD, random rating from each line | temperature=10000, coolingRate=0.002
Diff.Evolution | differential crossover            | popSize=50, F=0.25

Table 2: Parameters of the island model

Parameter            | Value
Number of iterations | 50, period = 60 seconds
Number of islands    | 4 (AMD Opteron 6376)
Neighbors of method  | 3 (distributed to everyone)
Migration frequency  | 5 seconds

Figure 2: Architecture of the system (three islands, each running a method as an agent, connected to a central Manager (planner) and a Monitor (statistics))

3.4 Realization of the experiment

Our main idea is to increase the stochasticity of searching the state space (which is enormous for the movie data). The first level of stochasticity is enabled by the stochastic optimization method. The second level is enabled by several independent islands. The third level is attained by migration. Our system enables all of these. Moreover, all of these run in parallel with mutually assisting (not competing) methods.

In order to be able to obtain the best solution from such a computation resource, the computation must be continuously monitored (Figure 2). The solutions represent individuals that are unpredictably moving across the island model. We can never know when and where the best solution will appear (sometimes the best solution does not appear at the end, see Figure 3).

Our implementation uses the agent middleware Jade³ for the achievement of higher adaptivity of methods (in our three levels of stochasticity).

Here we were facing a main decision: whether to prefer higher adaptivity (based e.g. on Jade) or better effectiveness on specific hardware (based e.g. on C). From pure experimental curiosity we went in the agent based middleware framework direction.

The central brain of the multi-agent based system is the manager of migration, which directs the computation and measures genetic material during the evolution. The use of this system lies in the implementation of methods as agents, which can be developed as adaptive computing resources. The infrastructure of the system is made up of two central points: the Agent-manager, which manages the computation, and the Agent-monitor, which monitors the genetic material in the system.

The software allows multiple ways of monitoring. The monitor statistically processes the quality of the individuals. Another way of observing what happens in the system is to produce the pedigrees of individuals. The basic idea is to enrich every individual with a pedigree that contains information on the involvement of concrete islands in its creation.

For our experiments, we used statistics that are computed from each monitored individual in one iteration of the planner. We will present the results as a measure over the system, monitored along the planner iterations.

Our evolutionary methods (Table 1) use uniform crossover. In the mutation phase they apply stochastic gradient descent on a randomly chosen rating.

Differential evolution combines 3 latent vectors (individuals which are models/solutions): LV1, which is a concatenation of U1^T and I1^T, and similarly LV2 and LV3. The result of the differential operator is the latent vector

    LV_new = LV1 + F ∗ (LV2 − LV3)

where F is 0.25.

³ http://jade.tilab.com/
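The two variation operators described above can be sketched as follows. This is an illustrative re-implementation, not the distributed code: the SGD mutation of Equation 4 with the usual constant factors folded into the step size, and the differential operator applied componentwise to a flattened latent vector; the function names and the explicit error-gradient form are our assumptions:

```python
import random

def sgd_mutation(U, I, ratings, lr=0.01, rng=random):
    """One SGD step on a randomly chosen known rating (Eq. 4).

    Moves u_ik and i_jk so that the squared error e_ij^2 of the
    picked rating decreases; constant factors are folded into lr."""
    (i, j), r = rng.choice(list(ratings.items()))
    e = r - sum(u * v for u, v in zip(U[i], I[j]))
    for k in range(len(U[i])):
        u_old, v_old = U[i][k], I[j][k]
        U[i][k] += lr * e * v_old
        I[j][k] += lr * e * u_old

def differential_operator(lv1, lv2, lv3, F=0.25):
    """LV_new = LV1 + F * (LV2 - LV3), componentwise on the
    flattened, d * (|U| + |I|) long, latent vector."""
    return [a + F * (b - c) for a, b, c in zip(lv1, lv2, lv3)]
```

Repeated calls of sgd_mutation drive the predicted rating toward the observed one; in the island model this step plays the role of mutation for the methods of Table 1.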
3.5 Test design and computing resources

Two pairs of tests were created, comparing single methods and island models, differing in the size of the input MovieLens dataset. The width of the latent vector was set to 10 (best after initial experiments on smaller samples). The tests were run with all initial parameter settings (Table 3) 9 times; as our computations are nondeterministic, this makes a difference (see Figures 4 and 5). These are computationally demanding calculations.

Table 3: Parameters of the test

Parameter           | Value
Number of runs      | 9
Computing nodes     | AMD Opteron 6376, 64GB memory
Latent factor width | 10
Datasets            | ml-100k, ml-1m

4 Results

Our experiments validated two hypotheses: first, that evolution does not necessarily give the best solution, and second, that island models improve results. We will present the results for the two datasets separately. We will show the best solution of 9 runs and the average of all runs (for the distribution see Figures 4 and 5).

On the smaller dataset the winner is always Evolution and the islands always give a better solution.

On the bigger dataset the single-instance winner is simulated annealing (in both minimum and average). With islands the winner is hill climbing (in both minimum and average). On the bigger data we cannot always observe the advantage brought by the parallelization by islands and migration. This can be observed especially with simulated annealing. One possible explanation is that hill climbing does not risk going in a wrong direction as much as simulated annealing (sometimes) does. Bigger data bring a bigger state space and hence risk is necessary, but 50 iterations of the planner is probably not enough. In future research we will run island models with more iterations.

Figure 3: Methods, ML1m, investigation of the median run (comparison of single methods on MFML1m: Single-BruteForce, Single-Evolution, Single-HillClimbing, Single-SimulatedAnnealing, Single-TabuSearch; x: time in planner iterations, 1 iteration = 60 seconds; y: fitness RMSE)

Figure 4: Methods, ML100k, boxplot of results (comparison of single methods and islands on MFML100k: Single and Island variants of BruteForce, Evolution, HillClimbing and TabuSearch; y: fitness RMSE)

5 Conclusions

We showed that island models are a good computing tool for recommender systems. Our experiments have shown the following: island models brought a clear improvement on the smaller dataset; on the bigger data this also holds, except for simulated annealing. Moreover, on the bigger data evolution does not find the best solution.

Our implementation is publicly available on GitHub⁴ and hence enables the repeatability of our experiments (see [1], requirement (v)).

⁴ https://github.com/sbalcar/distributedea

6 Future work

In our future work we plan to extend this to heterogeneous island models. We also plan to develop new planners specifically designed for the recommendation domain. We would also like to consider islands with statistical learning methods. A challenge is the extension to content based recommendation.

References

[1] Tomáš Horváth and André C. P. L. F. de Carvalho. Evolutionary computing in recommender systems: a review of recent research. Natural Computing, 16(3):441–462, 2017.

[2] M. Rezaei and R. Boostani. Using genetic algorithm to enhance nonnegative matrix factorization initialization. In 2010 6th Iranian Conference on Machine Vision and Image Processing, pages 1–5, Oct 2010.
Table 4: Comparison of single methods and island models. Dataset: MFML100k, Runs: 9

Methods               | Single-min | Islands-min | Single-average | Islands-average
BruteForce            | 0.98787115 | 0.94367838  | 0.99088728     | 0.94571058
DifferentialEvolution | 1.50105714 | 1.49096267  | 1.51771324     | 1.49957429
Evolution             | 0.88196787 | 0.87695296  | 0.88705747     | 0.87952888
HillClimbing          | 0.98047412 | 0.97400051  | 0.98207866     | 0.97824895
RandomSearch          | 1.60448414 | 1.58333447  | 1.61703977     | 1.60802325
SimulatedAnnealing    | 1.06110779 | 1.03108626  | 1.07797515     | 1.03806128
TabuSearch            | 0.98048607 | 0.97681064  | 0.98217126     | 0.97963446

Table 5: Comparison of single methods and island models. Dataset: MFML1m, Runs: 9 (* marks the violations of island improvement, shown in red in the original)

Methods               | Single-min | Islands-min  | Single-average | Islands-average
BruteForce            | 1.22268511 | 1.18410586   | 1.23279212     | 1.2033589
DifferentialEvolution | 1.55227103 | 1.55057106   | 1.55780106     | 1.55422412
Evolution             | 1.10254503 | 1.07069968   | 1.13526207     | 1.09140823
HillClimbing          | 0.96832015 | 0.96727131   | 0.97065502     | 0.96840234
RandomSearch          | 1.65483811 | 1.66265835 * | 1.66738744     | 1.6660426
SimulatedAnnealing    | 0.96655732 | 0.97118289 * | 0.97048761     | 0.9729405 *
TabuSearch            | 0.96873215 | 0.96735123   | 0.97106628     | 0.96854622
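The island-improvement comparison behind Table 5 can be reproduced from the printed numbers. A small check, with the minima values copied from Table 5, flags the same violations that the table marks:

```python
# Single-min and Islands-min RMSE values copied from Table 5 (ML1m).
table5_min = {
    "BruteForce":            (1.22268511, 1.18410586),
    "DifferentialEvolution": (1.55227103, 1.55057106),
    "Evolution":             (1.10254503, 1.07069968),
    "HillClimbing":          (0.96832015, 0.96727131),
    "RandomSearch":          (1.65483811, 1.66265835),
    "SimulatedAnnealing":    (0.96655732, 0.97118289),
    "TabuSearch":            (0.96873215, 0.96735123),
}

# A method violates the island-improvement hypothesis when the island
# runs did not reach a lower (better) minimum RMSE than the single runs.
violations = [name for name, (single, islands) in table5_min.items()
              if islands > single]
# -> ['RandomSearch', 'SimulatedAnnealing']
```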
Figure 5: Methods, ML1m, boxplot of results (comparison of single methods and islands on MFML1m: Single and Island variants of HillClimbing, SimulatedAnnealing and TabuSearch; y: fitness RMSE)

[3] Dariush Zandi Navgaran, Parham Moradi, and Fardin Akhlaghian. Evolutionary based matrix factorization method for collaborative filtering systems. In 2013 21st ICEE, pages 1–5, 2013.

[4] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.

[5] L. Peska and P. Vojtas. Using implicit preference relations to improve recommender systems. Journal on Data Semantics, 6(1):15–30, 2017.

[6] John H. Holland. Adaptation in Natural and Artificial Systems. MIT Press, 1992.

[7] Darrell Whitley, Soraya Rana, and Robert B. Heckendorn. The island model genetic algorithm: On separability, population size and convergence. J. Comp. Inf. Tech., 7:33–47, 1998.

[8] Martin Pilát and Roman Neruda. Combining multiobjective and single-objective genetic algorithms in heterogeneous island model. In CEC 2010, pages 1–8, 2010.

[9] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. Trans. Evol. Comp, 6(2):182–197, 2002.

[10] Eckart Zitzler, Marco Laumanns, and Lothar Thiele. SPEA2: Improving the strength Pareto evolutionary algorithm. Technical report, ETH Zurich, 2001.

[11] S. Balcar and M. Pilát. Heterogeneous island model with re-planning of methods. In GECCO'18, accepted as poster, 2018.

[12] Štěpán Balcar. Heterogeneous island model with re-planning of methods (in Czech). Master thesis, Martin Pilát supervisor, 2017.

[13] G. A. Croes. A method for solving traveling salesman problems. Operations Research, pages 791–812, 1958.

[14] R. K. Arora. Optimization: Algorithms and Applications. CRC Press, 2015.

[15] Jinliang Wang and Michael C. Whitlock. Estimating effective population size and migration rates from genetic samples over space and time. Genetics, 163(1):429–446, 2003.

[16] William Astle and David J. Balding. Population structure and cryptic relatedness in genetic association studies. Statist. Sci., 24(4):451–471, 2009.

[17] John Wakeley. The coalescent in an island model of population subdivision with variation among demes. Theoretical Population Biology, 59(2):133–144, 2001.

Acknowledgements. This work was supported by a Ph.D. grant SVV2018-260451 under the supervision of Peter Vojtas.