rrecsys: an R-package for prototyping recommendation algorithms

Ludovik Çoba
Universiteti i Shkodrës "Luigj Gurakuqi"
Sheshi 2 Prilli, Shkodër, Albania
lcoba@unishk.edu.al

Markus Zanker
Free University of Bozen-Bolzano
piazza Domenicani, 3, 39100 Bolzano, Italy
mzanker@unibz.it

ABSTRACT

We introduce rrecsys, an open source extension package in R for rapid prototyping and intuitive assessment of recommender system algorithms. As the only currently available R package for recommender algorithms (recommenderlab) did not include popular algorithm implementations such as matrix factorization or One-class Collaborative Filtering, we developed rrecsys as an easily accessible tool that can, for instance, be employed for interactive demonstrations when teaching. The package replicates state-of-the-art Collaborative Filtering algorithms for rating and binary data, and we compare results with the Java-based LensKit implementation and recommenderlab for the purpose of benchmarking our implementation. This work can therefore also be seen as a contribution in the context of replicating algorithm implementations and reproducing evaluation results.

RecSys 2016 Poster Proceedings, September 15-19, 2016, Boston, MA, USA. Copyright held by the author(s).

1. INTRODUCTION

R represents a popular choice in Data Analytics and Machine Learning. The software has low setup cost and contains a large selection of packages and functionalities for enhancing and prototyping algorithms with compact code and good visualization tools. Thus R represents a suitable environment for exploring the field of recommender systems. We present and contribute a novel R package, rrecsys (https://cran.r-project.org/package=rrecsys), that replicates several state-of-the-art recommender algorithms for Likert-scaled as well as binary rating values. Up to now there is only one other package addressing recommender systems, recommenderlab (https://cran.r-project.org/package=recommenderlab), which lacks implementations of several popular algorithms; we benchmark against it in Section 4.

This work can be seen as a contribution towards the reproducibility of algorithms and results. Although reproducibility of experimental results is a fundamental prerequisite for scientific research, it is often not a given in the recommender systems field. For instance, Said et al. [3] pointed out that major recommendation frameworks such as MyMediaLite, LensKit and Apache Mahout show major differences in the implementation of the same algorithm variants and in their evaluation methodology, differences that are often much larger than the typically reported performance improvements of a new algorithm over the selected baseline technique. Prototyping helps to shape recommender algorithms and evaluation methodologies, and is thus a strategy for tackling the issue of reproducibility directly. Furthermore, teaching recommendation concepts and evaluation methodology in hands-on sessions is highly relevant for understanding ideas and algorithms from a didactics perspective and for making the learning experience more student-centered.

2. THE PACKAGE

rrecsys has a modular structure and includes expansion capabilities. The core of the package implements several popular algorithms: Most Popular, Global Average, Item Average, User Average, Item-Based K-Nearest Neighbors, Simon Funk's SVD, Weighted Alternating Least Squares and Bayesian Personalized Ranking. The package's evaluation module is based on the k-fold cross-validation method. A stratified random selection procedure is applied when dividing the rated items of each user into k folds, such that each user is uniformly represented in each fold. Based on the task (rating prediction or recommendation) the following metrics are computed: mean absolute error (MAE), root mean square error (RMSE), Precision, Recall, F1, True and False Positives, True and False Negatives, normalized discounted cumulative gain (NDCG), rank score, area under the ROC curve (AUC) and catalog coverage. RMSE and MAE are computed in two variants, user-based and global.
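To make these two notions concrete, the following short R sketch mimics them on toy data: a stratified random split that deals each user's ratings into k folds, followed by the global and the user-based RMSE computed on one held-out fold. This is an illustration written for this text, not code from the package; the variable names and the trivial global-average predictor are ours.

# Illustrative sketch (not rrecsys internals): stratified k-fold
# assignment and the two RMSE variants.
set.seed(42)
k <- 5

# Toy rating data in long format: one row per (user, item, rating).
ratings <- data.frame(
  user   = rep(1:20, each = 25),
  item   = unlist(lapply(1:20, function(u) sample(1:100, 25))),
  rating = sample(seq(0.5, 5, by = 0.5), 500, replace = TRUE)
)

# Stratified split: shuffle each user's ratings and deal them
# round-robin into k folds, so every user appears in every fold.
ratings$fold <- NA
for (idx in split(seq_len(nrow(ratings)), ratings$user)) {
  ratings$fold[idx] <- sample(rep_len(1:k, length(idx)))
}

# Hold out fold 1 and train a trivial global-average predictor
# on the remaining folds.
test  <- ratings[ratings$fold == 1, ]
train <- ratings[ratings$fold != 1, ]
test$pred <- mean(train$rating)

# Global variant: one average over all test ratings at once.
rmse_global <- sqrt(mean((test$rating - test$pred)^2))

# User-based variant: one RMSE per user, then averaged over users.
rmse_user <- mean(sapply(split(test, test$user), function(d)
  sqrt(mean((d$rating - d$pred)^2))))

rmse_global; rmse_user  # the two variants generally differ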
3. RRECSYS IN ACTION

In this section we introduce an executable R script running some of the functionalities of rrecsys in order to demonstrate its intuitive use.

# Install and load the package:
install.packages("rrecsys")
library(rrecsys)

# The ML Latest dataset ships with the package.
data("mlLatest100k")

# Define a rating matrix and explore it.
mlLatest <- defineData(mlLatest100k,
  minimum = .5, maximum = 5, halfStar = TRUE)
sparsity(mlLatest); numRatings(mlLatest)
rowRatings(mlLatest); colRatings(mlLatest)
smallMlLatest <- mlLatest[rowRatings(mlLatest) >= 200,
  colRatings(mlLatest) > 10]

# Set the number of iterations for FunkSVD.
setStoppingCriteria(nrLoops = 50)

# Train a model using FunkSVD.
svd10 <- rrecsys(smallMlLatest, "FunkSVD", k = 10,
  lambda = 0.001, gamma = 0.0015)

# Use the trained model to predict and recommend.
p <- predict(svd10)
r <- recommend(svd10, topN = 10)

# Instantiate an evaluation model.
model <- evalModel(smallMlLatest, folds = 5)

# Use the above model to evaluate predictions.
evalPred(model, "IBKNN", neigh = 10)

# Use the same model to evaluate recommendations.
evalRec(model, "globalAverage", topN = 10, goodRating = 3)
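Because evalModel fixes the fold assignment once, the same model object can be reused to compare several algorithms under identical splits. The calls below are a hypothetical continuation of the script above: they follow the evalPred pattern already shown, and they assume that FunkSVD's training parameters (k, lambda, gamma) can be passed through evalPred in the same way neigh is passed for IBKNN; check the package documentation before relying on the exact signatures.

# Hypothetical comparison run reusing the 5-fold model from above;
# parameter values are illustrative, not tuned.
evalPred(model, "FunkSVD", k = 10, lambda = 0.001, gamma = 0.0015)
evalPred(model, "IBKNN", neigh = 50)

Since both calls share the folds defined in model, differences in the reported metrics can be attributed to the algorithms rather than to the split.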
4. BENCHMARK RESULTS

In Table 1 we report results from benchmarking the rrecsys implementation against the popular LensKit [1] Java library and the recommenderlab R package. The reported results demonstrate that rrecsys clearly reproduces the results of LensKit, the most well-known Java-based recommendation library, in contrast to recommenderlab. Evaluation was done using 5-fold cross-validation on the MovieLens100K dataset. LensKit and rrecsys were configured identically. In the case of recommenderlab we selected parameters such that its configuration was as close as possible to our and LensKit's evaluation methodology. Reported error metrics were computed as a global average over all ratings in the test set. The SVD implementation in recommenderlab is based on an approximation estimated by the EM algorithm, which results in poor prediction accuracy but allows a vectorized implementation with good computational performance. In the case of the item-based k-nearest neighbor algorithm, rrecsys replicates the recommenderlab implementation. Yet the results of recommenderlab differ quite clearly, showing that disparities in the implementation of the evaluation methodology significantly influence the reported results.

Table 1: Benchmark in terms of RMSE between LensKit, rrecsys and recommenderlab.

Algorithm         LensKit   rrecsys   recommenderlab
globalMean        1.1278    1.1257    NA
itemAverage       1.0428    1.0246    NA
userAverage       1.0509    1.0416    NA
SVD (10 feat.)    0.9287    0.9277    3.7023
SVD (50 feat.)    0.9224    0.9207    3.7023
SVD (100 feat.)   0.9273    0.9191    3.7020
SVD (150 feat.)   0.9262    0.9188    3.7009
IB (20 neigh.)    0.9455    0.9851    1.1641
IB (50 neigh.)    0.9503    0.9477    1.1798
IB (100 neigh.)   0.9551    0.9416    1.2371

We deployed a second set of experiments using the MovieLens Latest [2] dataset, cropped to a smaller chunk containing 620 users, 851 items and 58801 ratings, where each user has rated at least 20 items and each item was rated at least 25 times. (The authors express their gratitude to GroupLens for allowing redistribution of the MovieLens Latest data.) In Figure 1 we report the results of evaluation on this dataset with 5 folds. We ran these examples on a 2012 laptop with an Intel i5 at 2.60 GHz and 8 GB of RAM. We report execution times for a single prediction task and for the full evaluation in Tables 2 and 3.

Table 2: Single prediction and evaluation times for the cropped MovieLens dataset.

Algorithm                           Pred.      Eval. (5 folds)
IBKNN                               1.51 ms    3539.4 s
Baseline Alg.                       0.14 µs    0.4 s
BPR (20 features, 20 iterations)    0.95 µs    2.2 s
BPR (40 features, 20 iterations)    1.5 µs     3.5 s
wALS (20 features, 20 iterations)   1.1 ms     2583.3 s
wALS (40 features, 20 iterations)   1.7 ms     4098.6 s

Table 3: Single prediction and evaluation time of FunkSVD on the cropped MovieLens dataset.

# of features   40    80    100    120    140    180
Pred. (in µs)   2.6   8.7   11.3   14.1   16.6   24.7
Eval. (in s)    6.2   20.4  26.6   33.0   38.8   57.9

[Figure 1: Evaluation on cropped MovieLens Latest. The plot shows RMSE (roughly 0.8 to 1.05) against the number of neighbors for item-based CF (10 to 200), compared with FunkSVD (80 feat.), GlobalAv, UserAv and ItemAv.]

5. CONCLUSIONS

This poster contributed a recently released package for prototyping and interactively demonstrating recommendation algorithms in R. It comes with a solid range of implemented standard CF algorithms. The reported results demonstrate that it reproduces the results of the Java-based LensKit toolkit. We hope that this effort will be of use to the field of recommender systems and to the large R user community.

6. REFERENCES

[1] M. D. Ekstrand, M. Ludwig, J. A. Konstan, and J. T. Riedl. Rethinking the recommender research ecosystem: Reproducibility, openness, and LensKit. In RecSys '11, pages 133–140, New York, NY, USA, 2011. ACM.

[2] F. M. Harper and J. A. Konstan. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4):19:1–19:19, Dec. 2015.

[3] A. Said and A. Bellogín. Comparative recommender system evaluation: Benchmarking recommendation frameworks. In RecSys '14, pages 129–136, 2014.