rrecsys: an R-package for prototyping recommendation algorithms

Ludovik Çoba
Universiteti i Shkodrës "Luigj Gurakuqi"
Sheshi 2 Prilli, Shkodër, Albania
lcoba@unishk.edu.al

Markus Zanker
Free University of Bozen-Bolzano
piazza Domenicani 3, 39100 Bolzano, Italy
mzanker@unibz.it

ABSTRACT

We introduce rrecsys, an open source extension package in R for rapid prototyping and intuitive assessment of recommender system algorithms. As the only other currently available R package for recommender algorithms, recommenderlab, does not include popular algorithm implementations such as matrix factorization or One-class Collaborative Filtering, we developed rrecsys as an easily accessible tool that can, for instance, be employed for interactive demonstrations when teaching. The package replicates state-of-the-art Collaborative Filtering algorithms for rating and binary data, and we benchmark it against the Java-based LensKit implementation and recommenderlab. This work can therefore also be seen as a contribution in the context of replicating algorithm implementations and reproducing evaluation results.

RecSys 2016 Poster Proceedings, September 15-19, 2016, Boston, MA, USA. Copyright held by the author(s).

1. INTRODUCTION

R represents a popular choice in Data Analytics and Machine Learning. The software has low setup cost and contains a large selection of packages and functionalities for enhancing and prototyping algorithms with compact code and good visualization tools. Thus R is a suitable environment for exploring the field of recommender systems. We present and contribute a novel R package, rrecsys (https://cran.r-project.org/package=rrecsys), that replicates several state-of-the-art recommender algorithms for Likert-scaled as well as binary rating values. Up to now there is only one other package addressing recommender systems, recommenderlab (https://cran.r-project.org/package=recommenderlab), which lacks implementations of several popular algorithms; we benchmark against it in Section 4.

This work can be seen as a contribution towards the reproducibility of algorithms and results. Although reproducibility of experimental results is a fundamental prerequisite for scientific research, it can many times not be taken for granted in the recommender systems field. For instance, Said and Bellogín [3] pointed out that major recommendation frameworks such as MyMediaLite, LensKit and Apache Mahout show substantial differences in the implementation of the same algorithm variants and in their evaluation methodology, differences that are many times much larger than the typically reported performance improvements of a new algorithm over the selected baseline technique. Prototyping helps to shape recommender algorithms and evaluation methodologies as a strategy to tackle the issue of reproducibility directly. Furthermore, teaching recommendation concepts and evaluation methodology in hands-on sessions is highly relevant for understanding ideas and algorithms from a didactics perspective and for making the learning experience more student-centered.

2. THE PACKAGE

rrecsys has a modular structure and includes expansion capabilities. The core of the package contains implementations of several popular algorithms: Most Popular, Global Average, Item Average, User Average, Item-Based k-Nearest Neighbors, Simon Funk's SVD, Weighted Alternating Least Squares and Bayesian Personalized Ranking. The package's evaluation module is based on k-fold cross-validation. A stratified random selection procedure is applied when dividing the rated items of each user into k folds, such that each user is uniformly represented in each fold. Depending on the task (rating prediction or recommendation) the following metrics are computed: mean absolute error (MAE), root mean square error (RMSE), Precision, Recall, F1, True and False Positives, True and False Negatives, normalized discounted cumulative gain (NDCG), rank score, area under the ROC curve (AUC) and catalog coverage. RMSE and MAE are computed in two variants, user-based and global, as illustrated in the sketch below.
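As an illustration (a minimal base R sketch, not the package's internal implementation), the following shows a stratified per-user fold assignment and the difference between the global and the user-based RMSE variants. The matrices real and pred are hypothetical user-by-item matrices, with NA marking unrated cells.

# Illustration only: stratified fold assignment and
# the two RMSE variants, in plain base R.
stratifiedFolds <- function(real, k = 5) {
  folds <- matrix(NA, nrow(real), ncol(real))
  for (u in 1:nrow(real)) {
    rated <- which(!is.na(real[u, ]))
    # distribute each user's rated items uniformly over the k folds
    folds[u, rated] <- sample(rep(1:k, length.out = length(rated)))
  }
  folds
}
globalRMSE <- function(real, pred) {
  # one average over all rated cells in the test set
  err <- (real - pred)[!is.na(real)]
  sqrt(mean(err^2))
}
userRMSE <- function(real, pred) {
  # RMSE per user first, then averaged over users
  perUser <- sapply(1:nrow(real), function(u) {
    rated <- !is.na(real[u, ])
    sqrt(mean((real[u, rated] - pred[u, rated])^2))
  })
  mean(perUser, na.rm = TRUE)
}

The global variant weights every rating equally, while the user-based variant weights every user equally, so the two can rank algorithms differently on skewed rating distributions.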
3. RRECSYS IN ACTION

In this section we walk through an executable R script that exercises some of the functionalities of rrecsys in order to demonstrate its intuitive use.
# Install and load the package:
install.packages("rrecsys")
library(rrecsys)
# The MovieLens Latest sample dataset ships with the package.
data("mlLatest100k")
# Define a rating matrix and explore it.
mlLatest <- defineData(mlLatest100k,
  minimum = .5, maximum = 5, halfStar = TRUE)
sparsity(mlLatest); numRatings(mlLatest)
rowRatings(mlLatest); colRatings(mlLatest)
# Crop to users with at least 200 ratings and
# items with more than 10 ratings.
smallMlLatest <- mlLatest[rowRatings(mlLatest) >= 200,
  colRatings(mlLatest) > 10]
# Set the number of iterations for FunkSVD.
setStoppingCriteria(nrLoops = 50)
# Train a model using FunkSVD with 10 features.
svd10 <- rrecsys(smallMlLatest, "FunkSVD", k = 10,
  lambda = 0.001, gamma = 0.0015)
# Use the trained model to predict and recommend.
p <- predict(svd10)
r <- recommend(svd10, topN = 10)
# Instantiate an evaluation model with 5 folds.
model <- evalModel(smallMlLatest, folds = 5)
# Use the above model to evaluate rating predictions.
evalPred(model, "IBKNN", neigh = 10)
# Use the same model to evaluate recommendations.
evalRec(model, "globalAverage", topN = 10,
  goodRating = 3)
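The same script can also be extended to take runtime measurements like those reported in Section 4, using base R's system.time(). A minimal sketch follows; it assumes that evalPred() forwards algorithm parameters in the same way as the IBKNN call above.

# Hypothetical timing sketch: one full 5-fold evaluation and
# a single prediction call for FunkSVD with 40 features.
setStoppingCriteria(nrLoops = 20)
evalTime <- system.time(
  evalPred(model, "FunkSVD", k = 40,
    lambda = 0.001, gamma = 0.0015))
svd40 <- rrecsys(smallMlLatest, "FunkSVD", k = 40,
  lambda = 0.001, gamma = 0.0015)
predTime <- system.time(p40 <- predict(svd40))
evalTime; predTime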
4. BENCHMARK RESULTS

Table 1: Benchmark in terms of RMSE between Lenskit, rrecsys and recommenderlab.

  Algorithm        Lenskit  rrecsys  recommenderlab
  globalMean       1.1278   1.1257   NA
  itemAverage      1.0428   1.0246   NA
  userAverage      1.0509   1.0416   NA
  SVD (10 feat.)   0.9287   0.9277   3.7023
  SVD (50 feat.)   0.9224   0.9207   3.7023
  SVD (100 feat.)  0.9273   0.9191   3.7020
  SVD (150 feat.)  0.9262   0.9188   3.7009
  IB (20 neigh.)   0.9455   0.9851   1.1641
  IB (50 neigh.)   0.9503   0.9477   1.1798
  IB (100 neigh.)  0.9551   0.9416   1.2371

In Table 1 we report results from benchmarking the rrecsys implementation against the popular Java-based Lenskit [1] library and the recommenderlab R package. The reported results demonstrate that rrecsys closely reproduces the results of Lenskit, the most well-known Java-based recommendation library, in contrast to recommenderlab. Evaluation is done using 5-fold cross-validation on the MovieLens100K dataset. Lenskit and rrecsys were configured identically; in the case of recommenderlab we selected parameters such that its configuration was as close as possible to our and Lenskit's evaluation methodology. Reported error metrics were computed as a global average over all ratings in the test set. The SVD implementation in recommenderlab is based on an approximation estimated via the EM algorithm, which results in poor prediction accuracy but allows the developer to vectorize the computation, yielding good runtime performance. In the case of the item-based k-nearest neighbor algorithm, rrecsys replicates the recommenderlab implementation; yet the results of recommenderlab differ quite clearly, showing that disparities in the implementation of the evaluation methodology significantly influence the reported results.
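The rrecsys side of this setup can be expressed with the functions shown in Section 3. The following is a minimal sketch; the object ml100k is a hypothetical user-by-item matrix holding the MovieLens100K ratings (loading it is left out), and the algorithm labels follow the string identifiers used earlier in the paper.

# Hypothetical sketch of the Table 1 setup in rrecsys:
# 5-fold cross-validation on MovieLens100K (ratings 1-5).
ml100kData <- defineData(ml100k, minimum = 1, maximum = 5)
model100k <- evalModel(ml100kData, folds = 5)
# Baseline predictors.
evalPred(model100k, "globalAverage")
evalPred(model100k, "itemAverage")
evalPred(model100k, "userAverage")
# FunkSVD with 50 features and item-based kNN with 50 neighbors.
setStoppingCriteria(nrLoops = 50)
evalPred(model100k, "FunkSVD", k = 50,
  lambda = 0.001, gamma = 0.0015)
evalPred(model100k, "IBKNN", neigh = 50)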
We deployed a second set of experiments using the MovieLens Latest [2] dataset, cropped to a smaller chunk containing 620 users, 851 items and 58,801 ratings, where each user has rated at least 20 items and each item was rated at least 25 times. (The authors express their gratitude to GroupLens for allowing redistribution of the MovieLens Latest data.) In Figure 1 we report the results of a 5-fold evaluation on this dataset. We ran these examples on a 2012 laptop with an Intel i5 at 2.60 GHz and 8 GB RAM, and report execution times for a single prediction task and the full evaluation in Tables 2 and 3.

[Figure 1: Evaluation on the cropped MovieLens Latest dataset. The plot shows RMSE (roughly 0.8 to 1.05) over the number of neighbors for item-based CF (10 to 200), with reference levels for FunkSVD (80 features), GlobalAverage, UserAverage and ItemAverage.]

Table 2: Single prediction and evaluation times for the cropped MovieLens dataset.

  Algorithm                          Pred.    Eval. (5 folds)
  IBKNN                              1.51 ms  3539.4 s
  Baseline Alg.                      0.14 µs  0.4 s
  BPR (20 features, 20 iterations)   0.95 µs  2.2 s
  BPR (40 features, 20 iterations)   1.5 µs   3.5 s
  wALS (20 features, 20 iterations)  1.1 ms   2583.3 s
  wALS (40 features, 20 iterations)  1.7 ms   4098.6 s

Table 3: Single prediction and evaluation times for FunkSVD on the cropped MovieLens dataset.

  # of features  40   80    100   120   140   180
  Pred. (in µs)  2.6  8.7   11.3  14.1  16.6  24.7
  Eval. (in s)   6.2  20.4  26.6  33.0  38.8  57.9

5. CONCLUSIONS

This poster contributed a recently released package for prototyping and interactively demonstrating recommendation algorithms in R. It comes with a solid range of implemented standard CF algorithms, and the reported results demonstrate that it reproduces the results of the Java-based Lenskit toolkit. We hope that this effort will be of use to the field of recommender systems and the large R user community.

6. REFERENCES

[1] M. D. Ekstrand, M. Ludwig, J. A. Konstan, and J. T. Riedl. Rethinking the recommender research ecosystem: Reproducibility, openness, and LensKit. In RecSys '11, pages 133–140, New York, NY, USA, 2011. ACM.

[2] F. M. Harper and J. A. Konstan. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4):19:1–19:19, Dec. 2015.

[3] A. Said and A. Bellogín. Comparative recommender system evaluation: Benchmarking recommendation frameworks. In RecSys '14, pages 129–136, 2014.