<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>rrecsys: an R-package for prototyping recommendation algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ludovik Çoba</string-name>
          <email>lcoba@unishk.edu.al</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Zanker</string-name>
          <email>mzanker@unibz.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <addr-line>piazza Domenicani, 3, 39100 Bolzano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universiteti i Shkodrës "Luigj Gurakuqi"</institution>
          ,
          <addr-line>Sheshi 2 Prilli, Shkodër</addr-line>
          ,
          <country country="AL">Albania</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
<p>We introduce rrecsys, an open source extension package in R for rapid prototyping and intuitive assessment of recommender system algorithms. As the only currently available R package for recommender algorithms (recommenderlab) did not include popular algorithm implementations such as matrix factorization or one-class Collaborative Filtering algorithms, we developed rrecsys as an easily accessible tool that can, for instance, be employed for interactive demonstrations when teaching. This package replicates state-of-the-art Collaborative Filtering algorithms for rating and binary data, and we compare results with the Java-based LensKit implementation and with recommenderlab for the purpose of benchmarking the implementation. Therefore this work can also be seen as a contribution in the context of replication of algorithm implementations and reproduction of evaluation results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
R is a popular choice for Data Analytics and
Machine Learning. The software has a low setup cost and
offers a large selection of packages and functionality for
enhancing and prototyping algorithms with compact code and
good visualization tools. Thus R is a suitable
environment for exploring the field of recommender systems.
We present and contribute a novel R package, rrecsys1, that
replicates several state-of-the-art recommender algorithms
for Likert-scaled as well as binary rating values. Up to now
there has been only one package addressing recommender systems,
recommenderlab2, which lacks implementations of several popular
algorithms; we benchmark results in Section 4.
This work can be seen as a contribution towards the
reproducibility of algorithms and results. Although
reproducibility of experimental results is a fundamental
prerequisite for scientific research, it is often not
guaranteed in the recommender systems field. For instance,
Said et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] pointed out that major recommendation
frameworks such as MyMediaLite, LensKit and Apache
Mahout show major differences in the implementation of the
same algorithm variants and in their evaluation
methodology. According to Said et al., these differences are often
much larger than the typically reported performance
1https://cran.r-project.org/package=rrecsys
2https://cran.r-project.org/package=recommenderlab
improvements of a new algorithm over the selected baseline
technique. Prototyping helps shape recommender
algorithms and evaluation methodologies, and is thus a strategy for
tackling the issue of reproducibility directly. Furthermore,
teaching recommendation concepts and evaluation methodology
in hands-on sessions is highly relevant for understanding ideas
and algorithms from a didactic perspective and for making the
learning experience more student-centered.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. THE PACKAGE</title>
      <p>rrecsys has a modular structure and provides
expansion capabilities. The core of the package includes
implementations of several popular algorithms: Most
Popular, Global Average, Item Average, User Average, Item-Based
k-Nearest Neighbors, Simon Funk's SVD, Weighted
Alternating Least Squares and Bayesian Personalized
Ranking. The package's evaluation module is based on the k-fold
cross-validation method. A stratified random selection
procedure is applied when dividing the rated items of each user
into k folds, such that each user is uniformly represented
in each fold. Depending on the task (rating prediction or
recommendation) the following metrics are computed: mean
absolute error (MAE), root mean square error (RMSE),
Precision, Recall, F1, True and False Positives, True and False
Negatives, normalized discounted cumulative gain (NDCG),
rank score, area under the ROC curve (AUC) and catalog
coverage. The RMSE and MAE metrics are computed in
two variants, user-based and global.</p>
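      <p>The stratified fold selection described above can be sketched in a few lines of plain R (an illustrative toy example, not the package's internal code; the list-of-users data layout is hypothetical):</p>

```r
# Assign each user's rated item ids to k folds so that every
# user is (nearly) uniformly represented in each fold.
stratifiedFolds <- function(ratingsPerUser, k = 5) {
  lapply(ratingsPerUser, function(itemIds) {
    shuffled <- sample(itemIds)                     # random order per user
    split(shuffled, rep_len(1:k, length(shuffled))) # round-robin over folds
  })
}

set.seed(1)
folds <- stratifiedFolds(list(u1 = 1:10, u2 = 11:23), k = 5)
lengths(folds$u1)  # every fold gets exactly 2 of u1's 10 ratings
```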
    </sec>
    <sec id="sec-3">
      <title>3. RRECSYS IN ACTION</title>
      <p>In this section we present an executable R script that
exercises some of the functionality of rrecsys in order to
demonstrate its intuitive use.
# Install and load:
install.packages("rrecsys")
library(rrecsys)
# The MovieLens Latest dataset ships with the package.
data("mlLatest100k")
# Define a rating matrix and explore it.
mlLatest &lt;- defineData(mlLatest100k,
  minimum = .5, maximum = 5, halfStar = TRUE)
sparsity(mlLatest); numRatings(mlLatest)
rowRatings(mlLatest); colRatings(mlLatest)
smallMlLatest &lt;- mlLatest[rowRatings(mlLatest) &gt;= 200,
  colRatings(mlLatest) &gt; 10]
# Setting up the number of iterations for FunkSVD.
setStoppingCriteria(nrLoops = 50)
# Training a model using FunkSVD.
svd10 &lt;- rrecsys(smallMlLatest, "FunkSVD", k = 10,
lambda = 0.001, gamma = 0.0015)
# Using the trained model to predict and recommend.
p &lt;- predict(svd10)
r &lt;- recommend(svd10, topN = 10)
# Instantiate an evaluation model.
model &lt;- evalModel(smallMlLatest, folds = 5)
# Using the above model to evaluate predictions.
evalPred(model, "IBKNN", neigh = 10)
# Using the same model to evaluate recommendations.
evalRec(model, "globalAverage", topN = 10,
goodRating = 3)</p>
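      <p>The "FunkSVD" call above learns k latent features by stochastic gradient descent. As a rough sketch of the underlying update rule, here in plain R on a made-up toy matrix with toy parameter values (lambda and gamma play the same regularization and learning-rate roles as in the call above; the package's actual implementation differs in details such as feature-wise training and stopping criteria):</p>

```r
R <- matrix(c(5, 3, NA, 4, NA, 1, 2, 5, 4), nrow = 3)  # toy ratings, NA = unrated
k <- 2; gamma <- 0.01; lambda <- 0.001
U <- matrix(0.1, nrow(R), k)   # user feature matrix
V <- matrix(0.1, ncol(R), k)   # item feature matrix
obs <- which(!is.na(R), arr.ind = TRUE)
for (loop in 1:200) {
  for (n in seq_len(nrow(obs))) {
    u <- obs[n, 1]; i <- obs[n, 2]
    e <- R[u, i] - sum(U[u, ] * V[i, ])  # prediction error on one rating
    U[u, ] <- U[u, ] + gamma * (e * V[i, ] - lambda * U[u, ])
    V[i, ] <- V[i, ] + gamma * (e * U[u, ] - lambda * V[i, ])
  }
}
pred <- U %*% t(V)  # dense matrix of predicted ratings
```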
    </sec>
    <sec id="sec-4">
      <title>4. BENCHMARK RESULTS</title>
      <p>
In Table 1 we report results from benchmarking the
rrecsys implementation against the popular Lenskit [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] Java library
and the recommenderlab R package. The reported results
demonstrate that rrecsys closely reproduces the results of
Lenskit, the most well-known Java-based
recommendation library, in contrast to recommenderlab. Evaluation
was performed using 5-fold cross-validation on the MovieLens100K
dataset. Lenskit and rrecsys were configured identically. In
the case of recommenderlab we selected parameters such
that its configuration was as close as possible to ours and to
Lenskit's evaluation methodology. Reported error metrics
were computed as a global average over all ratings in
the test set. The SVD algorithm implementation in
recommenderlab is based on an approximation estimated by the
EM algorithm, which results in poor prediction accuracy but
allows the developer to vectorize the code, yielding good
computational performance. For the item-based k-nearest
neighbor algorithm, rrecsys replicates the recommenderlab
implementation. Yet the results of recommenderlab differ quite
clearly, proving that disparities in the implementation of the
evaluation methodology significantly influence the reported
results. We deployed a second set of experiments using
MovieLens Latest3[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] dataset, cropped to a smaller subset
containing 620 users, 851 items and 58,801 ratings, where each
user has rated at least 20 items and each item was rated at
least 25 times. In Figure 1 we report the results of evaluation
on this dataset with 5 folds. We ran these examples on a
2012 laptop computer with an Intel i5 at 2.60GHz and 8 GB
of RAM. We report execution times for a single prediction
task and for the full evaluation steps in Tables 2 and 3.
3Authors express their gratitude to GroupLens for allowing
redistribution of the MovieLens Latest data.
      </p>
      <p>[Figure 1: evaluation of the baseline algorithm, IBKNN with 10 to 200 neighbors, BPR (20 and 40 features, 20 iterations) and wALS (20 and 40 features, 20 iterations) on this dataset.]</p>
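      <p>The choice between global and user-based error averaging mentioned earlier is exactly the kind of methodological detail that produces such disparities. A minimal illustration in plain R (toy residuals, hypothetical data layout):</p>

```r
# One row per test-set rating: the user who gave it and the prediction error.
test <- data.frame(user  = c("a", "a", "a", "b"),
                   error = c(1, 1, 1, 3))
globalRMSE <- sqrt(mean(test$error^2))  # every rating weighs equally
userRMSE <- mean(tapply(test$error, test$user,
                        function(e) sqrt(mean(e^2))))  # every user weighs equally
globalRMSE  # sqrt(3), about 1.73: user b's single rating is 1 of 4
userRMSE    # 2: users a and b each contribute half
```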
    </sec>
    <sec id="sec-5">
      <title>5. CONCLUSIONS</title>
      <p>This poster contributed a recently released package for
prototyping and interactively demonstrating
recommendation algorithms in R. It comes with a range of
implementations of standard CF algorithms. Reported results
demonstrate that it reproduces the results of the Java-based Lenskit
toolkit. We hope that this effort will be of
use to the field of recommender systems and the large R
user community.</p>
    </sec>
    <sec id="sec-6">
      <title>6. REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ludwig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Rethinking the recommender research ecosystem: Reproducibility, openness, and lenskit</article-title>
          .
          <source>RecSys '11</source>
          , pages
          <fpage>133</fpage>
          -
          <lpage>140</lpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Harper</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          .
          <article-title>The movielens datasets: History and context</article-title>
          .
          <source>ACM Trans. Interact. Intell. Syst.</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <fpage>19:1</fpage>
          -
          <lpage>19:19</lpage>
          , Dec.
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Said</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bellogín</surname>
          </string-name>
          .
          <article-title>Comparative Recommender System Evaluation: Benchmarking Recommendation Frameworks</article-title>
          .
          <source>RecSys</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>136</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>