rrecsys: an R-package for prototyping recommendation algorithms

Ludovik Çoba
Universiteti i Shkodrës "Luigj Gurakuqi"
Sheshi 2 Prilli, Shkodër, Albania
lcoba@unishk.edu.al

Markus Zanker
Free University of Bozen-Bolzano
piazza Domenicani, 3, 39100 Bolzano, Italy
mzanker@unibz.it

ABSTRACT

We introduce rrecsys, an open source extension package in R for rapid prototyping and intuitive assessment of recommender system algorithms. As the only currently available R package for recommender algorithms (recommenderlab) did not include popular algorithm implementations such as matrix factorization or One-class Collaborative Filtering, we developed rrecsys as an easily accessible tool that can, for instance, be employed for interactive demonstrations when teaching. The package replicates state-of-the-art Collaborative Filtering algorithms for rating and binary data, and we compare results with the Java-based LensKit implementation and recommenderlab for the purpose of benchmarking our implementation. This work can therefore also be seen as a contribution in the context of replicating algorithm implementations and reproducing evaluation results.

RecSys 2016 Poster Proceedings, September 15-19, 2016, Boston, MA, USA. Copyright held by the author(s).

1. INTRODUCTION

R represents a popular choice in Data Analytics and Machine Learning. The software has low setup cost and contains a large selection of packages and functionalities for enhancing and prototyping algorithms with compact code and good visualization tools. Thus R represents a suitable environment for exploring the field of recommender systems. We present and contribute a novel R package, rrecsys (https://cran.r-project.org/package=rrecsys), that replicates several state-of-the-art recommender algorithms for Likert-scaled as well as binary rating values. Up to now there is only one other package addressing recommender systems, recommenderlab (https://cran.r-project.org/package=recommenderlab), which lacks implementations of several popular algorithms; we benchmark against it in Section 4.

This work can be seen as a contribution towards the reproducibility of algorithms and results. Although reproducibility of experimental results is a fundamental prerequisite for scientific research, it is often not a given in the recommender systems field. For instance, Said et al. [3] pointed out that major recommendation frameworks such as MyMediaLite, LensKit and Apache Mahout show major differences in the implementation of the same algorithm variants and in their evaluation methodology, differences that are often much larger than the typically reported performance improvements of a new algorithm over the selected baseline technique. Prototyping helps to shape recommender algorithms and evaluation methodologies, and is thus a strategy for tackling the issue of reproducibility directly. Furthermore, teaching recommendation concepts and evaluation methodology in hands-on sessions is highly relevant for understanding ideas and algorithms from a didactics perspective and for making the learning experience more student-centered.

2. THE PACKAGE

rrecsys has a modular structure and includes expansion capabilities. The core of the package implements several popular algorithms: Most Popular, Global Average, Item Average, User Average, Item-Based K-Nearest Neighbors, Simon Funk's SVD, Weighted Alternating Least Squares and Bayesian Personalized Ranking. The package's evaluation module is based on the k-fold cross-validation method. A stratified random selection procedure is applied when dividing the rated items of each user into k folds, such that each user is uniformly represented in each fold. Based on the task (rating prediction or recommendation) the following metrics are computed: mean absolute error (MAE), root mean square error (RMSE), Precision, Recall, F1, True and False Positives, True and False Negatives, normalized discounted cumulative gain (NDCG), rank score, area under the ROC curve (AUC) and catalog coverage. RMSE and MAE are computed in two variants, user-based and global.
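To make these two notions concrete, the following short R sketch mimics them on toy data: a stratified random split that deals each user's ratings into k folds, followed by the global and the user-based RMSE computed on one held-out fold. This is an illustration written for this text, not code from the package; the variable names and the trivial global-average predictor are ours.

# Illustrative sketch (not rrecsys internals): stratified k-fold
# assignment and the two RMSE variants.
set.seed(42)
k <- 5

# Toy rating data in long format: one row per (user, item, rating).
ratings <- data.frame(
  user   = rep(1:20, each = 25),
  item   = unlist(lapply(1:20, function(u) sample(1:100, 25))),
  rating = sample(seq(0.5, 5, by = 0.5), 500, replace = TRUE)
)

# Stratified split: shuffle each user's ratings and deal them
# round-robin into k folds, so every user appears in every fold.
ratings$fold <- NA
for (idx in split(seq_len(nrow(ratings)), ratings$user)) {
  ratings$fold[idx] <- sample(rep_len(1:k, length(idx)))
}

# Hold out fold 1 and train a trivial global-average predictor
# on the remaining folds.
test  <- ratings[ratings$fold == 1, ]
train <- ratings[ratings$fold != 1, ]
test$pred <- mean(train$rating)

# Global variant: one average over all test ratings at once.
rmse_global <- sqrt(mean((test$rating - test$pred)^2))

# User-based variant: one RMSE per user, then averaged over users.
rmse_user <- mean(sapply(split(test, test$user), function(d)
  sqrt(mean((d$rating - d$pred)^2))))

rmse_global; rmse_user  # the two variants generally differ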
3. RRECSYS IN ACTION

In this section we introduce an executable R script running some of the functionalities of rrecsys in order to demonstrate its intuitive use.

# Install and load the package:
install.packages("rrecsys")
library(rrecsys)

# The ML Latest dataset ships with the package.
data("mlLatest100k")

# Define a rating matrix and explore it.
mlLatest <- defineData(mlLatest100k,
  minimum = .5, maximum = 5, halfStar = TRUE)
sparsity(mlLatest); numRatings(mlLatest)
rowRatings(mlLatest); colRatings(mlLatest)
smallMlLatest <- mlLatest[rowRatings(mlLatest) >= 200,
  colRatings(mlLatest) > 10]

# Set the number of iterations for FunkSVD.
setStoppingCriteria(nrLoops = 50)

# Train a model using FunkSVD.
svd10 <- rrecsys(smallMlLatest, "FunkSVD", k = 10,
  lambda = 0.001, gamma = 0.0015)

# Use the trained model to predict and recommend.
p <- predict(svd10)
r <- recommend(svd10, topN = 10)

# Instantiate an evaluation model.
model <- evalModel(smallMlLatest, folds = 5)

# Use the above model to evaluate predictions.
evalPred(model, "IBKNN", neigh = 10)

# Use the same model to evaluate recommendations.
evalRec(model, "globalAverage", topN = 10, goodRating = 3)
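Because evalModel fixes the fold assignment once, the same model object can be reused to compare several algorithms under identical splits. The calls below are a hypothetical continuation of the script above: they follow the evalPred pattern already shown, and they assume that FunkSVD's training parameters (k, lambda, gamma) can be passed through evalPred in the same way neigh is passed for IBKNN; check the package documentation before relying on the exact signatures.

# Hypothetical comparison run reusing the 5-fold model from above;
# parameter values are illustrative, not tuned.
evalPred(model, "FunkSVD", k = 10, lambda = 0.001, gamma = 0.0015)
evalPred(model, "IBKNN", neigh = 50)

Since both calls share the folds defined in model, differences in the reported metrics can be attributed to the algorithms rather than to the split.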
4. BENCHMARK RESULTS

In Table 1 we report results from benchmarking the rrecsys implementation against the popular LensKit [1] Java library and the recommenderlab R package. The reported results demonstrate that rrecsys clearly reproduces the results of LensKit, the most well-known Java-based recommendation library, in contrast to recommenderlab. Evaluation was done using 5-fold cross-validation on the MovieLens100K dataset. LensKit and rrecsys were configured identically. In the case of recommenderlab we selected parameters such that its configuration was as close as possible to our and LensKit's evaluation methodology. Reported error metrics were computed as a global average over all ratings in the test set. The SVD implementation in recommenderlab is based on an approximation estimated by the EM algorithm, which results in poor prediction accuracy but allows a vectorized implementation with good computational performance. In the case of the item-based k-nearest neighbor algorithm, rrecsys replicates the recommenderlab implementation. Yet the results of recommenderlab differ quite clearly, showing that disparities in the implementation of the evaluation methodology significantly influence the reported results.

Table 1: Benchmark in terms of RMSE between LensKit, rrecsys and recommenderlab.

Algorithm         LensKit   rrecsys   recommenderlab
globalMean        1.1278    1.1257    NA
itemAverage       1.0428    1.0246    NA
userAverage       1.0509    1.0416    NA
SVD (10 feat.)    0.9287    0.9277    3.7023
SVD (50 feat.)    0.9224    0.9207    3.7023
SVD (100 feat.)   0.9273    0.9191    3.7020
SVD (150 feat.)   0.9262    0.9188    3.7009
IB (20 neigh.)    0.9455    0.9851    1.1641
IB (50 neigh.)    0.9503    0.9477    1.1798
IB (100 neigh.)   0.9551    0.9416    1.2371

We deployed a second set of experiments using the MovieLens Latest [2] dataset, cropped to a smaller chunk containing 620 users, 851 items and 58801 ratings, where each user has rated at least 20 items and each item was rated at least 25 times. (The authors express their gratitude to GroupLens for allowing redistribution of the MovieLens Latest data.) In Figure 1 we report the results of evaluation on this dataset with 5 folds. We ran these examples on a 2012 laptop with an Intel i5 at 2.60 GHz and 8 GB of RAM. We report execution times for a single prediction task and for the full evaluation in Tables 2 and 3.

Table 2: Single prediction and evaluation times for the cropped MovieLens dataset.

Algorithm                           Pred.      Eval. (5 folds)
IBKNN                               1.51 ms    3539.4 s
Baseline Alg.                       0.14 µs    0.4 s
BPR (20 features, 20 iterations)    0.95 µs    2.2 s
BPR (40 features, 20 iterations)    1.5 µs     3.5 s
wALS (20 features, 20 iterations)   1.1 ms     2583.3 s
wALS (40 features, 20 iterations)   1.7 ms     4098.6 s

Table 3: Single prediction and evaluation time of FunkSVD on the cropped MovieLens dataset.

# of features   40    80    100    120    140    180
Pred. (in µs)   2.6   8.7   11.3   14.1   16.6   24.7
Eval. (in s)    6.2   20.4  26.6   33.0   38.8   57.9

[Figure 1: Evaluation on cropped MovieLens Latest. The plot shows RMSE (roughly 0.8 to 1.05) against the number of neighbors for item-based CF (10 to 200), compared with FunkSVD (80 feat.), GlobalAv, UserAv and ItemAv.]

5. CONCLUSIONS

This poster contributed a recently released package for prototyping and interactively demonstrating recommendation algorithms in R. It comes with a solid range of implemented standard CF algorithms. The reported results demonstrate that it reproduces the results of the Java-based LensKit toolkit. We hope that this effort will be of use to the field of recommender systems and to the large R user community.

6. REFERENCES

[1] M. D. Ekstrand, M. Ludwig, J. A. Konstan, and J. T. Riedl. Rethinking the recommender research ecosystem: Reproducibility, openness, and LensKit. In RecSys '11, pages 133–140, New York, NY, USA, 2011. ACM.

[2] F. M. Harper and J. A. Konstan. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4):19:1–19:19, Dec. 2015.

[3] A. Said and A. Bellogín. Comparative recommender system evaluation: Benchmarking recommendation frameworks. In RecSys '14, pages 129–136, 2014.