=Paper=
{{Paper
|id=Vol-1905/recsys2017_poster23
|storemode=property
|title=pyRecLab: A Software Library for Quick Prototyping of Recommender Systems
|pdfUrl=https://ceur-ws.org/Vol-1905/recsys2017_poster23.pdf
|volume=Vol-1905
|authors=Gabriel Sepulveda,Vicente Dominguez,Denis Parra
|dblpUrl=https://dblp.org/rec/conf/recsys/SepulvedaDP17
}}
==pyRecLab: A Software Library for Quick Prototyping of Recommender Systems==
Gabriel Sepulveda, Vicente Dominguez, Denis Parra
Pontificia Universidad Catolica de Chile, Santiago, Chile
grsepulveda@uc.cl, vidominguez@uc.cl, dparra@ing.puc.cl

ABSTRACT

This paper introduces pyRecLab, a software library written in C++ with Python bindings which allows users to quickly train, test and develop recommender systems. Although there are several software libraries for this purpose, only a few let developers get started quickly with the most traditional methods, permitting them to try different parameters and approach several tasks without a significant loss of performance. The few libraries that have all these features are available in languages such as Java, Scala or C#, which is a disadvantage for less experienced programmers more used to the popular Python programming language. In this article we introduce details of pyRecLab and present a performance analysis in terms of error metrics (MAE and RMSE) and train/test time. We benchmark it against the popular Java-based library LibRec, showing similar results. We expect programmers with little experience and people interested in quickly prototyping recommender systems to benefit from pyRecLab.

KEYWORDS

Recommender Systems, Software Development, Recommender Library, Python Library

ACM Reference format: Gabriel Sepulveda, Vicente Dominguez, and Denis Parra. 2017. pyRecLab: A Software Library for Quick Prototyping of Recommender Systems. In Proceedings of RecSys 2017 Posters, Como, Italy, August 27-31, 2 pages.

RecSys 2017 Poster Proceedings, August 27-31, Como, Italy

1 INTRODUCTION

When software developers face the challenge of learning about recommender systems (RecSys), developing a RecSys for the first time, or quickly prototyping a recommender to test available data, a reasonable option to get started is using an existing software library. Nowadays, it is possible to find several libraries in different programming languages. Among the most popular ones are MyMediaLite [3], LensKit [2], LibRec [4], lightfm [7] and rrecsys [1]. While the aforementioned tools have documentation, implement several methods, and present most of the common functionality required to develop and evaluate a recommendation system, each of them lacks some functionality or algorithm, which especially hinders newcomers. In particular, we taught a graduate course on Recommender Systems at the Department of Computer Science at PUC Chile during the Fall Semester for three years (2014-2016). Over that time, most students faced recurrent difficulties using existing tools to finish an introductory assignment. The assignment covers tasks such as rating prediction and item recommendation for specific users, using well-known collaborative filtering methods such as User K-NN, Item K-NN, Slope One and FunkSVD [9]. Some of the problems found were:

(a) the lack of implementation of certain methods in some libraries;

(b) poor train/test time performance on medium-sized datasets (as with rrecsys, which does not implement sparse matrices);

(c) lack of functionality which is typical in a recommendation setting, such as suggesting a list of items given a specific user ID;

(d) difficulties changing parameters in certain models; and

(e) students' lack of familiarity with certain programming languages such as Java or C#.

While Java is the most popular language according to several rankings, Python is the most popular introductory teaching language in the U.S. since 2004 [5]. It is also the language with the largest growth in the last 5 years based on the PYPL ranking (http://pypl.github.io/PYPL.html).

For these reasons, we developed pyRecLab (documentation and code samples at https://github.com/gasevi/pyreclab). We wrote it in C++ with Python bindings. This design eases its adoption among new programmers familiar with Python while also offering appropriate performance on larger datasets. We implemented most of the foundational recommendation methods for rating prediction and recommendation. Moreover, users can easily change parameters to understand their effect, and they can also produce recommendations given a specific user ID.

2 OTHER RECOMMENDATION LIBRARIES

MyMediaLite [3]: It implements several recommendation algorithms, supporting explicit and implicit feedback, as well as context-aware methods. It also allows evaluation with metrics such as MAE, RMSE, prec@N, and nDCG [9]. Many of its functionalities are available from the command line. However, integrating it with other software requires programming in languages like C# or F#, which is difficult for many newcomer Python developers.

LensKit [2]: A popular library which provides all basic collaborative filtering methods for predicting ratings (User/Item KNN, Slope One and FunkSVD). It is developed in Java, which could be an entry barrier for new programmers who are mostly familiar with Python.

LibRec [4]: Just like MyMediaLite and LensKit, it is a well-developed library in terms of the algorithms implemented and the metrics available for evaluation. However, its documentation is not as good as LensKit's, and because it is implemented in Java, it also raises the barrier for new programmers.

LightFM [7]: This library implements several matrix factorization algorithms for both implicit and explicit feedback. It also has a Python interface, which makes it accessible to many developers. However, it does not implement basic traditional recommender algorithms (User/Item KNN, Slope One), so it is not advisable for introductory teaching purposes.

rrecsys [1]: This tool comes closest to pyRecLab in terms of ease of use, quick prototyping and educational purposes. It is written in R. However, it has two main weaknesses. It lacks some traditional algorithms (such as Slope One). It is also limited in the amount of data it can process, because it does not support sparse matrices.

3 DESIGN AND IMPLEMENTATION

Figure 1: pyRecLab architecture. (Blocks, top to bottom: File IO; Data Handlers (Rating Matrix, Sparse Matrix, Data Frame); Item Avg, Slope One, Most Popular, User KNN, Item KNN, Funk SVD; Python Interface; Python Interpreter.)

Figure 1 shows the main modules of pyRecLab. At the bottom, the blue block represents the Python interpreter, which loads the methods and data structures when the pyRecLab module is imported. At the top, in orange, are all the sub-modules of the library:

• File IO. This component handles data input/output. It reads data from text files and writes output recommendations in txt and json formats. It allows great flexibility in terms of input file formats (csv, tsv) and also lets the user specify what the file columns represent.

• Data handlers. This module implements several data structures which allow homogeneous access to the ratings. They keep the rest of the library independent of the original format the data were read from and provide a high level of abstraction. The recommendation algorithms use these data structures directly to process and store data and to generate output.

• Recommendation Algorithms. Under the Data handlers block, there are a number of contiguous blocks representing the recommendation algorithms. Algorithms for rating prediction and recommendation are: Item Average, Slope One, User KNN, Item KNN and Funk SVD. On the other hand, Most Popular is only used to generate recommendations.

• Python Interface. This module represents the interface between the recommendation algorithms and the Python interpreter. It was developed in C++. Since we aimed at maintaining an appropriate level of code readability, we chose the Python/C API rather than Cython for its implementation. This allows us to define low-level structures in C++ with a direct mapping to objects handled by the Python interpreter. In this way, we have defined a data type for each of the recommendation algorithms, which can be instantiated directly from the Python interpreter.

4 RESULTS & CONCLUSION

To check the performance of pyRecLab, we tested it against the popular library LibRec [4] in terms of error and train/test time.

Prediction Results. Table 1 shows MAE and RMSE results of rating prediction on the MovieLens 100K dataset. The differences from LibRec are very small, showing that pyRecLab can reproduce the results of a mature recommender library.

{| class="wikitable"
|+ Table 1: pyRecLab vs. LibRec on MovieLens 100K data.
! rowspan="2" | Method !! colspan="2" | MAE !! colspan="2" | RMSE
|-
! pyRecLab !! LibRec !! pyRecLab !! LibRec
|-
| UserAvg || 0.850191 || 0.850191 || 1.062995 || 1.062995
|-
| ItemAvg || 0.827568 || 0.827568 || 1.033411 || 1.033411
|-
| SlopeOne || 0.748552 || 0.748299 || 0.952795 || 0.952460
|-
| User KNN || 0.754816 || 0.755361 || 0.962355 || 0.966395
|-
| Item KNN || 0.749316 || 0.748354 || 0.953637 || 0.953433
|-
| Funk SVD || 0.732820 || 0.731986 || 0.925390 || 0.923978
|}

Time Performance. Train/test times vary depending on the method. Figure 2 shows them for FunkSVD. Both libraries perform similarly in the training phase, but pyRecLab is faster at testing time across different numbers of latent factors.

Figure 2: pyRecLab vs. LibRec on time performance. (Line plot; series: librec train, pyreclab train, librec test, pyreclab test; x-axis: number of latent factors; y-axis label: Training Time [5 - 80 s].)

In summary, we have introduced pyRecLab, a library for recommender systems which combines the performance of a C++ implementation with the ease of use of Python. We plan to add new algorithms (such as WRMF [6] and gSLIM [8]) and recommendation metrics, as well as new code samples to facilitate its adoption.

REFERENCES

[1] Ludovik Çoba and Markus Zanker. 2016. rrecsys: an R-package for prototyping recommendation algorithms. (2016).

[2] Michael D. Ekstrand, Michael Ludwig, Jack Kolb, and John T. Riedl. 2011. LensKit: a modular recommender framework. In Proceedings of the fifth ACM conference on Recommender systems. ACM, 349–350.

[3] Zeno Gantner, Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2011. MyMediaLite: A free recommender system library. In Proceedings of the fifth ACM conference on Recommender systems. ACM, 305–308.

[4] Guibing Guo, Jie Zhang, Zhu Sun, and Neil Yorke-Smith. 2015. LibRec: A Java Library for Recommender Systems. In UMAP Workshops.

[5] Philip Guo. 2014. Python is now the most popular introductory teaching language at top U.S. universities. BLOG@CACM, July (2014), 47.

[6] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 263–272.

[7] Maciej Kula. 2015. Metadata Embeddings for User and Item Cold-start Recommendations. In Proceedings of the 2nd Workshop on New Trends on Content-Based Recommender Systems (CEUR Workshop Proceedings), Vol. 1448. 14–21.

[8] Santiago Larraín, Denis Parra, and Alvaro Soto. 2015. Towards Improving Top-N Recommendation by Generalization of SLIM. In RecSys Posters.

[9] Denis Parra and Shaghayegh Sahebi. 2013. Recommender systems: Sources of knowledge and evaluation metrics. In Advanced Techniques in Web Intelligence-2. Springer, 149–175.