=Paper= {{Paper |id=Vol-1905/recsys2017_poster23 |storemode=property |title=pyRecLab: A Software Library for Quick Prototyping of Recommender Systems |pdfUrl=https://ceur-ws.org/Vol-1905/recsys2017_poster23.pdf |volume=Vol-1905 |authors=Gabriel Sepulveda,Vicente Dominguez,Denis Parra |dblpUrl=https://dblp.org/rec/conf/recsys/SepulvedaDP17 }} ==pyRecLab: A Software Library for Quick Prototyping of Recommender Systems== https://ceur-ws.org/Vol-1905/recsys2017_poster23.pdf
pyRecLab: A Software Library for Quick Prototyping of Recommender Systems
Gabriel Sepulveda, Vicente Dominguez, Denis Parra
Pontificia Universidad Catolica de Chile, Santiago, Chile
grsepulveda@uc.cl, vidominguez@uc.cl, dparra@ing.puc.cl

ABSTRACT
This paper introduces pyRecLab, a software library written in C++ with Python bindings that allows users to quickly train, test, and develop recommender systems. Although several software libraries exist for this purpose, only a few let developers get started quickly with the most traditional methods, so that they can try different parameters and approach several tasks without a significant loss of performance. Moreover, the few libraries that offer all these features are written in languages such as Java, Scala, or C#, which is a disadvantage for less experienced programmers who are more used to the popular Python programming language. In this article we describe pyRecLab in detail and analyze its performance in terms of error metrics (MAE and RMSE) and train/test time. We benchmark it against the popular Java-based library LibRec and obtain similar results. We expect programmers with little experience, and anyone interested in quickly prototyping recommender systems, to benefit from pyRecLab.

[Figure 1: pyRecLab architecture. Blocks, top to bottom: File IO; Data Handlers (Rating Matrix, Sparse Matrix, Data Frame); Item Avg, Slope One, Most Popular, User KNN, Item KNN, Funk SVD; Python Interface; Python Interpreter (">>> import pylibrec").]

KEYWORDS
Recommender Systems, Software Development, Recommender Library, Python Library

ACM Reference format:
Gabriel Sepulveda, Vicente Dominguez, and Denis Parra. 2017. pyRecLab: A Software Library for Quick Prototyping of Recommender Systems. In Proceedings of RecSys 2017 Posters, Como, Italy, August 27-31, 2 pages.

1 INTRODUCTION
When software developers face the challenge of learning about recommender systems (RecSys), developing a RecSys for the first time, or quickly prototyping a recommender to test available data, a reasonable way to get started is to use an existing software library. Several libraries are available in different programming languages; among the most popular are MyMediaLite [3], LensKit [2], LibRec [4], LightFM [7], and rrecsys [1].

While the aforementioned tools are documented, implement several methods, and provide most of the functionality commonly required to develop and evaluate a recommender system, each of them lacks some functionality or algorithm, which especially hinders newcomers. In particular, over three years of teaching a graduate course on Recommender Systems in the Fall semester (2014-2016) at the Department of Computer Science at PUC Chile, most students had recurrent difficulties using existing tools to complete an introductory assignment. The assignment involves tasks such as rating prediction and item recommendation for specific users, using well-known collaborative filtering methods such as User K-NN, Item K-NN, Slope One, and FunkSVD [9]. Some of the problems found were: (a) some libraries do not implement certain methods, (b) poor train/test time performance on medium-sized datasets (e.g., rrecsys, which does not implement sparse matrices), (c) missing functionality that is typical in a recommendation setting, such as suggesting a list of items given a specific user ID, (d) difficulty changing parameters in certain models, and (e) students' lack of familiarity with programming languages such as Java or C#. While Java is the most popular language according to several rankings, Python was the most popular introductory teaching language at top U.S. universities as of 2014 [5], and it is also the language with the largest growth over the last 5 years according to the PYPL ranking¹.

For these reasons, we developed pyRecLab². We wrote it in C++ with Python bindings in order to facilitate its adoption among new programmers familiar with Python, while still offering appropriate performance on larger datasets. We implemented most of the foundational methods for rating prediction and item recommendation. Moreover, users can easily change parameters to understand their effect, and they can produce recommendations for a specific user ID.

¹ http://pypl.github.io/PYPL.html
² Documentation and code samples at https://github.com/gasevi/pyreclab
RecSys 2017 Poster Proceedings, August 27-31, Como, Italy

2 OTHER RECOMMENDATION LIBRARIES
MyMediaLite [3]: It implements several recommendation algorithms, supporting explicit and implicit feedback as well as context-aware methods. It also supports evaluation with metrics such as MAE, RMSE, prec@N, and nDCG [9]. Many of its functionalities are available from the command line; however, integrating it with other software requires programming in languages such as C# or F#, which is difficult for many newcomer Python developers. LensKit [2]:
A popular library that provides all the basic collaborative filtering methods for rating prediction (User/Item KNN, Slope One, and FunkSVD). It is developed in Java, which could be an entry barrier for new programmers who are mostly familiar with Python. LibRec [4]: Like MyMediaLite and LensKit, a well-developed library in terms of the algorithms implemented and the metrics available for evaluation. However, its documentation is not as good as LensKit's, and since it is implemented in Java, it also raises the barrier for new programmers. LightFM [7]: This library implements several matrix factorization algorithms for both implicit and explicit feedback. It also has a Python interface, which makes it accessible to many developers. However, it does not implement basic traditional recommender algorithms (User/Item KNN, Slope One), so it is not advisable for introductory teaching purposes. rrecsys [1]: This tool comes closest to pyRecLab in terms of ease of use, quick prototyping, and educational purposes. It is written in R. However, it has two main weaknesses: it lacks some traditional algorithms (such as Slope One), and it is limited in the amount of data it can process, since it does not support sparse matrices.

3 DESIGN AND IMPLEMENTATION
Figure 1 shows the main modules of pyRecLab. At the bottom, the blue block represents the Python interpreter, which loads the methods and data structures when the pyRecLab module is imported. At the top, in orange, are the sub-modules of the library:

• File IO. This component handles data input and output: it reads ratings from text files and writes output recommendations in txt and json formats. It allows great flexibility in terms of input file formats (csv, tsv) and lets the user specify what each file column represents.
• Data handlers. This module implements several data structures that provide homogeneous access to the ratings. Their high level of abstraction makes the rest of the library largely independent of the original format from which the data were read. These data structures are used directly by the recommendation algorithms to process, store, and generate output data.
• Recommendation Algorithms. Under the Data handlers block, a number of contiguous blocks represent the recommendation algorithms. The algorithms available for both rating prediction and recommendation are Item Average, Slope One, User KNN, Item KNN, and Funk SVD. Most Popular, on the other hand, is only used to generate recommendations.
• Python Interface. This module is the interface between the recommendation algorithms and the Python interpreter. It was developed in C++, and since we aimed to maintain an appropriate level of code readability, we decided to implement it with the Python/C API rather than Cython. This allows us to define low-level structures in C++ that map directly to objects handled by the Python interpreter. In this way, we have defined a data type for each recommendation algorithm, which can be instantiated directly from the Python interpreter.

4 RESULTS & CONCLUSION
To check the performance of pyRecLab, we compared it against the popular library LibRec [4] in terms of prediction error and train/test time.

Prediction Results. Table 1 shows the MAE and RMSE of rating prediction on the MovieLens 100K dataset. The differences from LibRec are very small (none for the UserAvg and ItemAvg baselines, and at most about 0.004, for User KNN RMSE, among the other methods), showing that pyRecLab can reproduce the results of a mature recommender library.

Table 1: pyRecLab vs. LibRec on MovieLens 100K data.
             MAE                     RMSE
Method       pyRecLab    LibRec      pyRecLab    LibRec
UserAvg      0.850191    0.850191    1.062995    1.062995
ItemAvg      0.827568    0.827568    1.033411    1.033411
SlopeOne     0.748552    0.748299    0.952795    0.952460
User KNN     0.754816    0.755361    0.962355    0.966395
Item KNN     0.749316    0.748354    0.953637    0.953433
Funk SVD     0.732820    0.731986    0.925390    0.923978

Time Performance. Train/test times vary depending on the method; Figure 2 shows them for FunkSVD. While both libraries perform similarly in the training phase, pyRecLab is faster at test time across the different numbers of latent factors.

[Figure 2: pyRecLab vs. LibRec on time performance. Series: librec train, pyreclab train, librec test, pyreclab test. x-axis: number of latent factors (200 to 1600); left y-axis: training time [5 - 80 s]; right y-axis: ticks 0 to 1.4 (label not recoverable).]

In summary, we have introduced pyRecLab, a recommender systems library that combines the performance of a C++ implementation with the ease of use of Python. We expect to add new algorithms (such as WRMF [6] and gSLIM [8]) and recommendation metrics, as well as new code samples, to facilitate its adoption.

REFERENCES
[1] Ludovik Çoba and Markus Zanker. 2016. rrecsys: an R-package for prototyping recommendation algorithms. (2016).
[2] Michael D Ekstrand, Michael Ludwig, Jack Kolb, and John T Riedl. 2011. LensKit: a modular recommender framework. In Proceedings of the fifth ACM conference on Recommender systems. ACM, 349–350.
[3] Zeno Gantner, Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2011. MyMediaLite: A free recommender system library. In Proceedings of the fifth ACM conference on Recommender systems. ACM, 305–308.
[4] Guibing Guo, Jie Zhang, Zhu Sun, and Neil Yorke-Smith. 2015. LibRec: A Java Library for Recommender Systems. In UMAP Workshops.
[5] Philip Guo. 2014. Python is now the most popular introductory teaching language at top US universities. BLOG@CACM, July (2014), 47.
[6] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 263–272.
[7] Maciej Kula. 2015. Metadata Embeddings for User and Item Cold-start Recommendations. In Proceedings of the 2nd Workshop on New Trends on Content-Based Recommender Systems (CEUR Workshop Proceedings), Vol. 1448. 14–21.
[8] Santiago Larraín, Denis Parra, and Alvaro Soto. 2015. Towards Improving Top-N Recommendation by Generalization of SLIM. In RecSys Posters.
[9] Denis Parra and Shaghayegh Sahebi. 2013. Recommender systems: Sources of knowledge and evaluation metrics. In Advanced Techniques in Web Intelligence-2. Springer, 149–175.
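For readers new to the methods listed in Sections 3 and 4, the Slope One predictor and the MAE/RMSE metrics are simple enough to sketch in a few lines. The following pure-Python illustration is a hypothetical sketch, not pyRecLab's API or its C++ implementation: it assumes ratings are stored as a nested dict (user -> {item: rating}) and uses the weighted variant of Slope One. The function names (train_slope_one, predict_slope_one, mae, rmse) and the toy data are assumptions introduced for illustration only.

```python
from math import sqrt


def train_slope_one(ratings):
    """Return dev[i][j], the mean of (r_ui - r_uj) over users who rated both
    items i and j, and cnt[i][j], the number of such users.
    `ratings` is a hypothetical nested dict: user -> {item: rating}."""
    dev, cnt = {}, {}
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    dev.setdefault(i, {})
                    cnt.setdefault(i, {})
                    dev[i][j] = dev[i].get(j, 0.0) + (r_i - r_j)
                    cnt[i][j] = cnt[i].get(j, 0) + 1
    # Turn accumulated sums of differences into averages.
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= cnt[i][j]
    return dev, cnt


def predict_slope_one(ratings, dev, cnt, user, item):
    """Weighted Slope One: average of (dev[item][j] + r_uj) over the items j
    rated by `user`, weighted by the co-rating count cnt[item][j].
    Returns None when no co-rated item is available."""
    num, den = 0.0, 0
    for j, r_j in ratings.get(user, {}).items():
        c = cnt.get(item, {}).get(j, 0)
        if j != item and c > 0:
            num += (dev[item][j] + r_j) * c
            den += c
    return num / den if den else None


def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - r) for p, r in pairs) / len(pairs)


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))


# Toy example with hypothetical data (not MovieLens).
toy_ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
dev, cnt = train_slope_one(toy_ratings)
print(predict_slope_one(toy_ratings, dev, cnt, "carol", "A"))  # 13/3, about 4.33
print(mae([(4, 5), (3, 3)]), rmse([(4, 5), (3, 3)]))
```

The nested dict keeps the sketch short and only stores observed ratings, in the spirit of the sparse structures mentioned for the Data handlers; a library aimed at larger datasets, such as pyRecLab, would use dedicated sparse matrix structures and compiled code instead.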