<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>pyRecLab: A Software Library for Quick Prototyping of Recommender Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriel Sepulveda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vicente Dominguez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denis Parra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Pontificia Universidad Catolica de Chile</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>This paper introduces pyRecLab, a software library written in C++ with Python bindings that makes it possible to quickly train, test and develop recommender systems. Although several software libraries exist for this purpose, only a few let developers get started quickly with the most traditional methods, try different parameters and approach several tasks without a significant loss of performance. The few libraries that have all these features are available in languages such as Java, Scala or C#, which is a disadvantage for less experienced programmers who are more used to the popular Python programming language. In this article we present the details of pyRecLab, together with a performance analysis in terms of error metrics (MAE and RMSE) and train/test time. We benchmark it against the popular Java-based library LibRec and obtain similar results. We expect pyRecLab to benefit programmers with little experience and people interested in quickly prototyping recommender systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Software Development</kwd>
        <kwd>Recommender Library</kwd>
        <kwd>Python Library</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        When software developers face the challenge of learning about
recommender systems (RecSys), developing a RecSys for the first
time, or quickly prototyping a recommender to test available data,
a reasonable way to get started is to use an existing software
library. Nowadays, several libraries are available in different
programming languages; among the most popular are
MyMediaLite [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], LensKit [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], LibRec [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], lightfm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and rrecsys [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>While the aforementioned tools have documentation, implement
several methods, and provide most of the common functionality
required to develop and evaluate a recommendation system, each of
them lacks some functionality or algorithm, which hinders
newcomers in particular. Over three years of teaching a
graduate course on Recommender Systems during the Fall Semester
(2014-2016) at the Department of Computer Science at PUC Chile,
most students found recurrent difficulties in using existing
tools to finish an introductory assignment. The assignment involves
tasks such as rating prediction and item recommendation
to specific users, using well-known collaborative filtering
methods such as User K-NN, Item K-NN, Slope One and FunkSVD [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Some of the problems found were: (a) the lack of implementation
of certain methods in some libraries, (b) poor train/test time
performance on medium-sized datasets (as with rrecsys, which does
not implement sparse matrices), (c) lack of functionality that is
typical in a recommendation setting, such as suggesting a list of
items given a specific user ID, (d) difficulty changing parameters
in certain models, and (e) students’ lack of familiarity with certain
programming languages such as Java or C#. While Java is the most
popular language according to several rankings, Python has been
the most popular introductory teaching language in the
U.S. since 2004 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], as well as the language with the largest growth in the
last 5 years according to the PYPL ranking<sup>1</sup>.
      </p>
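      <p>As an illustration of one of the collaborative filtering methods named above, the following is a minimal sketch of weighted Slope One in plain Python. It is not pyRecLab's implementation: the function names train_slope_one and predict_slope_one, the dictionary-based data layout and the toy ratings are assumptions made for this example only.</p>
      <preformat>
```python
from collections import defaultdict


def train_slope_one(ratings):
    """Compute average rating deviations between co-rated item pairs.

    ratings maps each user to a dict {item: rating}. Returns (dev, freq):
    dev[i][j] is the mean of (rating of i minus rating of j) over users who
    rated both items, and freq[i][j] is the number of such users.
    """
    diff_sum = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diff_sum[i][j] += r_i - r_j
                    freq[i][j] += 1
    dev = {
        i: {j: diff_sum[i][j] / freq[i][j] for j in diff_sum[i]}
        for i in diff_sum
    }
    return dev, freq


def predict_slope_one(ratings, dev, freq, user, item):
    """Weighted Slope One prediction of the rating of user for item."""
    numerator = 0.0
    denominator = 0
    for j, r_j in ratings[user].items():
        if j != item and j in dev.get(item, {}):
            count = freq[item][j]
            numerator += (dev[item][j] + r_j) * count
            denominator += count
    if denominator == 0:
        return None  # no co-rated items, so this sketch declines to predict
    return numerator / denominator


# Hypothetical toy data: three users, three items.
ratings = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}
dev, freq = train_slope_one(ratings)
lucy_a = predict_slope_one(ratings, dev, freq, "lucy", "A")  # 13/3, about 4.33
```
      </preformat>
      <p>Each deviation is weighted by the number of users who rated both items, so item pairs supported by more evidence contribute more to the prediction.</p>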
      <p>For these reasons, we developed pyRecLab<sup>2</sup>. We wrote it in C++
with Python bindings to facilitate its adoption among new
programmers familiar with Python, while also offering appropriate
performance when dealing with larger datasets. We implemented
most of the foundational recommendation methods for rating
prediction and recommendation. Moreover, users can easily change
parameters to understand their effect, and they can also produce
recommendations given a specific user ID.</p>
    </sec>
    <sec id="sec-related">
      <title>2 OTHER RECOMMENDATION LIBRARIES</title>
      <p>
        MyMediaLite [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: It implements several recommendation
algorithms, supporting explicit and implicit feedback, as well as
context-aware methods. It also allows evaluation with metrics such as MAE,
RMSE, prec@N, and nDCG [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Many of its functionalities are
available from the command line; however, integrating it with other
software requires programming in languages like C# or F#, which
is difficult for many newcomer Python developers.
      </p>
      <p>
        LensKit [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: A popular library that provides all basic collaborative filtering
methods for predicting ratings (User/Item KNN, Slope One and
FunkSVD). It is developed in Java, which could be an entry
barrier for new programmers who are mostly familiar with Python.
        <fn id="fn1"><label>1</label><p>http://pypl.github.io/PYPL.html</p></fn>
        <fn id="fn2"><label>2</label><p>Documentation and code samples at https://github.com/gasevi/pyreclab</p></fn>
      </p>
      <p>
        LibRec [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: Just like MyMediaLite and LensKit, a well-developed
library in terms of the algorithms implemented and the metrics
available for evaluation. However, its documentation is not as good as
LensKit's, and since it is implemented in Java, it also raises the barrier
for new programmers.
      </p>
      <p>
        lightfm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]: This library implements
several matrix factorization algorithms for both implicit and explicit
feedback. It also has a Python interface, which makes it accessible
to many developers. However, it does not implement the basic
traditional recommender algorithms (User/Item KNN, Slope One), so it
is not advisable for introductory teaching purposes.
      </p>
      <p>
        rrecsys [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
This tool comes closest to pyRecLab in terms of ease of use, quick
prototyping and educational purposes. It is written in R.
However, it has two main weaknesses: it lacks some traditional
algorithms (like Slope One) and it is limited in the amount
of data it can process, since it does not support sparse matrices.
      </p>
    </sec>
    <sec id="sec-design">
      <title>3 DESIGN AND IMPLEMENTATION</title>
      <p>
        Figure 1 shows the main modules of pyRecLab. At the bottom,
the blue block represents the Python interpreter, which loads the
methods and data structures when importing the pyRecLab module.
At the top, in orange, are all the sub-modules of the library:
        <list list-type="bullet">
          <list-item><p>File IO. This component handles data input/output: it
reads from text files and writes output
recommendations in txt and json formats. It offers great flexibility in
input file formats (csv, tsv) and lets the user specify
what the file columns represent.</p></list-item>
          <list-item><p>Data handlers. This module implements several data structures
that provide homogeneous access to the ratings. It grants a good
level of independence from the original format from which the data
were read, with a high level of abstraction. These data structures
are used directly by the recommendation algorithms for the
processing, storage and generation of output data.</p></list-item>
          <list-item><p>Recommendation Algorithms. Under the Data handlers block,
there are a number of contiguous blocks representing the
recommendation algorithms. The algorithms for rating prediction and
recommendation are Item Average, Slope One, User KNN, Item
KNN and Funk SVD. Most Popular, on the other hand, is only
used to generate recommendations.</p></list-item>
          <list-item><p>Python Interface. This module is the interface
between the recommendation algorithms and the Python
interpreter. It was developed in C++, and since we aimed at
maintaining an appropriate level of code readability, we decided to use</p></list-item>
        </list>
      </p>
      <p>[Figure 2: train and test time of pyRecLab and LibRec using FunkSVD. Series: pyreclab train, pyreclab test, librec train, librec test. x-axis: number of latent factors; y-axis: training time.]</p>
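      <p>The File IO and Data handlers roles described in this section can be illustrated with a small plain-Python sketch: delimited rating rows are read into a sparse user-to-items mapping, and an Item Average baseline is computed from it. This is an assumption-laden illustration, not pyRecLab's C++ code; the names load_ratings and item_averages, the column parameters usercol, itemcol and ratingcol, and the sample rows are all hypothetical.</p>
      <preformat>
```python
import csv
import io
from collections import defaultdict


def load_ratings(lines, delimiter="\t", usercol=0, itemcol=1, ratingcol=2,
                 header=False):
    """Read delimited rating rows into a sparse user-to-items mapping.

    Only observed ratings are stored, so memory grows with the number of
    ratings rather than with the number of users times the number of items.
    """
    ratings = defaultdict(dict)
    reader = csv.reader(lines, delimiter=delimiter)
    if header:
        next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        ratings[row[usercol]][row[itemcol]] = float(row[ratingcol])
    return dict(ratings)


def item_averages(ratings):
    """Item Average baseline: mean observed rating of every item."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        for item, rating in user_ratings.items():
            totals[item] += rating
            counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}


# Hypothetical csv input with a header row.
sample = io.StringIO("user,item,rating\nu1,i1,4\nu1,i2,2\nu2,i1,5\n")
ratings = load_ratings(sample, delimiter=",", header=True)
averages = item_averages(ratings)
```
      </preformat>
      <p>Because the algorithms only see the mapping, the same baseline works whether the rows came from a csv or a tsv file, which is the kind of format independence the Data handlers module is described as providing.</p>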
    </sec>
    <sec id="sec-2">
      <title>4 RESULTS &amp; CONCLUSION</title>
      <p>
        To check the performance of pyRecLab, we tested it against the
popular library LibRec [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in terms of error and train/test time.
      </p>
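      <p>For reference, the two error metrics used in this comparison, MAE and RMSE, can be computed as in the following minimal sketch. The function names and the sample values are illustrative assumptions; they are not taken from pyRecLab or LibRec and are not results from the paper.</p>
      <preformat>
```python
import math


def mae(predicted, actual):
    """Mean absolute error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical predictions and held-out ratings.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 4.0, 1.0, 3.0]
mae_value = mae(predicted, actual)    # 0.875
rmse_value = rmse(predicted, actual)  # square root of 1.3125
```
      </preformat>
      <p>RMSE squares each error before averaging, so it penalizes large individual errors more heavily than MAE does.</p>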
      <p>Prediction Results. MAE and RMSE results of rating prediction
on the MovieLens 100K dataset are shown in Table 1. Differences with
respect to LibRec are very small, showing that pyRecLab can reproduce
the results of a mature recommender library.</p>
      <p>Time Performance. Although
the results vary depending on the method, Figure 2 shows train/test
performance using FunkSVD. While both libraries perform
similarly in the training phase, pyRecLab is faster at testing time
across different numbers of latent factors.</p>
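      <p>To make the role of the number of latent factors concrete, the following is a rough sketch of a FunkSVD-style matrix factorization trained with stochastic gradient descent. It is a simplified, commonly used variant (all factors are updated together and a global mean is used as offset) and is not the implementation of pyRecLab or LibRec; the function names, hyperparameter values and toy ratings are assumptions made for illustration.</p>
      <preformat>
```python
import random


def predict(mean, p, q, user, item):
    """Predicted rating: global mean plus dot product of latent vectors."""
    return mean + sum(a * b for a, b in zip(p[user], q[item]))


def train_funk_svd(ratings, factors=2, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Fit user and item latent factors by stochastic gradient descent.

    ratings is a list of (user, item, rating) triples. Returns the global
    mean and two dicts mapping users and items to their factor vectors.
    """
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random initial values; one vector of length `factors` per entity.
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(mean, p, q, u, i)
            for f in range(factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step with L2 regularization on both vectors.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mean, p, q


# Hypothetical toy ratings.
data = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0),
        ("u2", "i3", 1.0), ("u3", "i2", 2.0), ("u3", "i3", 5.0)]
mean, p, q = train_funk_svd(data, factors=2)
baseline_sse = sum((r - mean) ** 2 for _, _, r in data)
trained_sse = sum((r - predict(mean, p, q, u, i)) ** 2 for u, i, r in data)
```
      </preformat>
      <p>In this sketch the cost of each training step and of each prediction grows linearly with the number of factors, which is why timing curves are typically plotted against that parameter.</p>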
      <p>
        In summary, we have introduced pyRecLab, a library for
recommender systems that combines the performance of a C++
implementation with the versatility and ease of use of Python.
We expect to add new algorithms (such as WRMF [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and gSLIM
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) and recommendation metrics, as well as new code samples to
facilitate its adoption.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ludovik</given-names>
            <surname>Çoba</surname>
          </string-name>
          and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Zanker</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>rrecsys: an R-package for prototyping recommendation algorithms</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Michael D</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Ludwig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jack</given-names>
            <surname>Kolb</surname>
          </string-name>
          , and
          <string-name>
            <given-names>John T</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>LensKit: a modular recommender framework</article-title>
          .
          <source>In Proceedings of the fifth ACM conference on Recommender systems. ACM</source>
          ,
          <fpage>349</fpage>
          -
          <lpage>350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Zeno</given-names>
            <surname>Gantner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Rendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Freudenthaler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>MyMediaLite: A free recommender system library</article-title>
          .
          <source>In Proceedings of the fifth ACM conference on Recommender systems. ACM</source>
          ,
          <fpage>305</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
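          <string-name>
            <given-names>Guibing</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jie</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhu</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Neil</given-names>
            <surname>Yorke-Smith</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>LibRec: A Java Library for Recommender Systems</article-title>
          .
          <source>In UMAP Workshops</source>
          .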
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Philip</given-names>
            <surname>Guo</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Python is now the most popular introductory teaching language at top U.S. universities</article-title>
          .
          <source>BLOG@CACM</source>
          , July (
          <year>2014</year>
          ),
          <fpage>47</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Yifan</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yehuda</given-names>
            <surname>Koren</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Volinsky</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Collaborative filtering for implicit feedback datasets</article-title>
          .
          <source>In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE</source>
          ,
          <fpage>263</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Maciej</given-names>
            <surname>Kula</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Metadata Embeddings for User and Item Cold-start Recommendations</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on New Trends on Content-Based Recommender Systems (CEUR Workshop Proceedings)</source>
          , Vol.
          <volume>1448</volume>
          .
          <fpage>14</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Santiago</given-names>
            <surname>Larraín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Denis</given-names>
            <surname>Parra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alvaro</given-names>
            <surname>Soto</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Towards Improving Top-N Recommendation by Generalization of SLIM</article-title>
          .
          <source>In RecSys Posters</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Denis</given-names>
            <surname>Parra</surname>
          </string-name>
          and
          <string-name>
            <given-names>Shaghayegh</given-names>
            <surname>Sahebi</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Recommender systems: Sources of knowledge and evaluation metrics</article-title>
          .
          <source>In Advanced Techniques in Web Intelligence-2</source>
          . Springer,
          <fpage>149</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>