<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Algorithm selection with librec-auto</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Masoud Mansoury</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robin Burke</string-name>
          <email>robin.burke@colorado.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DePaul University</institution>
          ,
          <addr-line>Chicago IL</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Colorado</institution>
          ,
          <addr-line>Boulder, Boulder CO</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Due to the complexity of recommendation algorithms, experimentation on recommender systems has become a challenging task. Current recommendation algorithms, while powerful, involve large numbers of hyperparameters. Tuning hyperparameters to find the best recommendation outcome often requires executing large numbers of algorithmic experiments, particularly when multiple evaluation metrics are considered. Existing recommender systems platforms fail to provide a basis for systematic experimentation of this type. In this paper, we describe librec-auto, a wrapper for the well-known LibRec library, which provides an environment that supports automated experimentation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>A recommender system aims to predict users' preferences and suggest desirable
items to them. Due to their power and effectiveness in generating personalized
recommendations, and consequently in increasing companies' profits, such systems
have become an essential tool in electronic commerce. There are a wide
variety of real-world examples, such as video recommendation on YouTube, item
recommendation on Amazon, and music recommendation on Pandora. In all these
applications, specific recommendation algorithms are employed for generating
recommendations to users.</p>
      <p>Research on recommender systems has expanded to a wide variety of topics
as new problems have emerged. This expansion introduced new evaluation criteria for
measuring the performance of recommender systems and made recommendation
algorithms, while accurate, very complicated, computationally expensive, and
sensitive to parameter values. Understanding the nature of these algorithms
requires extensive experiments over combinations of parameters, multiple
data sets, and different metrics.</p>
      <p>
        Depending on the characteristics of the data set and the evaluation criteria,
a specific recommendation algorithm will be needed, and the assignments for its
parameters may vary [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Some algorithms may work well on dense data sets, some may
work well on binary data sets, and some may result in better recommendations
on contextual data sets. Thus, extensive experiments are required to
find the best algorithm. Evaluation criteria are another factor in
experimentation. In addition to accuracy, which is the most important metric for achieving
personalization, non-accuracy measures such as diversity, novelty, fairness, and
coverage have become recognized as important measures of recommendation
effectiveness. Since some of these metrics are in conflict with each other, extensive
experiments are required to achieve a trade-off between them.
      </p>
      <p>
        Hyperparameters play an important role in the performance of many
recommendation algorithms, requiring careful tuning. For example, the integrated
neighborhood and singular value decomposition model (SVD++ in LibRec) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] involves seven
hyperparameters (bias regularization, user bias regularization, item bias
regularization, implicit feedback bias regularization, learning rate, maximum number of
iterations, and number of factors). A simple calculation shows that, for instance,
exploring 5 possible values for each parameter on a single data set entails
running almost 80,000 different experiments. Manual configuration of so many
experiments would be tedious and error-prone, motivating the design of a
system that can significantly reduce the configuration and setup time for each
experiment.
      </p>
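      <p>The arithmetic behind that estimate can be checked directly. The sketch below simply counts the points in a grid of 5 candidate values over the seven hyperparameters named above; the parameter labels are illustrative, not actual LibRec keys:</p>

```python
# Illustrative labels for the seven SVD++ hyperparameters mentioned above
# (not actual LibRec configuration keys).
hyperparameters = [
    "bias-reg", "user-bias-reg", "item-bias-reg", "implicit-bias-reg",
    "learning-rate", "max-iterations", "num-factors",
]
values_per_param = 5

# Every parameter varies independently, so the grid size is 5^7.
n_experiments = values_per_param ** len(hyperparameters)
print(n_experiments)  # 78125, i.e. almost 80,000 runs
```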
      <p>Moreover, reproducibility of previous experiments is key for conducting
further analysis in the future. To achieve this, it is necessary to save all elements of
each experiment and retrieve them later. This is particularly useful
when recommendation results need to be analyzed with different metrics.</p>
      <p>
        However, while a number of recommendation platforms have been developed
for recommender systems researchers: LibRec [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Surprise [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], MyMediaLite [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
LensKit [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Mahout [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], pyRecLab [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], LKPy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and others, none of these
tools supports automated experimentation. LibRec is well known for its large
library of implemented recommendation algorithms (more than 70 as of this
writing), its attention to efficiency, and its ability to evaluate algorithms relative
to a variety of tasks and metrics. LibRec works well for single-shot algorithm
and data set evaluation, but it is not well suited for reproducible, automated,
large-scale experimentation.
      </p>
      <p>
        We have implemented librec-auto, a Python-based wrapper that
encapsulates LibRec and supports the automation of repetitive experiments. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we
presented a beta version of librec-auto along with a demo. In this paper, we
present the functionalities of librec-auto in detail and introduce several new
features implemented in the most recent version. We also comprehensively
compare librec-auto with well-known existing recommendation tools and discuss
its advantages over them.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Some existing recommendation tools</title>
      <p>In this section, we introduce some well-known recommendation tools for
recommender systems research and discuss their main features along with their
advantages and disadvantages.</p>
      <sec id="sec-2-1">
        <title>Librec</title>
        <p>
          LibRec 2.0 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is a mature, flexible, open-source Java-based platform for
recommender systems experimentation (source code available at github.com/guoguibing/librec). It supports a large variety of
recommendation algorithms and a large library of evaluation metrics, backed by an active
community of developers and maintainers.
        </p>
        <p>However, LibRec does not naturally support large-scale experimentation.
Similar to other recommender systems platforms, LibRec is implemented as
a chained series of operations: splitting data into training and testing sets, training an
algorithm, executing it, and evaluating the results.</p>
        <p>This monolithic design creates several problems for researchers. One is that
the outputs of intermediate operations are not saved, so repeating similar experiments entails
redundant computation. It is also the case that training/testing splits are not
saved when LibRec computes them. Running with the same random seed
guarantees that the same split will be computed, but because there is no explicit storage
of the training data, it is difficult for a researcher to perform detailed analysis
and debugging where knowledge of the specific input data of each fold would be
helpful.</p>
        <p>Finally, LibRec uses a simple key-value configuration system based on the
Java Properties class. The flat structure requires a multi-part key-naming scheme,
which places significant cognitive load on the experimenter, who must remember the
meaning of each key and the appropriate set of keys required for each algorithm.</p>
      </sec>
      <sec id="sec-2-2">
        <title>MyMediaLite</title>
        <p>
          MyMediaLite 3.11 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is an open-source C# platform (with some Perl code as well) for recommender systems
experimentation (http://www.mymedialite.net/index.html). It supports both ranking and rating prediction tasks and
provides a wide range of recommendation algorithms for running experiments. It
also has useful functionality such as grid search for optimizing hyperparameters
and model saving.
        </p>
        <p>Even though MyMediaLite saves computed models, it does not save
intermediate files and outputs. Also, it does not have a configuration file for specifying
the hyperparameters of its recommendation algorithms.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Other recommendation tools</title>
        <p>
          There is a wide range of other platforms for experimentation on recommender
systems: Mahout [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], Surprise [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] (https://github.com/NicolasHug/Surprise), LensKit [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] (https://github.com/lenskit/lenskit), pyRecLab [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] (https://github.com/gasevi/pyreclab), LKPy [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] (https://github.com/lenskit/lkpy),
and others.
        </p>
        <p>These platforms aim to provide comprehensive tools for practitioners and
researchers to experiment with and develop recommendation algorithms, but
none provides support for large-scale automated experimentation with a large library
of state-of-the-art algorithms.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Librec-auto</title>
      <p>librec-auto is a wrapper built around the core LibRec functionality to enhance
the platform's ability to support reproducible, large-scale experiments. The tool is
currently under development, with additional features planned. The core LibRec
library is used unmodified, with some additional Java functionality implemented
in a separate set of Java classes.</p>
      <p>Figure 1 shows the operation of librec-auto and its relationship to the
core LibRec system. New elements of the system are indicated in bold font.
As shown, the system reads an XML configuration file, which is converted into
the properties file (or files) required by LibRec. The data from each training/test
split is saved so that future experiments can share a common experimental
configuration. Unlike in core LibRec, recommendation generation and evaluation
are separate steps, so that previously generated recommendation results can be
reused where possible.</p>
      <p>Another key element of the librec-auto implementation is the ability
(described below) to iterate over different parameter values in one scripted
experiment. The system performs all of the necessary bookkeeping and allows the user
to write scripts that summarize over all of the experiments, including the creation
of visualizations.</p>
      <sec id="sec-3-1">
        <title>Flexible configuration</title>
        <p>LibRec uses a simple unstructured key-value properties file to define the
parameters related to an experimental configuration. The file format does not capture
the relationships between parameters, and it can be difficult to determine, for
example, which parameters are appropriate for which algorithms. librec-auto uses
an XML-based configuration file, with greater self-descriptive power and the
ability to perform error and consistency checking. The properties files required
by LibRec are produced and managed by librec-auto, making the process less
tedious, especially when multiple similar experiments are being performed.</p>
        <p>Figure 2 shows a fragment of one such XML configuration file, showing how
the element labels and embedding structure clarify the experimenter's intent.
This part of the file defines an algorithm to be tested, in this case ItemKNN,
with its configuration settings: the number of neighbors to be used and the
similarity metric. The results are to be computed in recommendation lists of
size 10 and evaluated with the nDCG measure. Note that multiple values are
listed for neighborhood-size. The system will therefore run an experiment for
each given value.</p>
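        <p>The one-experiment-per-listed-value behavior can be sketched as follows. This is an illustrative re-implementation, not librec-auto's actual parser: it reads a Figure-2-style fragment and expands every parameter that lists multiple values into the cross-product of experiments.</p>

```python
import itertools
import xml.etree.ElementTree as ET

# A Figure-2-style algorithm fragment (same content as in the paper).
config = ET.fromstring("""
<alg>
  <class>ItemKNNRecommender</class>
  <similarity>cos</similarity>
  <neighborhood-size>
    <value>10</value>
    <value>20</value>
  </neighborhood-size>
</alg>
""")

# Separate single-valued parameters from multi-valued ones.
fixed, varying = {}, {}
for child in config:
    values = [v.text for v in child.findall("value")]
    if values:
        varying[child.tag] = values
    else:
        fixed[child.tag] = child.text

# One experiment per combination of the multi-valued parameters.
experiments = [
    {**fixed, **dict(zip(varying, combo))}
    for combo in itertools.product(*varying.values())
]
for exp in experiments:
    print(exp)
# Two experiments: one with neighborhood-size 10, one with 20.
```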
      </sec>
      <sec id="sec-3-2">
        <title>Installation</title>
        <p>As usual for Python programs, librec-auto provides pip-style installation. This
makes the installation process easy and makes the code available wherever
experiments are to be run.</p>
        <p>On a machine equipped with Python and Java, and with LibRec and librec-auto
installed, the next step is to create a configuration file. The configuration
specifies the data to be operated on, the type of evaluation to be performed, the
algorithms to be run, and other operations.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Automation</title>
        <p>
          As the example shows, one of the important capabilities of librec-auto is its
ability to automate multiple experiments across algorithmic hyperparameters,
supporting tuning and sensitivity analysis. This type of functionality will be
familiar to users of scikit-learn's GridSearchCV model selection tool [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>For each combination of parameters, the system creates a separate
subdirectory with its own configuration and output files. A separate instance of LibRec
runs on each configuration, allowing for parallel operation.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Multi-threading</title>
        <p>Some existing tools (for example, LibRec and LKPy) allow parallel operation of
individual experiments by computing the folds of a cross-validation experiment
separately. This is sufficient when single experiments are performed. librec-auto,
on the other hand, is capable of running multiple experiments in parallel.
This allows for parallelization efficiencies across a wider range of experiment types.</p>
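        <p>The experiment-level parallelism described above can be sketched with a thread pool that hands each experiment sub-directory its own worker. Everything here is invented for illustration: the directory names are hypothetical, and the placeholder command stands in for the real LibRec invocation.</p>

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical experiment sub-directories, one per parameter combination.
experiment_dirs = ["exp00001", "exp00002", "exp00003", "exp00004"]

def run_experiment(exp_dir: str) -> int:
    # Placeholder command; a real invocation would point the LibRec jar
    # at the properties file generated inside exp_dir.
    result = subprocess.run(["echo", f"running {exp_dir}"], capture_output=True)
    return result.returncode

# The pool caps how many experiments run at once; map preserves order.
with ThreadPoolExecutor(max_workers=2) as pool:
    codes = list(pool.map(run_experiment, experiment_dirs))
print(codes)  # [0, 0, 0, 0] when every run succeeds
```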
      </sec>
      <sec id="sec-3-5">
        <title>Intermediate results</title>
        <p>Because recommendation algorithms can be computationally expensive at scale,
librec-auto is designed to help minimize wasted computational effort. As much
as possible, intermediate results are stored, including training/test splits and
algorithm outputs, as noted above.
</p>
        <p>Fig. 2. Configuration file fragment:</p>
        <p>&lt;alg&gt;
  &lt;class&gt;ItemKNNRecommender&lt;/class&gt;
  &lt;similarity&gt;cos&lt;/similarity&gt;
  &lt;neighborhood-size&gt;
    &lt;value&gt;10&lt;/value&gt;
    &lt;value&gt;20&lt;/value&gt;
  &lt;/neighborhood-size&gt;
&lt;/alg&gt;
&lt;eval&gt;
  &lt;metric&gt;NDCG&lt;/metric&gt;
  &lt;list-size&gt;10&lt;/list-size&gt;
&lt;/eval&gt;</p>
        <p>Because intermediate outputs are recorded, librec-auto makes it possible
to apply multiple evaluation metrics to a single set of experimental results.
LibRec allows the computation of multiple metrics while an experiment is being
run, but applying a different metric after the fact requires starting over, at
potentially significant computational cost. Note that error-oriented and
ranking-oriented experiments are not compatible: the results computed for a ranking
task are recommendation lists, which can only be evaluated with ranking metrics
such as precision. librec-auto is aware of this distinction and can warn the
user of an attempt to apply a metric incompatible with the computed
results.</p>
        <p>The multiplicity of experiments that librec-auto can perform can yield a large
amount of data that researchers must collate and examine. It would be tedious
to have to pore over LibRec's log files to discover the differences
between different parameter settings.</p>
        <p>To make such interpretative tasks more efficient, librec-auto allows a
final summarization phase to follow the completion of a set of experiments. This
is effectively one or more scripts that the experimenter can write to process the
log file data into a useful format. Using plotting libraries such as matplotlib, it
is possible to create and save visualizations that the researcher can quickly scan.</p>
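        <p>A minimal sketch of such a summarization script is shown below, assuming a hypothetical one-metric-per-experiment log format; librec-auto's real log layout may differ. It gathers a metric value from each experiment into a single table a researcher can scan.</p>

```python
import csv
import io

# Invented log contents, keyed by experiment sub-directory name.
logs = {
    "exp00001": "nDCG=0.211",
    "exp00002": "nDCG=0.187",
}

# Parse each log line into a row of the summary table.
rows = []
for exp, line in sorted(logs.items()):
    metric, value = line.split("=")
    rows.append({"experiment": exp, metric: float(value)})

# Emit one CSV the experimenter (or a plotting script) can consume.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["experiment", "nDCG"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```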
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper and our associated demo, we presented librec-auto, a tool for
automating recommender systems experimentation using the LibRec 2.0 platform.
Our tool retains the benefits of working with LibRec while making large-scale
experimentation easier, more efficient, and more reproducible. The
important features of librec-auto presented in this paper are: flexible configuration,
easy installation, automation, multi-threading, intermediate results, and
summarization.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adomavicius</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , Zhang, J.:
          <article-title>Impact of data characteristics on recommender systems performance</article-title>
          .
          <source>ACM Transactions on Management Information Systems (TMIS) 3</source>
          (
          <issue>3</issue>
          ) (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Buitinck</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louppe</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mueller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niculae</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grobler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Layton</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , VanderPlas, J.,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holt</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
          </string-name>
          , G.:
          <article-title>API design for machine learning software: experiences from the scikit-learn project</article-title>
          .
          <source>In: ECML PKDD Workshop: Languages for Data Mining and Machine Learning</source>
          . pp.
          <volume>108</volume>
          –
          <issue>122</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ekstrand</surname>
          </string-name>
          , M.D.:
          <article-title>The lkpy package for recommender systems experiments: Nextgeneration tools and lessons learned from the lenskit project</article-title>
          .
          <source>arXiv preprint arXiv:1809</source>
          .
          <volume>03125</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ekstrand</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ludwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolb</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          :
          <article-title>Lenskit: a modular recommender framework</article-title>
          .
          <source>In: Proceedings of the fifth ACM conference on Recommender systems</source>
          . pp.
          <volume>349</volume>
          –
          <issue>350</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ekstrand</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ludwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolb</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          :
          <article-title>Lenskit: a modular recommender framework</article-title>
          .
          <source>In: RecSys '11 Proceedings of the fifth ACM conference on Recommender systems</source>
          . pp.
          <volume>349</volume>
          –
          <issue>350</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gantner</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rendle</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freudenthaler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt-Thieme</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Mymedialite: a free recommender system library</article-title>
          .
          <source>In: Proceedings of the fifth ACM conference on Recommender systems</source>
          . pp.
          <volume>305</volume>
          –
          <issue>308</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , Zhang, J.,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yorke-Smith</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Librec: A java library for recommender systems</article-title>
          .
          <source>In: UMAP Workshops</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hug</surname>
          </string-name>
          , N.:
          <article-title>Surprise, a python library for recommender systems</article-title>
          . In: URL: http://surpriselib.com (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Factorization meets the neighborhood: a multifaceted collaborative filtering model</article-title>
          . In:
          <article-title>Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</article-title>
          . pp.
          <volume>426</volume>
          –
          <fpage>434</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mansoury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burke</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ordonez-Gauger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sepulveda</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Automating recommender systems experimentation with librec-auto</article-title>
          .
          <source>In: Proceedings of the 12th ACM Conference on Recommender Systems</source>
          . pp.
          <volume>500</volume>
          –
          <issue>501</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Seminario</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          , Wilson,
          <string-name>
            <surname>D.C.</surname>
          </string-name>
          :
          <article-title>Case study evaluation of mahout as a recommender platform</article-title>
          .
          <source>In: RUE@ RecSys</source>
          . pp.
          <volume>45</volume>
          –
          <issue>50</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sepulveda</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dominguez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>: pyreclab: A software library for quick prototyping of recommender systems</article-title>
          .
          <source>In: ACM RecSys Poster</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>