=Paper=
{{Paper
|id=Vol-3228/paper2
|storemode=property
|title=CaMeLS: Cooperative Meta-Learning Service for Recommender Systems
|pdfUrl=https://ceur-ws.org/Vol-3228/paper2.pdf
|volume=Vol-3228
|authors=Lukas Wegmeth,Joeran Beel
|dblpUrl=https://dblp.org/rec/conf/recsys/WegmethB22
}}
==CaMeLS: Cooperative Meta-Learning Service for Recommender Systems==
Lukas Wegmeth, Joeran Beel

Intelligent Systems Group, University of Siegen, Adolf-Reichwein-Straße 2, 57076 Siegen, Germany

lukas.wegmeth@uni-siegen.de (L. Wegmeth, ORCID 0000-0001-8848-9434); joeran.beel@uni-siegen.de (J. Beel, ORCID 0000-0002-4537-5573)

Perspectives on the Evaluation of Recommender Systems Workshop (PERSPECTIVES 2022), September 22nd, 2022, co-located with the 16th ACM Conference on Recommender Systems, Seattle, WA, USA. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

We present CaMeLS, a proof of concept of a cooperative meta-learning service for recommender systems. CaMeLS leverages the computing power of recommender systems users by uploading their metadata and algorithm evaluation scores to a centralized environment. Through the resulting database, CaMeLS then offers meta-learning services for everyone. Additionally, users may immediately access evaluations of common data sets to learn which algorithms perform best on them. The metadata table may also be used for other purposes, e.g., to perform benchmarks. In the initial version discussed in this paper, CaMeLS implements automatic algorithm selection through meta-learning over two recommender systems libraries. Automatic algorithm selection saves users time and computing power and does not require expertise, as the best algorithm is automatically found across multiple libraries. The CaMeLS database contains 20 metadata sets by default. We show that the automatic algorithm selection service is already on par with the single best algorithm in this default scenario. CaMeLS only requires a few seconds to predict a suitable algorithm, rather than potentially hours or days if the selection is performed manually, depending on the data set. The code is publicly available at https://camels.recommender-systems.com.

Keywords: recommender systems, benchmark, model selection, algorithm selection, meta-learning, automated machine learning

1. Introduction

Model selection is an essential technique with many advantages for machine learning and, by extension, recommender systems (RecSys). It can reduce the time required to build a meaningful predictor, reduce user expertise requirements, and increase prediction performance. Standard methods for model selection, like random search, require expertise from the user to set up the search space, as well as additional time and processing power, because many rounds of validation need to be completed. Additionally, depending on the task, these techniques may be inefficient or yield sub-optimal results [1]. With the rise of automated machine learning (AutoML), new methods like Bayesian hyperparameter optimization [2] were adopted, improving results but still imposing similar requirements [3]. When there are multiple algorithms to choose from, these requirements become even more demanding.

It is not yet possible to quickly and easily achieve results similar to state-of-the-art hyperparameter optimization techniques by other means. However, it is possible to warm-start this optimization through meta-learning [4]. Some state-of-the-art AutoML tools employ meta-learning for this and many other purposes due to its power to transfer knowledge from one task to another [5, 6].
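To make the warm-starting idea concrete, the following minimal sketch, which is not part of CaMeLS, seeds a hyperparameter search with the best-known configurations of the most similar previously evaluated data sets; the metafeature store, configurations, and distance-based selection are made-up placeholders.

```python
import numpy as np

# Hypothetical store of previously evaluated data sets:
# one metafeature vector and the best configuration found for each.
PAST_METAFEATURES = np.array([[1000, 500, 0.05], [50000, 2000, 0.01]])
PAST_BEST_CONFIGS = [{"factors": 20, "reg": 0.1}, {"factors": 100, "reg": 0.01}]

def warm_start_configs(new_metafeatures, k=1):
    """Return the best-known configs of the k most similar past data sets."""
    distances = np.linalg.norm(PAST_METAFEATURES - new_metafeatures, axis=1)
    nearest = np.argsort(distances)[:k]
    return [PAST_BEST_CONFIGS[i] for i in nearest]

# These configurations would seed, e.g., a Bayesian optimizer
# instead of purely random initial points.
print(warm_start_configs(np.array([1200, 450, 0.04])))
```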
Meta-learning and AutoML have only recently surfaced in RecSys but have already proven valuable in different model selection tasks [7, 8, 9, 10, 11]. Meta-learning is an application of machine learning and is therefore bound to the same constraints, e.g., the requirement for a sufficient amount of data. AutoML tools like Auto-sklearn [4] manually gather such data and craft a diverse meta-learner specific to the task at hand. However, these tools are not directly applicable to RecSys because the data and tasks differ drastically from regular machine learning, e.g., in sparsity. Due to the success of meta-learning for general machine learning, we hypothesize that, if sufficient data were present in a RecSys context, these techniques would also excel in the RecSys domain. Of course, acquiring vast amounts of diverse data is challenging. There is no abundance of public data sets for RecSys, and the list only grows slowly (see, e.g., https://cseweb.ucsd.edu/~jmcauley/datasets.html). Additionally, the quality of available data sets may not be sufficient. Furthermore, private data sets may differ from public ones, making it even harder to generalize a meta-learner to them.

To circumvent the problems mentioned above, we present the concept of a cooperative meta-learning service for RecSys and a proof of concept that we call CaMeLS. Our group introduced the idea for use with general machine learning and implemented a proof of concept in a previous publication, which showed that such a system provides an immense advantage in terms of time and computing power [12]. With CaMeLS, we extend this work specifically to RecSys and provide the first meta-learning performance evaluation. CaMeLS is an automatic algorithm selection service, which solves a sub-task of model selection. Our evaluation of CaMeLS shows that it is on par with the single best algorithm directly out of the box. Additionally, getting a correct or suitable algorithm with CaMeLS merely takes a few seconds, compared to potentially multiple hours if done manually.

Conceptually, we envision an environment where users share relevant information about RecSys data sets without the need to share sensitive, personal, and private data. Therefore, CaMeLS collects only the metadata of input data sets and their performance metrics on a set of algorithms. Users perform the training and evaluation on their machine, and CaMeLS collects the results in a centralized environment that is open to anyone to read and write. This procedure leverages the computing power of contributing users by making results available to everyone. This cooperative effort converges into a collection of metadata sets and performance values from which a powerful meta-learner can be built and continuously extended, fully accessible by the community. Such a collection has many additional benefits. It saves time for everyone since users can retrieve stored results instead of repeatedly computing evaluations for common tasks. It also serves as a tabular metadata set that users can retrieve to develop and perform thorough benchmarks on.

2. CaMeLS

While there are multiple ways to realize the introduced concept, we present a prototypical implementation of CaMeLS as proof of concept. We implemented CaMeLS as a traditional client-server model with an open API. CaMeLS stores the metadata of RecSys data sets and their evaluation scores on RecSys algorithms. It immediately returns the best algorithm if evaluations for the input data set are already stored. Otherwise, predicting the best algorithm for unseen data only takes a few seconds. Using CaMeLS is easy: uploading evaluations and predicting algorithm performance each require a single function call.
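As an illustration of this client-server exchange, the sketch below assembles the kind of payload a donor or consumer might send. The endpoint, routes, and payload fields are hypothetical, not the actual CaMeLS API; only the principle of uploading metadata and scores instead of raw ratings is taken from the paper.

```python
import hashlib
import requests  # used only to illustrate the client-server exchange

# Hypothetical endpoint and payload layout; the real CaMeLS API may differ.
CAMELS_URL = "https://example.org/camels/api"

def donate(ratings_csv_bytes, metadata, scores):
    """Upload only metadata and evaluation scores, never the raw ratings."""
    payload = {
        "data_hash": hashlib.sha256(ratings_csv_bytes).hexdigest(),
        "metadata": metadata,  # e.g., {"num_users": 943, "num_items": 1682, ...}
        "scores": scores,      # e.g., {"Lenskit.BiasedMF": {"RMSE": 0.91}, ...}
    }
    return requests.post(f"{CAMELS_URL}/evaluations", json=payload, timeout=30)

def best_algorithm(ratings_csv_bytes, metadata, metric="RMSE"):
    """Ask the server for the predicted best algorithm for this data set."""
    payload = {
        "data_hash": hashlib.sha256(ratings_csv_bytes).hexdigest(),
        "metadata": metadata,
        "metric": metric,
    }
    return requests.post(f"{CAMELS_URL}/predict", json=payload, timeout=30).json()
```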
Figure 1: A diagram of the workflow for the proof of concept. Fundamentally, there are two groups of users, donors and consumers. They share a standardized pipeline whose settings are dictated by the database setup. (The figure depicts the server with its database, meta-learner, model selection service, verification, evaluation procedures, and metadata acquisition, and donating and consuming clients using the two RecSys libraries.)

The workflow in CaMeLS, shown in Figure 1, depends on whether the user is submitting evaluation data or using the service. We call these users donors and consumers, respectively. If users volunteer as donors, their input data set passes through server-dictated preprocessing to standardize the metadata extraction process. Should the data hash already be present on the server, the user decides whether to repeat the evaluation. Next, the metadata is calculated and stored on the server. The user then trains the selected algorithms on their input data and evaluates them. The splitting of the data set into a train and test set and the evaluation of predictions from the trained models are standardized through shared, configurable functions. Standardized metric computations ensure that the performance of each algorithm, no matter from which library, is computed equally, which makes performance scores of different libraries comparable. Finally, the user uploads the evaluation scores to the server, which verifies the upload and stores the values.

If a user wishes to consume the service, they must pass their input data set through the same standardized preprocessing steps that the donors originally went through. Again, the metadata is calculated and uploaded to the CaMeLS server. If the server knows the data by its hash, it returns a recommended algorithm for the input data. Otherwise, a trained meta-learner for the targeted user setup has to exist on the server to continue. If one does not exist, the management policy decides whether it is trained on demand or scheduled by a controlling administrator. The user chooses the meta-learner, which then predicts the algorithm performance for the previously unknown input metadata and returns the predicted performance of all algorithms to the client. The client may then, for example, automatically construct a model with default parameters based on the algorithm with the highest predicted performance.

The meta-learner is a multi-label regressor that predicts, for a given metadata instance, the performance score of each algorithm. By default, CaMeLS uses random forest regression from scikit-learn [13] for the meta-learner. Alternatively, CaMeLS also allows the integration of other meta-learners. The meta-learner learns the relationship between the complexity of data sets and algorithm performances. The metadata set contains one training instance for each available data set. The features of the metadata set correspond to the 17 complexity measures listed in Appendix A. The ground truth of the metadata set corresponds to the evaluation scores of one metric for each algorithm. As a result, there are separate meta-learners and metadata sets for every available metric. Consequently, a meta-learner predicts the evaluation scores of the associated metric for each algorithm on unseen data.
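A minimal sketch of this setup, assuming a metadata table with the 17 complexity measures for each of 20 data sets and, as targets, the RMSE of the 16 algorithms; the arrays and algorithm names below are random placeholders, not CaMeLS data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((20, 17))   # 17 complexity measures per data set
Y = rng.random((20, 16))   # RMSE of each of the 16 algorithms
algorithms = [f"algo_{i}" for i in range(16)]  # placeholder names

# RandomForestRegressor natively handles multi-output targets, so one
# model predicts the scores of all algorithms at once.
meta_learner = RandomForestRegressor(n_estimators=100, random_state=0)
meta_learner.fit(X, Y)

new_metadata = rng.random((1, 17))   # complexity measures of an unseen data set
predicted_scores = meta_learner.predict(new_metadata)[0]
best = algorithms[int(np.argmin(predicted_scores))]  # lower RMSE is better
print(best, predicted_scores.min())
```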
For now, CaMeLS supports algorithms from the RecSys libraries Lenskit [14] and Surprise [15]. Appendix C lists all algorithms with their official descriptions. In addition, CaMeLS uses the generalized Movielens (https://grouplens.org/datasets/movielens/) data loading routines provided by Lenskit and extends them with routines for further common data sets and data set families like Amazon (https://nijianmo.github.io/amazon/index.html). Appendix B contains an organized list of the data sets. As a result, CaMeLS supports more than 30 data sets out of the box. The implemented evaluation metrics are the runtime, normalized mean absolute error (NMAE), mean absolute error (MAE), normalized root mean squared error (NRMSE), and root mean squared error (RMSE). Currently, CaMeLS supports the task of predicting explicit ratings, with the option to extend to implicit ranking prediction and any other user-defined tasks. The CaMeLS database has a simple and extensible structure that can store more complex relations in the future, e.g., a more elaborate metadata system like the one presented by Amazon [16].

3. Evaluation

To evaluate CaMeLS, we simulated a client donating metadata and evaluation scores of 20 of the supported data sets. The evaluated data sets and additional information about the metadata and algorithms are listed in the appendices. We collected the performance metrics for the 16 supported algorithms. Due to resource constraints, we collected the data with holdout validation and configured data pruning so that each user in each data set has at least five and at most 1,000 ratings. With five performance metrics, this yields a total of 1,600 evaluations. Since each meta-learner trains on one metric at a time and because we treat the model selection problem as a multi-label regression problem, there is only one training instance per data set for the meta-learner. Hence, the meta-learner for each metric learns from only 20 instances with 16 labels each in this procedure.

We evaluate the meta-learners' predictive performance with leave-one-out cross-validation. Because randomness affects the evaluation due to the small size of our metadata set, we average the results over 50 evaluation repetitions. Table 1 shows the evaluation results for the random forest meta-learner on the MAE and RMSE metrics. The evaluation shows that the RMSE meta-learner outperforms the single best algorithm in selection accuracy by 4.2 percentage points and is on par with its average error. The MAE meta-learner is worse than the single best algorithm, with a 5.1 percentage point lower selection accuracy and a 0.01 higher average error. However, the difference between the meta-learner and the single best algorithm is marginal in both cases, especially considering the average error. In algorithm selection, beating the single best algorithm is a standard minimum requirement, and CaMeLS surpasses it with the RMSE meta-learner even with scarce data and basic complexity measures.
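The following sketch illustrates this leave-one-out evaluation using random placeholder arrays instead of the real metadata and scores. The definitions are assumptions consistent with Table 1 but not taken from the CaMeLS code: selection accuracy counts how often the meta-learner's chosen algorithm matches the truly best one, and the average error is the ground-truth error of the chosen algorithm.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.random((20, 17))   # complexity measures of the 20 evaluated data sets
Y = rng.random((20, 16))   # e.g., RMSE of the 16 algorithms on each data set

hits, chosen_errors = [], []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = RandomForestRegressor(n_estimators=100)  # the paper averages 50 such runs
    model.fit(X[train_idx], Y[train_idx])
    pred = model.predict(X[test_idx])[0]
    choice = int(np.argmin(pred))                     # lower error is better
    hits.append(choice == int(np.argmin(Y[test_idx][0])))
    chosen_errors.append(Y[test_idx][0][choice])

print("selection accuracy:", np.mean(hits))
print("average error of the selected algorithm:", np.mean(chosen_errors))
```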
Table 1: Performance of the random forest meta-learner on the MAE and RMSE metrics, compared to the oracle and the single best algorithm. The oracle knows the ground truth and always picks the best algorithm. The single best algorithm is the algorithm that most often performed best on the training metadata set. The table shows the selection accuracy and average error of each selection method, cross-validated using the leave-one-out method and averaged over 50 validation repetitions.

Selection Method | MAE: Average Error | MAE: Selection Accuracy | RMSE: Average Error | RMSE: Selection Accuracy
Oracle           | 0.91               | 100%                    | 1.26                | 100%
Single Best      | 0.92               | 25%                     | 1.28                | 30%
Meta-Learner     | 0.93               | 19.9%                   | 1.28                | 34.2%

In addition to its predictive capabilities, CaMeLS saves time and computing power. Manually reading the data sets into two different libraries and performing evaluations on each algorithm may take multiple hours or days, depending on the size of the input data set. In contrast, CaMeLS can predict a suitable algorithm within a few seconds of reading the data.

4. Discussion

While meta-learning provides an opportunity for model selection in RecSys, there are still many challenges to overcome with our presented concept. Model selection through meta-learning in RecSys is not an easy task and requires further research into, e.g., metadata acquisition. Sufficient data must be collected to start up and improve the service. However, the evaluation has shown that even a small amount of data may already provide an immediate benefit. Of course, the initial data collection can be performed automatically on popular data sets, similar to what we did for CaMeLS. The idea is that users also contribute data voluntarily. A simple incentive can be the implicitly assumed benevolence of users who want to improve the service for everyone, but more tangible incentives may be needed. Depending on the use case, it must be considered whether the upload should be an opt-in or opt-out procedure. At the same time, the upload routine must be easily accessible so that users do not perceive uploading their data as a burden. Additionally, if there are no restrictions on usage, the server host should consider the free-rider problem. There is a range of other research questions to be answered, e.g., what are the best complexity measures considering their computational effort, and what is the ideal setup for the meta-learner? Right now, metadata acquisition and meta-learner training are both relatively fast due to their simplicity. When more complex metadata is involved, the metadata calculation will take longer and possibly discourage users from donating data. The benefit-cost ratio of any task performed with this system is therefore especially important for the clients.

References

[1] J. Bergstra, Y. Bengio, Random search for hyper-parameter optimization, Journal of Machine Learning Research 13 (2012).
[2] J. Wu, X.-Y. Chen, H. Zhang, L.-D. Xiong, H. Lei, S.-H. Deng, Hyperparameter optimization for machine learning models based on Bayesian optimization, Journal of Electronic Science and Technology 17 (2019) 26–40. URL: https://www.sciencedirect.com/science/article/pii/S1674862X19300047. doi:10.11989/JEST.1674-862X.80904120.
[3] P. Matuszyk, R. T. Castillo, D. Kottke, M. Spiliopoulou, A comparative study on hyperparameter optimization for recommender systems, in: Workshop on Recommender Systems and Big Data Analytics (RS-BDA'16), volume 13, 2016.
[4] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, F. Hutter, Auto-sklearn 2.0: The next generation, CoRR abs/2007.04074 (2020). URL: https://arxiv.org/abs/2007.04074. arXiv:2007.04074.
[5] M. Grobelnik, J. Vanschoren, Warm-starting DARTS using meta-learning, 2022. URL: https://arxiv.org/abs/2205.06355. doi:10.48550/ARXIV.2205.06355.
[6] L. Zimmer, M. Lindauer, F. Hutter, Auto-PyTorch: Multi-fidelity meta-learning for efficient and robust AutoDL, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2021) 3079–3090.
[7] M. Luo, F. Chen, P. Cheng, Z. Dong, X. He, J. Feng, Z. Li, MetaSelector: Meta-learning for recommendation with user-level adaptive model selection, CoRR abs/2001.10378 (2020). URL: https://arxiv.org/abs/2001.10378. arXiv:2001.10378.
[8] A. Nechaev, V. Meltsov, N. Zhukova, Utilizing metadata to select a recommendation algorithm for a user or an item, in: CEUR Workshop Proceedings, 2020.
[9] H. Bharadhwaj, Meta-learning for user cold-start recommendation, in: 2019 International Joint Conference on Neural Networks (IJCNN), 2019, pp. 1–8. doi:10.1109/IJCNN.2019.8852100.
[10] A. Collins, D. Tkaczyk, J. Beel, One-at-a-time: A meta-learning recommender-system for recommendation-algorithm selection on micro level, arXiv preprint arXiv:1805.12118 (2018).
[11] R. Anand, J. Beel, Auto-Surprise: An automated recommender-system (AutoRecSys) library with Tree of Parzens Estimator (TPE) optimization, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 585–587.
[12] M. Arambakam, J. Beel, Federated meta-learning: Democratizing algorithm selection across disciplines and software libraries, in: 7th ICML Workshop on Automated Machine Learning (AutoML), 2020.
[13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[14] M. D. Ekstrand, LensKit for Python: Next-generation software for recommender systems experiments, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 2999–3006.
[15] N. Hug, Surprise: A Python library for recommender systems, Journal of Open Source Software 5 (2020) 2174. URL: https://doi.org/10.21105/joss.02174. doi:10.21105/joss.02174.
[16] S. Schelter, J.-H. Boese, J. Kirschnick, T. Klein, S. Seufert, Automatically tracking metadata and provenance of machine learning experiments, in: Machine Learning Systems Workshop at NIPS, 2017, pp. 27–29.

A. Complexity Measures

The list of the complexity measures that CaMeLS calculates and uses as metadata; a sketch of how such measures can be computed from a ratings table follows the list.

1. Number of users
2. Number of items
3. Minimum rating
4. Maximum rating
5. Mean rating
6. Normalized mean rating
7. Number of instances
8. Highest number of ratings by a single user
9. Lowest number of ratings by a single user
10. Highest number of ratings on a single item
11. Lowest number of ratings on a single item
12. Mean number of ratings by a single user
13. Mean number of ratings on a single item
14. Rating skew
15. Rating kurtosis
16. Rating standard deviation
17. Rating variance
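A minimal sketch, assuming the ratings are available as a pandas DataFrame with user, item, and rating columns; the column names and the exact definition of the normalized mean rating are assumptions, not taken from the CaMeLS implementation.

```python
import pandas as pd

def complexity_measures(ratings: pd.DataFrame) -> dict:
    """Compute simple metadata features from a ratings table (user, item, rating)."""
    per_user = ratings.groupby("user")["rating"].count()
    per_item = ratings.groupby("item")["rating"].count()
    r = ratings["rating"]
    return {
        "num_users": ratings["user"].nunique(),
        "num_items": ratings["item"].nunique(),
        "min_rating": r.min(),
        "max_rating": r.max(),
        "mean_rating": r.mean(),
        # Assumed definition: mean rating scaled into [0, 1] by the rating range.
        "normalized_mean_rating": (r.mean() - r.min()) / (r.max() - r.min()),
        "num_instances": len(ratings),
        "max_ratings_by_user": per_user.max(),
        "min_ratings_by_user": per_user.min(),
        "max_ratings_on_item": per_item.max(),
        "min_ratings_on_item": per_item.min(),
        "mean_ratings_by_user": per_user.mean(),
        "mean_ratings_on_item": per_item.mean(),
        "rating_skew": r.skew(),
        "rating_kurtosis": r.kurtosis(),
        "rating_std": r.std(),
        "rating_variance": r.var(),
    }

example = pd.DataFrame({"user": [1, 1, 2, 2, 3],
                        "item": [10, 11, 10, 12, 11],
                        "rating": [4.0, 3.0, 5.0, 2.0, 4.0]})
print(complexity_measures(example))
```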
B. Data Sets

The list of data sets supported by CaMeLS. Bold text indicates that the data set was used in the evaluation.

Movielens (source: https://grouplens.org/datasets/movielens/)
1. Movielens 100K
2. Movielens 1M
3. Movielens 10M
4. Movielens 20M
5. Movielens Latest Small

Amazon (source: https://nijianmo.github.io/amazon/index.html)
1. amazon-all-beauty
2. amazon-appliances
3. amazon-arts-crafts-and-sewing
4. amazon-automotive
5. amazon-books
6. amazon-cds-and-vinyl
7. amazon-cell-phones-and-accessories
8. amazon-clothing-shoes-and-jewelry
9. amazon-digital-music
10. amazon-electronics
11. amazon-fashion
12. amazon-gift-cards
13. amazon-grocery-and-gourmet-food
14. amazon-industrial-and-scientific
15. amazon-home-and-kitchen
16. amazon-kindle-store
17. amazon-luxury-beauty
18. amazon-magazine-subscriptions
19. amazon-movies-and-tv
20. amazon-musical-instruments
21. amazon-office-products
22. amazon-patio-lawn-and-garden
23. amazon-pet-supplies
24. amazon-prime-pantry
25. amazon-software
26. amazon-sports-and-outdoors
27. amazon-tools-and-home-improvement
28. amazon-toys-and-games
29. amazon-video-games

BookCrossing (source: https://grouplens.org/datasets/book-crossing/)

EachMovie (source: http://www.gatsby.ucl.ac.uk/~chuwei/data/EachMovie/eachmovie.html)

Jester (source: http://eigentaste.berkeley.edu/dataset/)
1. Jester3
2. Jester4

C. Algorithms

The list of algorithms supported by CaMeLS.

Lenskit [14] algorithms with descriptions from their official documentation:
1. UserUser: User-user nearest-neighbor collaborative filtering.
2. ItemItem: Item-item nearest-neighbor collaborative filtering.
3. BiasedMF: Biased matrix factorization trained with alternating least squares.
4. BiasedSVD: Biased matrix factorization for implicit feedback using SciKit-Learn's SVD solver.
5. FunkSVD: Algorithm class implementing FunkSVD matrix factorization.
6. Bias: A user-item bias rating prediction algorithm.

Surprise [15] algorithms with descriptions from their official documentation:
1. NormalPredictor: Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal.
2. Baseline: Algorithm predicting the baseline estimate for a given user and item.
3. KNNBasic: A basic collaborative filtering algorithm.
4. KNNWithMeans: A basic collaborative filtering algorithm, taking into account the mean ratings of each user.
5. KNNWithZScore: A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.
6. KNNBaseline: A basic collaborative filtering algorithm taking into account a baseline rating.
7. SVD: The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize.
8. NMF: A collaborative filtering algorithm based on Non-negative Matrix Factorization.
9. SlopeOne: A simple yet accurate collaborative filtering algorithm.
10. CoClustering: A collaborative filtering algorithm based on co-clustering.