=Paper=
{{Paper
|id=Vol-3714/paper2
|storemode=property
|title=HPT4Rec: AutoML-based Hyperparameter Self-Tuning Framework for Session-based Recommender Systems
|pdfUrl=https://ceur-ws.org/Vol-3714/paper2.pdf
|volume=Vol-3714
|authors=Amir Reza Mohammadi,Amir Hossein Karimi,Mahdi Bohlouli,Eva Zangerle,Günther Specht
|dblpUrl=https://dblp.org/rec/conf/gvd/MohammadiKBZS23
}}
==HPT4Rec: AutoML-based Hyperparameter Self-Tuning Framework for Session-based Recommender Systems==
Amir Reza Mohammadi¹, Amir Hossein Karimi², Mahdi Bohlouli³, Eva Zangerle¹ and Günther Specht¹
¹ Department of Computer Science, Universität Innsbruck, Austria
² Mathematics and Computer Science Department, Amirkabir University of Technology, Tehran, Iran
³ Computer Science and Information Technology Department, IASBS, Zanjan, Iran
Abstract
Recommender systems research has moved well beyond basic user-item filtering methods. However, these filtering methods are still commonly used in real-world scenarios, mainly because they are easier to debug and reconfigure. Existing frameworks do not adequately support algorithmic tuning; moreover, they focus primarily on reproducing state-of-the-art accuracy rather than on ease of algorithm development and maintenance. As a result, rapid and iterative experimentation and debugging are considerably hindered. In this work, we propose an AutoML-based framework with a modular deep session-based recommender code-base and an integrated automated HyperParameter Tuning (HPT4Rec) component. The proposed framework automates the search for the best session-based model for a given dataset. It can therefore help to consistently update the model as the type and volume of data change, which is common in real-world scenarios. We demonstrate that HPT4Rec provides extensible data structures, training service compatibility, and GPU-accelerated execution while maintaining training efficiency and recommendation accuracy. We conducted our experiments on the benchmark RecSys 2015 dataset and achieved performance on par with state-of-the-art results. Our results show the importance of continuous and iterative parameter tuning, particularly in real-world scenarios.
Keywords
AutoML, Session-based Recommender Systems, Framework, Hyperparameter Tuning
34th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), June 7-9, 2023, Hirsau, Germany
amir.reza@uibk.ac.at (A. R. Mohammadi); ahkarimi@aut.ac.ir (A. H. Karimi)
ORCID: 0000-0003-3934-6941 (A. R. Mohammadi); 0009-0001-3946-6954 (A. H. Karimi)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction
It is often overwhelming for an e-commerce user to see so many products available for sale. Recognizing this burden of data overload, recommender systems (RSs) substantially improve user experience in various applications. Traditional RSs often rely on user profiles to provide personalized recommendations. Collaborative filtering approaches [1, 2, 3] can use the purchase history to determine user similarity, or use matrix factorization to establish latent factor vectors for each user. In both cases, the user must be identified when making recommendations. However, this is not always possible, for example when users are not logged in, have deleted their tracking information, or are new and have no profile yet. Consequently, recommendation methods that require the user's history suffer from cold-start issues.

Making session-based recommendations is an alternative to using historical data [4]. In this setup, recommendations are based only on the behavior of users in their current session, which helps to tackle the cold-start problem. Session-based recommendation might become a vital component of future recommender systems, especially for businesses and real-world applications, given the concerns and regulations about collecting user data, such as the GDPR [5].

Methods based on deep learning (DL) have shown great promise in session-based recommendation as well as in other communities [6]. As reported in the literature [7, 8, 9], they outperform traditional baseline methods by around 20-30 percent. However, recent investigations have shown that many of these methods are less compelling than they first appear [10]; moreover, results are hard to reproduce for many of them [11], and their code is often not readily available. Recent publications have addressed reproducibility by implementing several DL-based recommendation algorithms within frameworks [12, 13, 14]. While these frameworks are effective and have helped to alleviate the problem, two key factors should not be overlooked: 1. Iterative algorithm optimization: If these algorithms are intended for real-world use, they should include tools for iteratively tuning them to a given dataset (not only to the offline benchmark datasets). The process should be iterative and persistent, since new features may emerge and user preferences may change. 2. Modularity and ease of reproducibility: Besides accuracy, several other factors must be taken into consideration when implementing literature-approved methods in production,
including non-complexity, fault tolerance, real-time prediction, debuggability, resource consumption, and modularity [15, 16]. The most advanced and best-performing models are often left behind in business settings because they are complex and challenging to debug. As a result, businesses still opt for more straightforward methods that are less accurate but easier to manipulate and debug. In several papers [8, 10, 17, 18] (discussed in Section 2), various techniques were used to slightly improve performance; these techniques may not only be of little use for large-scale day-to-day operation, but may also cause problems in production and during debugging. It would be more practical to implement a robust and modular core structure with clear interfaces and to leave room for adding more complex mechanisms based on business demands.

Motivated by the reasons mentioned above, in this paper we present HPT4Rec, an AutoML-based framework for hyperparameter self-tuning with a modular code-base aimed at session-based recommendation. Our framework simplifies the development and manipulation of deep recommendation algorithms to meet business needs. The code-base is developed with PyTorch and Microsoft NNI¹, both of which are well known in the DL and AutoML communities and receive continuous updates. Besides being open-source, the framework can be installed easily, and all prepared data and trained models are available at https://github.com/amirreza-m95/HPT4Rec
¹ https://github.com/microsoft/nni

2. Prior Work
Background. The most commonly used deep models for sequential data are Recurrent Neural Networks (RNNs). A type of RNN known as LSTM [19] has been shown to work particularly well; it includes additional gates that regulate when to take the input into account and when to reset the hidden state. These models largely avoid the vanishing gradient problem usually associated with RNN models. A somewhat simpler alternative to LSTM that retains its key properties is the Gated Recurrent Unit (GRU) [20], which we employ in this work as the core learning structure of the recommender in our experiments.

Hidasi et al. [7] proposed the RNN approach for session-based recommendation (SBR) and later a parallel RNN architecture [9] that models sessions using both the clicks and the features of the clicked items. Further RNN-based research aimed to improve the accuracy of this model: its performance can be boosted by taking into account temporal changes in user behavior and by data augmentation techniques [8]. By uniting the recurrent method with the neighborhood-based method, Jannach et al. [10] combined sequential patterns and co-occurrence signals to get the best of both worlds. Tuan et al. [17] fused session clicks with content features (namely, item titles and categories) to generate recommendations based on 3-dimensional Convolutional Neural Networks (CNNs). Li et al. [21] developed the Neural Attentive Recommendation Machine (NARM) using an encoder-decoder architecture. NARM captures both the sequential behavior and the main purpose of users by applying an attention mechanism on top of an RNN. In another study, the Short-Term Attention/Memory Priority model (STAMP) [18], which employs a simple MLP network and an attentive network, was proposed to capture users' general interests as well as their current interests. In both NARM and STAMP, an attention mechanism emphasizes the importance of the last click.

Almost all of the aforementioned RNN-based SBR models follow the same architecture as GRU4Rec [7]; they merely add new features and mechanisms on top of the core structure to improve performance. Therefore, in HPT4Rec we built a minimal code-base on top of GRU4Rec, with all the tools and modules needed for a methodologically simplified, bottom-up approach to model development. This can lower the barrier of entry for practitioners and allows them to add other features if necessary.

Related Frameworks. In the modern RS field, reproducibility is crucial. Recently, various researchers [10, 11, 22, 23] pointed out the need for fair evaluation of recommender models. Their argument that, after thorough hyperparameter tuning, latent-factor models can outperform deep neural models made it necessary to develop new recommendation frameworks. Beginning in 2011, MyMediaLite [24], RankSys [25], LensKit [26], LightFM [27], and Surprise [28] established sets of integrated tools for rapid prototyping and testing of recommendation models, using standard metrics and intuitive model execution. Deep learning recommendation models then achieved remarkable success and attracted growing community interest, which led to the development of new tools. The first open-source frameworks for DL-based recommenders were LibRec [29], Spotlight [30], and OpenRec [31]. Although these frameworks provided plenty of models, they lacked filtering and automated hyperparameter tuning strategies. The RecQ [32], DeepRec [33], and Cornac [34] frameworks made a significant contribution toward a more comprehensive collection of model implementations. DaisyRec [35], RecBole [36], and Elliot [12] raised the bar considerably after the surge of interest in reproducibility, making available a large number of models, data filtering and splitting operations, as well as hyperparameter tuning. Nevertheless, we observed a deficiency in two increasingly critical aspects of recommendation model development in real-world scenarios: automated hyperparameter tuning and industry-level compatibility of tools and training services. In reviewing these related frameworks, we found no open-source recommendation framework that performs automated hyperparameter tuning while supporting various tuning strategies on different distributed platforms. HPT4Rec represents a step toward reaching that goal.

Earlier studies attempted to find a universal automated solution for both architecture design [37, 38] and optimization [39, 40, 41], but this seems to be ineffective: the problems are diverse and have different characteristics, so a one-size-fits-all solution is not appropriate. The goal of complete automation may be inspiring for scientific research and serve as a long-term engineering objective, but we will likely need to semi-automate the majority of these tasks and gradually reduce the human factor over time. We then expect to develop powerful tools that make machine learning, first and foremost, more systematic and, second, more efficient. HPT4Rec is intended as a contribution toward this goal.

3. HPT4Rec
In this section, we describe HPT4Rec's architecture and tuning pipeline. First, we describe the general architecture of the recommender. Next, we present the components and architecture of the framework. Finally, we discuss the available self-tuning methods and their best application scenarios.

3.1. Sequential Modeling with RNN
Variable-length sequence data can be modeled using RNNs. RNNs are characterized by an internal hidden state in the units that make up the network, which sets them apart from conventional feedforward neural networks. A standard RNN updates its hidden state $h$ according to the mechanism shown in Eq. (1):

$$h_t = g\left(W x_t + U h_{t-1}\right) \tag{1}$$

where $g$ is a smooth, bounded activation function such as the logistic sigmoid, and $x_t$ is the input of the unit at time $t$. Based on its current state $h_t$, an RNN outputs a probability distribution over the next element of the sequence.

GRU is a form of RNN that copes with the vanishing gradient problem better than a vanilla RNN. In essence, GRU gates learn when to update the hidden state and by how much. GRUs have been found to outperform Long Short-Term Memory (LSTM) units for session-based recommendation [7]. The GRU activation $h_t$ is a linear interpolation between the previous activation $h_{t-1}$ and the candidate activation $\hat{h}_t$:

$$h_t = (1 - z_t)\, h_{t-1} + z_t\, \hat{h}_t \tag{2}$$

where the update gate is given by:

$$z_t = \sigma\left(W_z x_t + U_z h_{t-1}\right) \tag{3}$$

The candidate activation $\hat{h}_t$ is computed in a similar manner:

$$\hat{h}_t = \tanh\left(W x_t + U\left(r_t \odot h_{t-1}\right)\right) \tag{4}$$

and, finally, the reset gate $r_t$ is given by:

$$r_t = \sigma\left(W_r x_t + U_r h_{t-1}\right) \tag{5}$$

We have presented the standard formulation of the GRU in Equations (2) to (5), but framework users can tweak the model through other options, such as different final activations (ReLU, leaky ReLU, or softmax).

3.1.1. GRU4Rec Architecture
The network core comprises the GRU layers, and further feedforward layers may be added between the last GRU layer and the output. The output is a predicted preference score for each item, indicating how likely it is to be the next item in the session. If more than one GRU layer is employed, the hidden state of each layer is used as the input of the next layer. Optionally, the input can also be connected to higher layers of the network to improve performance [7]. We adjusted the base network to suit the task better, since recommender systems are not the principal application area of RNNs. The SBR model architecture is shown in Figure 1.

Figure 1: Overview of HPT4Rec's Session-based Recommendation Architecture (components depicted: Input Data, Embedding Layer, Gated Recurrent Unit layers (×3), Feedforward Layers, Scores on Items)

In addition, we use trainable embeddings to represent all of our inputs. With backpropagation through time (BPTT), we can train our neural networks using
mini-batch gradient descent with multiple options for the loss function, over a dynamic number of time steps.

Session-parallel mini-batches. Click sessions vary in length: some users take a long time to find their desired item, while others find it within seconds. A recommender system should provide accurate predictions regardless of the current session length. This problem has been addressed by different methods, such as session-parallel mini-batches [9] and data augmentation [8]. Since we seek the least sophisticated approach, we adopted the former.

3.2. Architecture and Data Flow
Automated hyperparameter tuning is a key feature of HPT4Rec. We provide 11 popular self-tuning algorithms. Experiments can be run on a wide range of training platforms, including local machines, multiple servers on a distributed network, and open-source platforms such as Kubernetes and OpenPAI.

Figure 2: HPT4Rec's Architecture Overview (components depicted include the Experiment Manager)

3.2.1. HPT4Rec's Data Flow
An HPT4Rec experiment is an individual attempt to apply a configuration (e.g., a set of hyperparameters) to a model. The first step in constructing an experiment is to define the search space (i.e., the parameters). The tuner samples parameters or architectures according to the search space, which is defined in a JSON file. Each entry of a search space is defined by a variable name, a sampling strategy, and the parameters of that strategy. A search space definition can be expressed as follows:

{
  "dropout_rate": {"_type": "uniform", "_value": [0.1, 0.5]},
  "conv_size": {"_type": "choice", "_value": [2, 3, 5, 7]},
  "hidden_size": {"_type": "choice", "_value": [124, 512, 1024]},
  "lr": {"_type": "loguniform", "_value": [0.0001, 0.1]},
  "momentum": {"_type": "lognormal", "_value": [0.1, 1]}
}

This search space contains five parameters to tune. According to this definition, the dropout rate follows a uniform distribution over the range 0.1 to 0.5. The tuner uses the search space to build configurations, drawing a value for each parameter from its specified distribution or set of choices. Besides the search space, the only requirement is a configuration file containing information such as the experiment log folder, the self-tuning algorithm, the number of trials, and the duration threshold. The configuration file is in YAML format.

To implement a new tuning algorithm or tweak an existing one, the base tuner class is inherited. As long as the new class follows the module's interface (receiving the experiment results, passing new parameters, and updating the search space), the tuning module will function properly.
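The search-space semantics and the tuner interface described above can be made concrete with a short, standalone Python sketch. It assumes the NNI-style sampling types used in the example (`uniform`, `choice`, `loguniform`, and `lognormal`, where `lognormal` is read as `[mu, sigma]` of the underlying normal distribution) and uses plain random search as the strategy. The class `RandomSearchTuner` and the objective `toy_trial` are hypothetical illustrations; the method names mirror those of NNI's `Tuner` base class (`update_search_space`, `generate_parameters`, `receive_trial_result`), but the class neither imports nor subclasses NNI and is not HPT4Rec's implementation.

```python
import json
import math
import random

SEARCH_SPACE = json.loads("""
{
  "dropout_rate": {"_type": "uniform", "_value": [0.1, 0.5]},
  "conv_size": {"_type": "choice", "_value": [2, 3, 5, 7]},
  "hidden_size": {"_type": "choice", "_value": [124, 512, 1024]},
  "lr": {"_type": "loguniform", "_value": [0.0001, 0.1]},
  "momentum": {"_type": "lognormal", "_value": [0.1, 1]}
}
""")


def sample(spec, rng):
    """Draw one value for a single search-space entry."""
    kind, args = spec["_type"], spec["_value"]
    if kind == "choice":
        return rng.choice(args)
    if kind == "uniform":
        low, high = args
        return rng.uniform(low, high)
    if kind == "loguniform":
        # Uniform in log space, so each order of magnitude is equally likely.
        low, high = args
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    if kind == "lognormal":
        mu, sigma = args
        return rng.lognormvariate(mu, sigma)
    raise ValueError(f"unsupported _type: {kind}")


class RandomSearchTuner:
    """Hypothetical tuner whose method names mirror NNI's Tuner interface."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.search_space = {}
        self.history = []  # (parameter_id, parameters, value) tuples

    def update_search_space(self, search_space):
        self.search_space = search_space

    def generate_parameters(self, parameter_id, **kwargs):
        return {name: sample(spec, self.rng) for name, spec in self.search_space.items()}

    def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
        # Random search ignores feedback; a model-based tuner (e.g. TPE) would update its state here.
        self.history.append((parameter_id, parameters, value))

    def best(self):
        return max(self.history, key=lambda record: record[2])


def toy_trial(params):
    """Hypothetical stand-in for training and evaluating a model (returns a fake score)."""
    return 1.0 - abs(math.log10(params["lr"]) + 2.5) / 10 - params["dropout_rate"] / 10


tuner = RandomSearchTuner(seed=42)
tuner.update_search_space(SEARCH_SPACE)
for trial_id in range(30):
    params = tuner.generate_parameters(trial_id)
    tuner.receive_trial_result(trial_id, params, toy_trial(params))

best_id, best_params, best_value = tuner.best()
print(best_id, round(best_value, 4), best_params)
```

Swapping random search for TPE, SMAC, or Anneal would change only what happens inside `generate_parameters` and `receive_trial_result`; the search-space file and the trial loop stay the same, which is the kind of separation that inheriting from a base tuner is meant to provide.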
Table 1
Self-tuning methods' performance on different proxy datasets.
          TPE                        SMAC                       Anneal
#Samples  Recall@20  MRR@20  Time    Recall@20  MRR@20  Time    Recall@20  MRR@20  Time
125K      0.4314     0.2069  23      0.4229     0.2114  29      0.4332     0.203   25
250K      0.4687     0.225   39      0.473      0.2235  45      0.4633     0.2311  41
500K      0.5062     0.2426  76      0.5082     0.2442  77      0.5103     0.2487  57
1M        0.545      0.2559  139     0.5479     0.2636  147     0.5481     0.2619  191
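Returning briefly to the recurrent core of Section 3.1, Equations (2) to (5) can be traced with a few lines of NumPy. The sketch below runs a single-layer GRU over one session's item embeddings and applies a linear output layer that scores every item as the next click. It assumes no bias terms (as in the equations), random initialization, and hypothetical toy sizes (`n_items`, `emb_dim`, `hidden`); the function names are illustrative and this is not HPT4Rec's PyTorch implementation.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, used for the update and reset gates."""
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, p):
    """One GRU update following Eqs. (2)-(5); bias terms are omitted as in the equations."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)         # update gate, Eq. (3)
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)         # reset gate, Eq. (5)
    h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r_t * h_prev))  # candidate activation, Eq. (4)
    return (1.0 - z_t) * h_prev + z_t * h_cand                # interpolation, Eq. (2)


def init_params(n_items, emb_dim, hidden, seed=0):
    """Randomly initialize a toy single-layer model with hypothetical sizes."""
    rng = np.random.default_rng(seed)
    shapes = {
        "E": (n_items, emb_dim),                               # item embedding table
        "W_z": (hidden, emb_dim), "U_z": (hidden, hidden),
        "W_r": (hidden, emb_dim), "U_r": (hidden, hidden),
        "W": (hidden, emb_dim), "U": (hidden, hidden),
        "W_out": (n_items, hidden),                            # feedforward output layer
    }
    return {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}


def score_session(item_ids, p):
    """Feed a session's clicks through the GRU and score every item as the next click."""
    h = np.zeros(p["U"].shape[0])  # the hidden state starts at zero for every session
    for item in item_ids:
        h = gru_step(p["E"][item], h, p)
    return p["W_out"] @ h


params = init_params(n_items=10, emb_dim=4, hidden=8)
scores = score_session([3, 1, 7], params)
ranking = np.argsort(-scores)  # item IDs ordered from highest to lowest score
print(scores.shape, ranking[:3])
```

Stacking GRU layers, as described in Section 3.1.1, amounts to feeding each layer's `h` as the `x_t` of the next layer. A practical implementation would use a library layer such as PyTorch's `torch.nn.GRU`, which adds bias terms and uses a slightly different gate formulation (the reset gate is applied after the recurrent matrix product, and the update gate weights the previous state rather than the candidate), so its outputs are not numerically identical to this sketch.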
3.2.2. Architecture
Experiments are instantiated by executing the experiment_runner Python script from the command line (CLI) and passing the path of the configuration file. The experiment manager parses the configuration file to determine the path to the search space and the target training service, and then runs the model code with the appropriate parameters from the search space. Preprocessing (e.g., one-hot encoding, embedding dropout) is performed by the experiment manager. After the model has been executed with the first set of parameters, the self-tuner examines the intermediate results (i.e., after each epoch) to determine whether the results are improving. Next, the model is passed on to the evaluation module. The evaluator conducts the evaluation, and the results are provided to the self-tuning algorithm to update its inner state. Following this update, the self-tuning algorithm determines the next set of parameters to try. This iterative process is repeated until a given time budget or number of experiments is reached. Figure 2 illustrates this procedure. HPT4Rec outputs results in a web UI and collects all metrics, intermediate results, best parameters, and system logs in JSON format.

3.2.3. Self-tuning
The cycle of obtaining hyperparameters, carrying out experiments, testing their results, and then tuning the hyperparameters is referred to as self-tuning. Recommender systems are used on various websites with different levels of user activity, which directly affects the volume of data available for training models. Additionally, training deep models requires substantial computational resources, which is another crucial aspect since it directly impacts revenue. Therefore, different tuning strategies are needed depending on the available features, the volume of data, and the available computational resources. As Table 1 illustrates, HPT4Rec offers several tuning techniques suited to the diverse situations that occur in real-world scenarios.

After a series of experiments, we have gained an early intuition about the most suitable use cases of each self-tuning algorithm. The Tree-structured Parzen Estimator (TPE) [42] is suitable when computational resources are limited and only a small number of trials can be run; a wide range of experiments revealed that TPE outperformed random search. Anneal is useful if the variables in the search space can be sampled from a prior distribution. Naive evolution is recommended when the experiment code supports weight transfer, i.e., when a trial can inherit the converged weights of its parent. The right tuning method can substantially accelerate training, saving time and money, increasing revenue, and producing better recommenders that enhance user experience.

4. Experiments
4.1. Experiment Setup
4.1.1. Dataset
We conducted our experiments on the YOOCHOOSE e-commerce dataset of the RecSys 2015 challenge². The dataset contains a six-month period of click-streams from an e-commerce site; click-streams are sometimes followed by purchase events. Each click event contains a session ID and an item ID and, if the item was bought, a price tag. A shopping session can contain anywhere between 1 and 200 clicks, but most sessions contain fewer than 30 clicks. We keep only the click events from the challenge's training set, and sessions of length one are filtered out. After preprocessing, 7,936,469 sessions and 31,437,691 clicks on 37,403 items remain for training and testing. We chose the YOOCHOOSE dataset because, in terms of its features, it is the most general compared to other well-known datasets in this field, such as Diginetica³, Xing⁴, and Last.fm⁵. The default settings of the framework can be used for all of these datasets simply by omitting some of their extra features. Using a dataset with minimal data features helps to ensure that the model generalizes robustly to datasets with a richer set of features.
² http://2015.recsyschallenge.com
³ https://competitions.codalab.org/competitions/11161
⁴ http://2016.recsyschallenge.com/
⁵ http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html
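The preprocessing of Section 4.1.1 and the session-parallel mini-batches mentioned in Section 3.1 can be sketched together in plain Python. The sketch groups click events by session ID, drops sessions of length one, and then builds session-parallel mini-batches: each of the `batch_size` slots follows one session, the input is the current click and the target is the next click of the same session, and when a session ends its slot is refilled with the next unused session, which is the point where that slot's GRU hidden state would be reset to zero. The click log, the function names, and the stopping rule (iteration ends once a finished slot cannot be refilled) are hypothetical simplifications; timestamps and train/test splitting are omitted.

```python
def group_sessions(clicks):
    """Group (session_id, item_id) click events by session, keeping click order."""
    sessions = {}  # dicts preserve insertion order
    for session_id, item_id in clicks:
        sessions.setdefault(session_id, []).append(item_id)
    # A single click yields no (input, next item) pair, so such sessions are dropped.
    return [items for items in sessions.values() if len(items) > 1]


def session_parallel_batches(sessions, batch_size):
    """Yield (inputs, targets, reset) for each step of session-parallel training.

    Slot k follows one session; reset[k] is True when slot k has just started a new
    session, i.e. when that row of the GRU hidden state should be set to zero.
    """
    if not sessions:
        return
    next_session = min(batch_size, len(sessions))
    slots = [(s, 0) for s in range(next_session)]  # (session index, click position)
    reset = [True] * len(slots)
    while True:
        inputs = [sessions[s][pos] for s, pos in slots]
        targets = [sessions[s][pos + 1] for s, pos in slots]
        yield inputs, targets, reset
        new_slots, new_reset = [], []
        for s, pos in slots:
            if pos + 2 < len(sessions[s]):      # the session still has a next click to predict
                new_slots.append((s, pos + 1))
                new_reset.append(False)
            elif next_session < len(sessions):  # session finished: start a new one in this slot
                new_slots.append((next_session, 0))
                new_reset.append(True)
                next_session += 1
            else:                               # no unused session is left to fill the slot
                return
        slots, reset = new_slots, new_reset


clicks = [  # hypothetical (session_id, item_id) click log
    (1, "a"), (1, "b"), (1, "c"),
    (2, "d"),                                   # single-click session, filtered out
    (3, "e"), (3, "f"),
    (4, "g"), (4, "h"), (4, "i"), (4, "j"),
    (5, "k"), (5, "l"),
]
sessions = group_sessions(clicks)
for inputs, targets, reset in session_parallel_batches(sessions, batch_size=2):
    print(inputs, targets, reset)
```

With `batch_size=2`, the third step pairs the newly started session `k, l` with the ongoing session `g, h, i, j`; iteration then stops because no unused session remains to refill the first slot, so the final `i, j` pair is skipped in this simplified version. The `reset` flags tell a training loop which rows of the hidden state to zero, matching the reset at session boundaries described in Section 4.1.3.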
4.1.2. Evaluation Metrics
Recommender systems can show only a few items at a time, so the relevant item should appear near the top of the list. We therefore use Recall@20 as our main evaluation metric: the proportion of test cases in which the target item is among the top 20 recommended items. As long as an item is within the top N, recall does not take its rank into account. MRR@20 is the second metric used in the experiments: the mean of the reciprocal ranks of the target items, where the reciprocal rank is set to zero if the rank is above 20.

Table 2
Comparison of our optimized recommender against baselines.
Model/type/loss   HS⁶    Recall@20   MRR@20
POP               -      0.005       0.0012
S-POP             -      0.2672      0.1775
Item-KNN          -      0.5065      0.2048
BPR-MF            -      0.2574      0.0618
GRU4Rec BPR       1000   0.6322      0.2467
GRU4Rec TOP1      100    0.5853      0.2305
HPT4Rec TOP1      110    0.6259      0.2681
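The two metrics of Section 4.1.2 reduce to a few lines once the rank of the target item in each test case is known. The following minimal sketch assumes that each test case provides a score for every candidate item and a single target item, and it ignores ties; the function name and the toy scores are hypothetical. A cut-off of k = 2 is used in the example so that the cut-off actually matters.

```python
def recall_and_mrr_at_k(score_lists, targets, k=20):
    """Return (Recall@k, MRR@k) over all test cases.

    score_lists[i] maps item -> predicted score for test case i;
    targets[i] is the item that was actually clicked next.
    """
    hits, reciprocal_ranks = 0, 0.0
    for scores, target in zip(score_lists, targets):
        # Rank = 1 + number of items scored strictly higher than the target.
        rank = 1 + sum(1 for s in scores.values() if s > scores[target])
        if rank <= k:
            hits += 1                       # recall ignores the exact position within the top k
            reciprocal_ranks += 1.0 / rank  # ranks beyond k contribute zero to MRR
    n = len(targets)
    return hits / n, reciprocal_ranks / n


score_lists = [
    {"a": 0.9, "b": 0.5, "c": 0.1},  # target "a" at rank 1
    {"a": 0.2, "b": 0.7, "c": 0.4},  # target "c" at rank 2
    {"a": 0.6, "b": 0.3, "c": 0.1},  # target "c" at rank 3, outside the top 2
]
targets = ["a", "c", "c"]
recall, mrr = recall_and_mrr_at_k(score_lists, targets, k=2)
print(round(recall, 4), round(mrr, 4))  # → 0.6667 0.5
```

In the experiments, the scores would typically come from the recommender's output layer for each test case, with Recall@20 and MRR@20 averaged over all test cases.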
4.1.3. Implementation Details
For demonstration purposes and to keep the search space manageable, we optimized the hidden size, batch size, learning rate, and number of GRU layers, and fixed the other hyperparameters as follows. We used 50-dimensional item embeddings with a 20% embedding dropout. Optimization was performed using Adam [43]. The GRU search space was set to 50 to 1000 hidden units. The GRU's hidden state is reset to zero at the end of each session. Models are developed in PyTorch and trained on an NVIDIA Tesla V100. The source code of the model, checkpoints, and logs are available online.

We compared our model with four traditional recommendation methods (POP, S-POP, Item-KNN, and BPR-MF) and with two well-performing configurations of GRU4Rec.
• POP. In one of its simplest forms, the popularity predictor recommends the items that are most popular in the training set. Despite its simplicity, it often provides a good baseline in certain domains.
• S-POP. This baseline recommends the items that are most popular in the current session. As the session progresses, the recommendation list grows. Ties are broken using global popularity values.
• Item-KNN. This baseline measures the similarity of two items by dividing the number of sessions in which they co-occur by the square root of the product of their individual occurrence counts.

4.2. Performance and Results
4.2.1. Diverse Self-tuning Methods Effectiveness
The most likely scenario for developing a recommender system in the real world is running experiments in which different amounts of training data are available. This amount may change as user activity increases and new users visit the website. Even on the offline RecSys 2015 dataset, training on the complete dataset yields slightly worse results than training on a recent portion of it, which reflects changing user behavior [8]. Thus, to make recommendations that reflect changes in user behavior over time, models must be continuously and iteratively optimized. Different approaches can be used to find the best-optimized model depending on the available quantities of data and computation, as discussed in Section 3.2.3 (Self-tuning). We conducted our experiments using four proxy datasets that mirror the RecSys benchmark data but comprise different quantities of data. HPT4Rec's recommender model was tuned with four self-tuning methods, using the proxy datasets as training data. Evaluation metrics and tuning time were recorded to compare these methods. Table 1 shows the most effective models we found using 30 experiments. The results do not indicate the optimal use-case scenario for each tuning method; rather, they demonstrate that each of these tuners performs well in different scenarios and that none of them outperforms the others on all proxy datasets and evaluation metrics.

4.2.2. Consistency with Published Results
Consistency with previously published results is a key requirement for any new tool, since a wide range of results is possible due to a variety of implementation details, non-fixed seed values, and other domain-specific reasons. We also used HPT4Rec's self-tuning method to optimize the base recommender model on the original RecSys dataset. Table 2 shows that HPT4Rec outperforms the baseline models by a fair margin and is almost on par with state-of-the-art models, with the advantage that it discovered parameters leading to a simpler model, which results in lower resource consumption in production. Pursuing more streamlined models also facilitates reproducibility, a fundamental tenet of our methodology.

5. Conclusion and Future Work
In this paper, we have released an AutoML-based session-based recommender system framework called HPT4Rec. We reviewed the recommender system
frameworks in the literature, highlighting HPT4Rec's merits and shortcomings and emphasizing the advantages of modularity and automatic tuning. To the best of our knowledge, HPT4Rec is the first recommendation framework that provides a thorough self-tuning experimental pipeline supported by business-scale training service compatibility. We expect HPT4Rec to simplify the effort of tuning recommendation models, facilitate the development and debugging of new algorithms, and help migrate deep recommender algorithms to real-world scenarios. Our immediate future work will focus on automating other aspects of the recommendation pipeline, such as data augmentation, which has traditionally been done manually in the literature.

References
[1] Y. Koren, R. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, Computer 42 (2009) 30–37.
[2] Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 426–434.
[3] R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann machines for collaborative filtering, in: Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 791–798.
[4] J. B. Schafer, J. Konstan, J. Riedl, Recommender systems in e-commerce, in: Proceedings of the 1st ACM Conference on Electronic Commerce, 1999, pp. 158–166.
[5] E. Commission, 2018 reform of EU data protection rules, 2018-05-25. URL: https://ec.europa.eu/commission/sites/beta-political/files/data-protection-factsheet-changes_en.pdf.
[6] A. Datar, C. Pan, M. Nazeri, X. Xiao, Toward wheeled mobility on vertically challenging terrain: Platforms, datasets, and algorithms, arXiv preprint arXiv:2303.00998 (2023).
[7] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, CoRR abs/1511.06939 (2016).
[8] Y. K. Tan, X. Xu, Y. Liu, Improved recurrent neural networks for session-based recommendations, in: Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016, pp. 17–22.
[9] B. Hidasi, M. Quadrana, A. Karatzoglou, D. Tikk, Parallel recurrent neural network architectures for feature-rich session-based recommendations, in: Proceedings of the 10th ACM Conference on Recommender Systems, 2016, pp. 241–248.
[10] D. Jannach, M. Ludewig, When recurrent neural networks meet the neighborhood for session-based recommendation, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, 2017, pp. 306–310.
[11] M. F. Dacrema, P. Cremonesi, D. Jannach, Are we really making much progress? A worrying analysis of recent neural recommendation approaches, in: Proceedings of the 13th ACM Conference on Recommender Systems, 2019, pp. 101–109.
[12] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D. Noia, Elliot: a comprehensive and rigorous framework for reproducible recommender systems evaluation, 2021. arXiv:2103.02590.
[13] L. Yang, E. Bagdasaryan, J. Gruenstein, C.-K. Hsieh, D. Estrin, OpenRec: A modular framework for extensible and adaptable recommendation algorithms, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 664–672. URL: https://doi.org/10.1145/3159652.3159681.
[14] S. Zhang, Y. Tay, L. Yao, B. Wu, A. Sun, DeepRec: An open-source toolkit for deep learning based recommendation, 2019. arXiv:1905.10536.
[15] P. Kouki, I. Fountalis, N. Vasiloglou, X. Cui, E. Liberty, K. Al Jadda, From the lab to production: A case study of session-based recommendations in the home-improvement domain, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 140–149.
[16] D. Jannach, M. Jugovac, Measuring the business value of recommender systems, ACM Trans. Manage. Inf. Syst. 10 (2019). URL: https://doi.org/10.1145/3370082.
[17] T. X. Tuan, T. M. Phuong, 3D convolutional networks for session-based recommendation with content features, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, 2017, pp. 138–146.
[18] Q. Liu, Y. Zeng, R. Mokhosi, H. Zhang, STAMP: Short-term attention/memory priority model for session-based recommendation, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1831–1839.
[19] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[20] K. Cho, B. Van Merriënboer, D. Bahdanau, Y. Bengio, On the properties of neural machine translation: Encoder-decoder approaches, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation (2014).
[21] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, J. Ma, Neural attentive session-based recommendation, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1419–1428.
[22] S. Rendle, L. Zhang, Y. Koren, On the difficulty of evaluating baselines: A study on recommender systems, 2019. arXiv:1905.01395.
[23] D. Jannach, G. de Souza P. Moreira, E. Oldridge, Why are deep learning models not consistently winning recommender systems competitions yet? A position paper, in: Proceedings of the Recommender Systems Challenge 2020, RecSysChallenge '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 44–49. URL: https://doi.org/10.1145/3415959.3416001.
[24] Z. Gantner, S. Rendle, C. Freudenthaler, L. Schmidt-Thieme, MyMediaLite: A free recommender system library, in: 5th ACM International Conference on Recommender Systems (RecSys 2011), 2011.
[25] S. Vargas, Novelty and diversity enhancement and evaluation in recommender systems and information retrieval, in: Proceedings of the 37th International ACM SIGIR Conference on Research
ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), IEEE, 2020, pp. 982–995.
[34] A. Salah, Q.-T. Truong, H. W. Lauw, Cornac: A comparative framework for multimodal recommender systems, Journal of Machine Learning Research 21 (2020) 1–5.
[35] Z. Sun, D. Yu, H. Fang, J. Yang, X. Qu, J. Zhang, C. Geng, Are we evaluating rigorously? Benchmarking recommendation for reproducible evaluation and fair comparison, in: Fourteenth ACM Conference on Recommender Systems, RecSys '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 23–32. URL: https://doi.org/10.1145/3383313.3412489.
[36] W. X. Zhao, S. Mu, Y. Hou, Z. Lin, K. Li, Y. Chen, Y. Lu, H. Wang, C. Tian, X. Pan, Y. Min, Z. Feng, X. Fan, X. Chen, P. Wang, W. Ji, Y. Li, X. Wang, J.-R. Wen, RecBole: Towards a unified, comprehensive and efficient framework for recommendation algorithms, 2020. arXiv:2011.01731.
[37] P. Zhao, K. Xiao, Y. Zhang, K. Bian, W. Yan, Amer:
& development in information retrieval, 2014, pp. Automatic behavior modeling and interaction ex-
1281–1281. ploration in recommender system, arXiv preprint
[26] M. D. Ekstrand, Lenskit for python: Next- arXiv:2006.05933 (2020).
generation software for recommender systems ex- [38] Y. Chen, Y. Yang, H. Sun, Y. Wang, Y. Xu, W. Shen,
periments, in: Proceedings of the 29th ACM Inter- R. Zhou, Y. Tong, J. Bai, R. Zhang, Autoadr: Auto-
national Conference on Information & Knowledge matic model design for ad relevance, in: Proceed-
Management, 2020, pp. 2999–3006. ings of the 29th ACM International Conference on
[27] M. Kula, Metadata embeddings for user and Information & Knowledge Management, 2020, pp.
item cold-start recommendations, arXiv preprint 2365–2372.
arXiv:1507.08439 (2015). [39] T.-H. Wang, X. Hu, H. Jin, Q. Song, X. Han, Z. Liu,
[28] N. Hug, Surprise: A python library for recom- Autorec: An automated recommender system, in:
mender systems, Journal of Open Source Software Fourteenth ACM Conference on Recommender Sys-
5 (2020) 2174. tems, 2020, pp. 582–584.
[29] G. Guo, J. Zhang, Z. Sun, N. Yorke-Smith, Librec: A [40] R. Anand, J. Beel, Auto-surprise: An automated
java library for recommender systems., in: UMAP recommender-system (autorecsys) library with tree
Workshops, volume 4, Citeseer, 2015. of parzens estimator (tpe) optimization, in: Four-
[30] M. Kula, Spotlight, https://github.com/maciejkula/ teenth ACM Conference on Recommender Systems,
spotlight, 2017. 2020, pp. 585–587.
[31] L. Yang, E. Bagdasaryan, J. Gruenstein, C.-K. Hsieh, [41] H. Liu, X. Zhao, C. Wang, X. Liu, J. Tang, Auto-
D. Estrin, Openrec: A modular framework for ex- mated embedding size search in deep recommender
tensible and adaptable recommendation algorithms, systems, in: Proceedings of the 43rd International
in: Proceedings of the Eleventh ACM International ACM SIGIR Conference on Research and Develop-
Conference on Web Search and Data Mining, 2018, ment in Information Retrieval, 2020, pp. 2307–2316.
pp. 664–672. [42] J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algo-
[32] J. Yu, M. Gao, H. Yin, J. Li, C. Gao, Q. Wang, Gen- rithms for hyper-parameter optimization, in: 25th
erating reliable friends via adversarial training to annual conference on neural information process-
improve social recommendation, in: 2019 IEEE ing systems (NIPS 2011), volume 24, Neural Infor-
International Conference on Data Mining (ICDM), mation Processing Systems Foundation, 2011.
IEEE, 2019, pp. 768–777. [43] D. P. Kingma, J. Ba, Adam: A method for stochas-
[33] U. Gupta, S. Hsia, V. Saraph, X. Wang, B. Reagen, tic optimization, arXiv preprint arXiv:1412.6980
G.-Y. Wei, H.-H. S. Lee, D. Brooks, C.-J. Wu, Deep- (2014).
recsys: A system for optimizing end-to-end at-
scale neural recommendation inference, in: 2020