Towards Interaction-based User Embeddings in Sequential Recommender Models

Marina Ananyeva1,2, Oleg Lashinin1, Veronika Ivanova1, Sergey Kolesnikov1 and Dmitry I. Ignatov2
1 Tinkoff, Moscow, Russia
2 National Research University Higher School of Economics, Moscow, Russia

Abstract
Transductive recommender systems are unable to make predictions for users who were not included in the training sample, because they learn user-specific embeddings. In this paper, we propose a new method for replacing identity-based user embeddings in existing sequential models with interaction-based user vectors trained purely on interaction sequences. Such vectors are composed from user interactions using GRU layers with adjusted dropout and maximum item sequence length. This approach is substantially more efficient and does not require retraining when new users appear. Extensive experiments on three open-source datasets demonstrate noticeable improvements in quality metrics for most of the selected state-of-the-art sequential recommender models.

Keywords: sequential recommendation, user-specific embeddings, inductive learning

ORSUM@ACM RecSys 2022: 5th Workshop on Online Recommender Systems and User Modeling, jointly with the 16th ACM Conference on Recommender Systems, September 23rd, 2022, Seattle, WA, USA.
Contacts: mananeva@hse.ru (M. Ananyeva); o.a.lashinin@tinkoff.ru (O. Lashinin); ext.vvivanova@tinkoff.ru (V. Ivanova); s.s.kolesnikov@tinkoff.ru (S. Kolesnikov); dignatov@hse.ru (D. I. Ignatov).

1. Introduction

Recommender systems are widely used in various online services, such as social networks, e-commerce, and entertainment platforms. These services gather large amounts of sequential data, including the history of interactions between users and items. Some sequential models require learning ID-based latent user vectors, which are supposed to represent both short-term and long-term preferences based on user-specific information and the previous history of interactions. However, this approach has several drawbacks.

Firstly, transductive models can recommend items only to users from the training set. Predictions cannot be obtained solely from the previous interactions of out-of-sample users, because the model's user embeddings depend on users' IDs and additional features (if provided). The problem of making recommendations for new users is solved either by fully retraining the model on the updated data or by iterative training on new batches [1]. For industrial purposes, the retraining process on large-scale data is time- and space-consuming, constantly affecting user coverage with recommendations and the quality of service for new users.

Secondly, storing trainable user vectors may occupy a lot of memory, since the required space is usually O(n), where n is the number of users. This leads to issues with model exploitation and storage for the large user bases that are typical of online services. Without user-specific vectors, we do not need to keep a look-up ID-dependent user embedding matrix: the space complexity is reduced to O(1) by inferring the user embedding on the fly from the input interactions, which greatly simplifies the operating process.

In this research, we present a method for constructing user vectors in real time that overcomes the limitations mentioned above. The contributions of this work are summarized as follows:

• We propose a method of composing user embeddings based purely on interaction sequences, which can be employed in the architectures of existing sequential recommender models instead of ID-based user-specific embeddings. This approach avoids the need to retrain recommendation models as new interactions emerge. In addition, it does not require storage of per-user embeddings and is therefore more storage-efficient and scalable.

• We comprehensively reviewed existing works in three A- and B-ranked conference series (RecSys, CIKM, and SIGIR) from 2019 to 2021 that use identity-based user embeddings in their architectures. This review shows that a third of the existing models could be improved using our approach.

The experiments can be reproduced using our open-source repository: https://github.com/tinkoff-ai/use_rs.

Figure 1: Illustration of the proposed approach. We create the user embedding from the user's interaction history, so we do not need to learn ID-based user-specific embeddings. The last known vector of the interaction-history embeddings is used as the user embedding.

Table 1: Full and short papers on sequential recommender models per conference series from 2019 to 2021.

Conference | Articles                            | Models with user vectors | Explained motivation for using user vectors
RecSys'21  | [2]                                 | 0/1 (0%)                 | -
RecSys'20  | [3, 4, 5, 6, 7, 8]                  | 2/6 (33%): [3, 5]        | Long-term preferences, model personalization
RecSys'19  | [9]                                 | 0/1 (0%)                 | -
CIKM'21    | [10, 11, 12, 13, 14, 15, 16, 17, 18] | 3/9 (33%): [11, 12, 18] | Long-term preferences (2 works), friends' impact
CIKM'20    | [19, 20, 21, 22]                    | 1/4 (25%): [19]          | Short- and long-term preferences
CIKM'19    | [23, 24, 25, 26]                    | 3/4 (75%): [24, 25, 26]  | Short- and long-term preferences
SIGIR'21   | [27, 28, 29, 30]                    | 0/4 (0%)                 | -
SIGIR'20   | [31, 32, 33]                        | 1/3 (33%): [33]          | For ranking score in BPR and GMF
SIGIR'19   | [34, 35]                            | 2/2 (100%): [34, 35]     | Lifelong user behavior, user-specific representations

2. Related work

Sequence-based recommender models are commonly used for recommendation tasks on serial data. Most of them are based on recurrent neural networks (RNNs), for instance, GRU4Rec [36], SASRec [37], and SHAN [38]. Additionally, Transformers4Rec [39] is gaining popularity for sequential and session-based tasks.

Some architectures attempt to model temporal decay effects in the user interaction history in order to improve the relevance of recommendations. Customer needs, as well as both short-term and long-term preferences, change over time, which should be taken into account in the predictions. Intuitively, the most recent interactions should have greater weight than older ones when deciding on the next item. Additionally, users may require substitutes or complements for an already acquired item. These assumptions have been incorporated into the design of the SLRC [40], Chorus [41], and KDA [42] models.

The approaches mentioned above have serious limitations for industrial applications: item recommendations are made based on user-specific embeddings, which can be trained only for users that were included in the training set. In contrast, inductive learning models can provide recommendations for out-of-sample users, who have interactions but are not included in the training process. For instance, Mult-VAE [43] and CF-LGCN-E [44], which is a modified version of LightGCN [45] for the inductive learning mode, can provide predictions for users outside of the training sample. Nevertheless, the quality of inductive models is often lower than that of transductive ones.

Thus, one of the open challenges for transductive models with high performance is to overcome the problem of making predictions for out-of-sample users. Additionally, the effect of user-specific embeddings on the quality of recommendations is not yet sufficiently studied.
In this research, we study whether we really need user-specific embeddings or whether it is better to train ID- and feature-free user vectors based solely on previous item interactions.

3. Methodology

3.1. The Rationale for Using User-specific Vectors

To determine how frequently trainable user-specific vectors are used in existing sequential recommender models, and to systematize the reasons for their use, we examined the proceedings of scientific conferences with relevant articles. As summarized in Table 1, our analysis covers articles presented between 2019 and 2021 in three conference series: RecSys, CIKM, and SIGIR. A paper was considered relevant if it proposed a sequential recommendation model, including session-based and POI recommendation tasks, and if its performance was compared against sequential recommender baselines. As a result, we compiled a list of 34 relevant studies, 12 of which apply user embeddings. According to the authors, the primary purpose of including user vector processing in the proposed methods, stated in five studies, was to represent long-term preferences. Other objectives included modeling both short-term and long-term preferences, learning user-specific vectors from mixed representations of all users sharing the same account, and modeling the impact of friends' behavior.

3.2. Initialization of User Vectors

Each sequential model processes the history of a user's interactions in order to represent relationships between interactions and then model the user's behavioral patterns [36, 37]. In our work, we investigate the feasibility of employing this mechanism to obtain vector representations that reflect users' interests, as well as how it influences the quality of sequential models. All of the selected models transform the user ID into a low-dimensional, real-valued dense vector representation u ∈ ℝ^d, where d is the dimension of the user embedding. The embedding is then processed in accordance with the architecture of each model.

Instead of this technique, we propose to replace the ID-based user embedding initialization with an interaction-based one, discarding the user ID as limiting information, for efficiency and scalability.

Let S_u ∈ ℝ^{L×d} be the input representations of previous interactions, where L is the maximum history length. First, we apply a Dropout layer [46] to the matrix S_u. The sequence representation is then processed by GRU layers. Note that our goal is to show that our approach is effective even with a simple recurrent layer such as the GRU; the use of more advanced layers is left for future improvements. The final step is a linear layer that reduces the embedding dimension back to its initial size d, after which we take the last known vector of the interaction history. As shown in Figure 1, we obtain the user embedding u ∈ ℝ^d as the output of successively applied layers (a Dropout layer, two GRU layers, and a Dense layer) whose input is the sequence of each user's historical data.

This approach has several significant advantages. First, the space complexity is reduced from O(n) to O(1), where n is the number of users: there is no need to store pre-computed embeddings in a look-up matrix keyed by user IDs, since each vector can be derived on the fly from the input interaction sequence using the learned network weights. This addresses the scalability issues of commercial applications. Second, it can be regarded as a step toward user privacy and confidentiality, because the user identifier becomes redundant information, and without it the embedding cannot be mapped back to personal data. Lastly, our approach allows adapting previously introduced sequential recommender models to inductive learning scenarios, in which we can infer recommendations for users who were not included in the training sample.
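The following minimal PyTorch sketch illustrates this pipeline. It assumes the interaction sequence has already been mapped to item embeddings; the class name InteractionUserEncoder and its arguments are our illustrative choices, not identifiers from the released repository.

```python
import torch
import torch.nn as nn


class InteractionUserEncoder(nn.Module):
    """Interaction-based user embedding: Dropout -> two GRU layers -> Linear.

    A minimal sketch of the pipeline in Section 3.2; names and defaults
    are illustrative assumptions, not taken from the released code.
    """

    def __init__(self, d: int, dropout_p: float = 0.2):
        super().__init__()
        self.dropout = nn.Dropout(dropout_p)
        # two stacked GRU layers over the (batch, L, d) interaction matrix S_u
        self.gru = nn.GRU(input_size=d, hidden_size=d, num_layers=2,
                          batch_first=True)
        # linear layer projecting back to the initial embedding size d
        self.dense = nn.Linear(d, d)

    def forward(self, item_seq_emb: torch.Tensor,
                seq_lengths: torch.Tensor) -> torch.Tensor:
        # item_seq_emb: (batch, L, d) embeddings of up to L past interactions
        out, _ = self.gru(self.dropout(item_seq_emb))  # (batch, L, d)
        out = self.dense(out)                          # (batch, L, d)
        # keep the vector at the last known (non-padded) position
        idx = (seq_lengths.long() - 1).clamp(min=0)
        idx = idx.view(-1, 1, 1).expand(-1, 1, out.size(-1))
        return out.gather(1, idx).squeeze(1)           # (batch, d)
```

Because the resulting user vector is a pure function of the interaction sequence, an out-of-sample user with any recorded history can be embedded at inference time, and no O(n) per-user look-up table has to be stored or retrained.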
3.3. Models

In our experiments, we use two of the most popular frameworks for sequential recommendation models, ReChorus (https://github.com/THUwangcy/ReChorus) and RecBole (https://recbole.io/). For the experimental setup, we selected state-of-the-art models that have proven themselves in many recent research papers as reliable baselines for comparison with new models. Thus, we took three models from the ReChorus framework (KDA, Chorus, and SLRC) and two models from RecBole (SHAN and HGN) in order to study how different user vector initialization techniques affect model performance on three open-source datasets. The two RecBole models were re-implemented in ReChorus to ensure a fair comparison.

• Sequential Hierarchical Attention Network (SHAN) [38] is a two-layer hierarchical attention network. The attention mechanism assigns adjusted item weights per user to capture dynamic properties, while the hierarchical structure integrates the user's long- and short-term preferences. The user embedding vector serves as context information to obtain different weights for different users.

• Hierarchical Gating Network (HGN) [47] consists of three parts: feature gating, instance gating, and item-item product modules. The feature-gating module adaptively selects effective latent features based on user interests. The instance-gating module selects the items that reflect short-term user preferences and passes them down to lower layers along with the item features. The user embedding is used in both the feature-gating and instance-gating modules.

• Chorus [41] incorporates the representation of different sequence contexts via knowledge- and time-aware item modeling. The constructed temporal kernel functions modify the temporal dynamics of relations by representing two sorts of items (substitutes and complements), allowing relational representations to contribute differentially to the final item embedding. User embeddings are used in both the BPR and GMF approaches for making predictions.

• Short-Term and Life-Time Repeat Consumption (SLRC) [40] combines the Hawkes process with collaborative filtering, which requires learning user embeddings to distinguish between user interests and to help explore new items. Considering the lack of recurrent interactions in the Amazon and MovieLens datasets, we use the variant implemented in Chorus, which derives substitutive and complementary types of relations between items.

• Knowledge-aware Dynamic Attention (KDA) [42] takes both item relations and their temporal evolution into account. Its core idea is to aggregate the sequence of interactions into multiple relation-specific embeddings via an attention mechanism. A Fourier transform with trainable frequency-domain embeddings is used in a novel way to simulate the diverse temporal effects of various relational interactions. User vectors, together with item vectors and interaction representations, enter the final ranking score.

Overall, we compared each original architecture with a counterpart in which the user-specific vectors are removed and the user embedding is learned only from interaction sequences, following our approach; the sketch below illustrates the substitution point.
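As a rough illustration, the hypothetical wrapper below contrasts the original ID-based look-up with the interaction-based encoder from the previous sketch. ExampleSeqModel is an assumption of ours; the actual ReChorus and RecBole models differ in their details, but the place where the user vector enters is the same.

```python
class ExampleSeqModel(nn.Module):
    """Hypothetical host model showing the single substitution point.

    `use_ids=True` mimics the original ID-based look-up table;
    `use_ids=False` plugs in our interaction-based replacement.
    Reuses InteractionUserEncoder from the previous sketch.
    """

    def __init__(self, n_users: int, n_items: int, d: int,
                 use_ids: bool = False):
        super().__init__()
        self.item_embedding = nn.Embedding(n_items + 1, d, padding_idx=0)
        self.use_ids = use_ids
        if use_ids:
            # original: O(n) trainable table keyed by user ID (transductive)
            self.user_embedding = nn.Embedding(n_users, d)
        else:
            # ours: user vector inferred on the fly from interactions
            self.user_encoder = InteractionUserEncoder(d)

    def user_vector(self, user_ids, item_seq, seq_lengths):
        # downstream attention/gating/ranking layers consume the returned
        # vector exactly as they consumed the ID-based embedding before
        if self.use_ids:
            return self.user_embedding(user_ids)
        return self.user_encoder(self.item_embedding(item_seq), seq_lengths)
```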
4. Experiments

In this section, we introduce our experimental setup and compare the performance of the original models with the modified ones. Our experiments are designed to answer the following research questions:

RQ1: Does the proposed method have a positive effect on the quality of existing sequential recommender models?
RQ2: How does the maximum sequence length affect the models' performance?

4.1. Datasets

We chose the three datasets most commonly used for sequential recommendation: MovieLens-1M (https://grouplens.org/datasets/movielens/1m/), Amazon Grocery and Gourmet Food, and Amazon Electronics (http://jmcauley.ucsd.edu/data/amazon/). These open-source datasets differ in domain, size, and sparsity. They contain user interaction sequences with timestamps and item metadata, including the lists of "also view" and "also buy" relations in the Amazon datasets and the list of genres in the MovieLens dataset. We use the common leave-one-out strategy with 99 negative items, similar to [42]. For SHAN, HGN, and SLRC, we only need the user interaction sequences, while Chorus and KDA are based on knowledge graphs, so we use the metadata to build them. In the Amazon datasets, we simply use the "also view" and "also buy" relations provided in the metadata, as was done in [41]. For the MovieLens dataset, we chose the most popular movies of the same genre as the equivalent of "also view" items, and the most popular items among the movies that users watched right after the ground-truth item as the equivalent of "also buy" items.

Table 2: Descriptive statistics of the datasets.

Dataset         | #users  | #items | #actions | Density
MovieLens-1M    | 6,040   | 3,416  | 1M       | 4.84%
Grocery&Gourmet | 127,496 | 41,280 | 1.1M     | 0.022%
Electronics     | 192,403 | 63,001 | 1.7M     | 0.014%

4.2. Evaluation Metrics

Hit Ratio (HR@k) and Normalized Discounted Cumulative Gain (NDCG@k) were used as evaluation metrics, with k ∈ {5, 10, 20, 50}. HR@k measures whether at least one ground-truth item appears in the top-k recommendation list, whereas NDCG@k considers both the position and the relevance of the item in the recommendation list. The values of NDCG@10 and HR@10 for the five original and five modified models are presented in Table 3.
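Under this protocol each test case has exactly one ground-truth item ranked against 99 sampled negatives, so NDCG@k reduces to 1/log2(rank + 1) for a hit and 0 otherwise. A small sketch follows; the function name and the assumption that the ground-truth score sits in column 0 are ours.

```python
import numpy as np


def rank_based_metrics(scores: np.ndarray, k: int = 10):
    """HR@k and NDCG@k under leave-one-out with 99 sampled negatives.

    `scores` has shape (n_cases, 100): column 0 holds the model score of
    the single ground-truth item, columns 1..99 the negatives.
    """
    # rank of the ground-truth item among the 100 candidates (1 = top)
    ranks = (scores > scores[:, [0]]).sum(axis=1) + 1
    hit = ranks <= k
    hr = hit.mean()  # fraction of cases with the ground truth in the top k
    # with a single relevant item, NDCG@k = 1 / log2(rank + 1) for a hit
    ndcg = np.where(hit, 1.0 / np.log2(ranks + 1), 0.0).mean()
    return hr, ndcg
```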
4.3. Experiment Settings

All models were implemented using the PyTorch framework [48]. For a fair comparison, we set the embedding size to 64, the batch size to 256, and the maximum history length to 20 for all models and datasets, similar to the experiments in [41]. Additionally, we report results for other values of the maximum history length: 10, 30, and 50. The remaining hyperparameters are model-dependent and are set to the default values of the original implementations. Tuning the hyperparameters across all methods and datasets is left for future work.

4.4. Baselines

We include two baselines in order to obtain the relative performance of non-sequential methods. Specifically, we include the POP method [49], a common non-personalized baseline that recommends the most popular items, and BPR-MF [50], which is often adopted as a classic matrix factorization-based method.

4.5. Performance Comparison

Table 3: Results of the pairwise comparison of the original and modified models. The best result in each original/modified pair of sequential models is marked with *.

Model       | ML-1M             | Grocery&Gourmet   | Electronics
            | NDCG@10   HR@10   | NDCG@10   HR@10   | NDCG@10   HR@10
POP         | 0.2513    0.4575  | 0.2628    0.4350  | 0.3087    0.4849
BPR-MF      | 0.4074    0.6844  | 0.3690    0.5516  | 0.3444    0.5348
SHAN        | 0.3137    0.5661  | 0.2480    0.4067  | 0.2887    0.4418
HGN         | 0.5233    0.7846  | 0.3898*   0.5670* | 0.3845*   0.5875*
SLRC        | 0.3226    0.5778  | 0.3334    0.4982  | 0.3914    0.5569
Chorus      | 0.4309    0.7161  | 0.4046    0.5862  | 0.4063    0.5994
KDA         | 0.6041*   0.8386* | 0.4442    0.6279  | 0.4605*   0.6733*
SHAN_our    | 0.3209*   0.5700* | 0.2565*   0.4306* | 0.3137*   0.4950*
HGN_our     | 0.5812*   0.8086* | 0.3268    0.4989  | 0.3662    0.5666
SLRC_our    | 0.5822*   0.8091* | 0.3673*   0.5376* | 0.4171*   0.6040*
Chorus_our  | 0.5976*   0.8258* | 0.4089*   0.5958* | 0.4523*   0.6527*
KDA_our     | 0.6011    0.8257  | 0.4456*   0.6291* | 0.4544    0.6685

Table 3 shows the recommendation performance of the original architectures and the modified models on the three datasets (RQ1). The proposed strategy has a significant impact on model quality across all datasets. For instance, on MovieLens-1M we observe increases in both NDCG@10 and HR@10 for four modified models compared to the originals (SHAN_our, HGN_our, SLRC_our, and Chorus_our), while the quality of KDA_our remains nearly the same. The quality improvement varies widely, ranging from 1% for SHAN_our to 81% for SLRC_our (for SLRC on ML-1M: (0.5822 - 0.3226) / 0.3226 ≈ 80.5% in NDCG@10). We even observe a slight boost in evaluation metrics for the strongest baseline, KDA, on Amazon Grocery&Gourmet Food. However, on that dataset the quality of HGN_our deteriorated: NDCG@10 decreased by 16%. As the authors of HGN observed, the predictions of this model are highly dependent on the last items; when our approach constructs a user vector from long sequences, the impact of the last items may be reduced. On Amazon Electronics, metric values decrease by 1% for KDA_our and by 5% for HGN_our, while for SLRC_our, SHAN_our, and Chorus_our the quality improves in the range of 7% to 12%. Overall, the performance of four of the models improved dramatically, but replacing the user-specific vectors had almost no influence on KDA_our. One possible explanation, consistent with the KDA paper [42], is that the KDA architecture is not highly sensitive to the presence of user vectors at all. SLRC_our demonstrates a significant improvement in quality on all datasets. This is explained by the fact that the SLRC algorithm's core component is collaborative filtering (CF), which is good at modeling long-term user preferences; our technique adds an assessment of short-term preferences to CF, which the original model may have overlooked.
If we consider each of the model-dataset pairs as a separate experiment, our approach increases the quality metrics in 11 out of 15 cases.

Summing up, comparative experiments on three real-world datasets show the effectiveness of our approach and a significant improvement in quality for the majority of the examined models. Replacing the user-specific embeddings provides a substantial relative gain in performance (e.g., 0.6%-12.1% for SHAN [38], 1.1%-38.7% for Chorus [41], 6.6%-80% for SLRC [40]).

Figure 2: Relative change in NDCG@10 of the five models. Panels: (a) MovieLens-1M, (b) Amazon Grocery & Gourmet Food, (c) Amazon Electronics.

Figure 2 shows how the maximum history length influences the quality improvement when our approach is applied (RQ2). The smaller the maximum sequence length, the better the model captures short-term user preferences, while for larger lengths long-term effects outweigh short-term ones. When the length of the sequence shrinks, the long-term influence of modeling, which is the primary reason for using user embeddings in a model, disappears. As a result, replacing user-specific vectors works effectively for both short (l = 10) and long (l = 50) sequences.

5. Conclusion

In this research, we proposed a method of composing user vectors based purely on interaction sequences, which can be employed in the architectures of existing sequential recommender models instead of user-specific embeddings. Our method does not require constant retraining of the model as the number of users grows, and it is memory-efficient. Extensive experiments on three real-world datasets reveal that the majority of the evaluated models improved in quality. Additionally, we studied the relationship between a model's relative improvement and the item sequence length when our method is applied. We therefore suggest that researchers experiment with our approach in studies that currently rely on ID-based user-specific embeddings. Our results can open up a new research area for ablation studies on the use of user-specific embeddings in recommender systems. In the future, we are going to apply our approach to more modern models and try more complex architectures than the GRU. In addition, it is essential to investigate how accurate and stable this approach is with an extremely small number of user interactions.

Acknowledgements

This research was supported by the Tinkoff Laboratory and the Laboratory for Models and Methods of Computational Pragmatics at the National Research University Higher School of Economics (HSE). The contribution of Dmitry I. Ignatov to the article was done within the framework of the HSE University Basic Research Program.

References

[1] Y. Zhang, F. Feng, C. Wang, X. He, M. Wang, Y. Li, Y. Zhang, How to retrain recommender system? A sequential meta-learning method, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1479-1488.
[2] W. Song, S. Wang, Y. Wang, S. Wang, Next-item recommendations in short sessions, in: Fifteenth ACM Conference on Recommender Systems, 2021, pp. 282-291.
[3] C. Hansen, C. Hansen, L. Maystre, R. Mehrotra, B. Brost, F. Tomasi, M. Lalmas, Contextual and sequential user embeddings for large-scale music recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 53-62.
[4] J. Lin, W. Pan, Z. Ming, FISSA: fusing item similarity models with self-attention networks for sequential recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 130-139.
[5] L. Wu, S. Li, C.-J. Hsieh, J. Sharpnack, SSE-PT: Sequential recommendation via personalized transformer, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 328-337.
[6] F. Mi, X. Lin, B. Faltings, ADER: Adaptively distilled exemplar replay towards continual learning for session-based recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 408-413.
[7] S. Liu, Y. Zheng, Long-tail session-based recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 509-514.
[8] S. M. Cho, E. Park, S. Yoo, MEANTIME: Mixture of attention mechanisms with multi-temporal embeddings for sequential recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 515-520.
[9] Z. Zhao, L. Hong, L. Wei, J. Chen, A. Nath, S. Andrews, A. Kumthekar, M. Sathiamoorthy, X. Yi, E. Chi, Recommending what video to watch next: a multitask ranking system, in: Proceedings of the 13th ACM Conference on Recommender Systems, 2019, pp. 43-51.
[10] Q. Wu, C. Yang, S. Yu, X. Gao, G. Chen, Seq2Bubbles: Region-based embedding learning for user behaviors in sequential recommenders, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 2160-2169.
[11] Z. Fan, Z. Liu, J. Zhang, Y. Xiong, L. Zheng, P. S. Yu, Continuous-time sequential recommendation with temporal graph collaborative transformer, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 433-442.
[12] Y. Li, Y. Ding, B. Chen, X. Xin, Y. Wang, Y. Shi, R. Tang, D. Wang, Extracting attentive social temporal excitation for sequential recommendation, arXiv preprint arXiv:2109.13539 (2021).
[13] Y. Li, T. Chen, P.-F. Zhang, H. Yin, Lightweight self-attentive sequential recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 967-977.
[14] K. Hu, L. Li, Q. Xie, J. Liu, X. Tao, What is next when sequential prediction meets implicitly hard interaction?, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 710-719.
[15] Z. Fan, Z. Liu, S. Wang, L. Zheng, P. S. Yu, Modeling sequences as distributions with uncertainty for sequential recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 3019-3023.
[16] Z. He, H. Zhao, Z. Lin, Z. Wang, A. Kale, J. McAuley, Locker: Locally constrained self-attentive sequential recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 3088-3092.
[17] Q. Cui, C. Zhang, Y. Zhang, J. Wang, M. Cai, ST-PIL: Spatial-temporal periodic interest learning for next point-of-interest recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 2960-2964.
[18] Z. Chen, W. Zhang, J. Yan, G. Wang, J. Wang, Learning dual dynamic representations on time-sliced user-item interaction graphs for sequential recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 231-240.
[19] W. Ji, K. Wang, X. Wang, T. Chen, A. Cristea, Sequential recommender via time-aware attentive memory network, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 565-574.
[20] W. Chen, P. Ren, F. Cai, F. Sun, M. de Rijke, Improving end-to-end sequential recommendations with intent-aware diversification, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 175-184.
[21] K. Zhou, H. Wang, W. X. Zhao, Y. Zhu, S. Wang, F. Zhang, Z. Wang, J.-R. Wen, S3-Rec: Self-supervised learning for sequential recommendation with mutual information maximization, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 1893-1902.
[22] M. M. Tanjim, DynamicRec: A dynamic convolutional network for next item recommendation, in: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM-2020), 2020.
[23] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, P. Jiang, BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1441-1450.
[24] A. Yan, S. Cheng, W.-C. Kang, M. Wan, J. McAuley, CosRec: 2D convolutional neural networks for sequential recommendation, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2173-2176.
[25] Y. Wu, K. Li, G. Zhao, X. Qian, Long- and short-term preference learning for next POI recommendation, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2301-2304.
[26] F. Lv, T. Jin, C. Yu, F. Sun, Q. Lin, K. Yang, W. Ng, SDM: Sequential deep matching model for online large-scale recommender system, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2635-2643.
[27] R. Cai, J. Wu, A. San, C. Wang, H. Wang, Category-aware collaborative sequential recommendation, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 388-397.
[28] Z. Liu, Z. Fan, Y. Wang, P. S. Yu, Augmenting sequential recommendation with pseudo-prior items via reversely pre-training transformer, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1608-1612.
[29] X. Yuan, D. Duan, L. Tong, L. Shi, C. Zhang, ICAI-SR: Item categorical attribute integrated sequential recommendation, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1687-1691.
[30] X. Fan, Z. Liu, J. Lian, W. X. Zhao, X. Xie, J.-R. Wen, Lighter and better: low-rank decomposed self-attention networks for next-item recommendation, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1733-1737.
[31] R. Ren, Z. Liu, Y. Li, W. X. Zhao, H. Wang, B. Ding, J.-R. Wen, Sequential recommendation with self-attentive multi-adversarial network, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 89-98.
[32] L. Zheng, N. Guo, W. Chen, J. Yu, D. Jiang, Sentiment-guided sequential recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1957-1960.
[33] C. Wang, M. Zhang, W. Ma, Y. Liu, S. Ma, Make it a chorus: knowledge- and time-aware item modeling for sequential recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 109-118.
[34] K. Ren, J. Qin, Y. Fang, W. Zhang, L. Zheng, W. Bian, G. Zhou, J. Xu, Y. Yu, X. Zhu, et al., Lifelong sequential modeling with personalized memorization for user response prediction, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 565-574.
[35] M. Ma, P. Ren, Y. Lin, Z. Chen, J. Ma, M. de Rijke, π-Net: A parallel information-sharing network for shared-account cross-domain sequential recommendations, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 685-694.
[36] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, arXiv preprint arXiv:1511.06939 (2015).
[37] W.-C. Kang, J. McAuley, Self-attentive sequential recommendation, in: 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018, pp. 197-206.
[38] H. Ying, F. Zhuang, F. Zhang, Y. Liu, G. Xu, X. Xie, H. Xiong, J. Wu, Sequential recommender system based on hierarchical attention network, in: IJCAI International Joint Conference on Artificial Intelligence, 2018.
[39] G. de Souza Pereira Moreira, S. Rabhi, J. M. Lee, R. Ak, E. Oldridge, Transformers4Rec: Bridging the gap between NLP and sequential/session-based recommendation, in: Fifteenth ACM Conference on Recommender Systems, 2021, pp. 143-153.
[40] C. Wang, M. Zhang, W. Ma, Y. Liu, S. Ma, Modeling item-specific temporal dynamics of repeat consumption for recommender systems, in: The World Wide Web Conference, 2019, pp. 1977-1987.
[41] C. Wang, M. Zhang, W. Ma, Y. Liu, S. Ma, Make it a chorus: knowledge- and time-aware item modeling for sequential recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 109-118.
[42] C. Wang, W. Ma, M. Zhang, C. Chen, Y. Liu, S. Ma, Toward dynamic user intention: Temporal evolutionary effects of item relations in sequential recommendation, ACM Transactions on Information Systems (TOIS) 39 (2020) 1-33.
[43] D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, Variational autoencoders for collaborative filtering, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 689-698.
[44] R. Ragesh, S. Sellamanickam, V. Lingam, A. Iyer, R. Bairi, User embedding based neighborhood aggregation method for inductive recommendation, arXiv preprint arXiv:2102.07575 (2021).
[45] Y. Shen, Y. Wu, Y. Zhang, C. Shan, J. Zhang, B. K. Letaief, D. Li, How powerful is graph convolution for recommendation?, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1619-1629.
[46] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research 15 (2014) 1929-1958.
[47] C. Ma, P. Kang, X. Liu, Hierarchical gating networks for sequential recommendation, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 825-833.
[48] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024-8035. URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
[49] N. Neophytou, B. Mitra, C. Stinson, Revisiting popularity and demographic biases in recommender evaluation and effectiveness, in: European Conference on Information Retrieval, Springer, 2022, pp. 641-654.
[50] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, arXiv preprint arXiv:1205.2618 (2012).