Towards Interaction-based User Embeddings in Sequential Recommender Models

Marina Ananyeva1,2, Oleg Lashinin1, Veronika Ivanova1, Sergey Kolesnikov1 and Dmitry I. Ignatov2
1 Tinkoff, Moscow, Russia
2 National Research University Higher School of Economics, Moscow, Russia

Abstract
Transductive recommender systems are unable to make predictions for users who were not included in the training sample, because they learn user-specific embeddings. In this paper, we propose a new method for replacing identity-based user embeddings in existing sequential models with interaction-based user vectors trained purely on interaction sequences. Such vectors are composed from user interactions using GRU layers with adjusted dropout and maximum item sequence length. This approach is substantially more efficient and does not require retraining when new users appear. Extensive experiments on three open-source datasets demonstrate noticeable improvements in quality metrics for most of the selected state-of-the-art sequential recommender models.

Keywords: sequential recommendation, user-specific embeddings, inductive learning

ORSUM@ACM RecSys 2022: 5th Workshop on Online Recommender Systems and User Modeling, jointly with the 16th ACM Conference on Recommender Systems, September 23rd, 2022, Seattle, WA, USA.
Contacts: mananeva@hse.ru (M. Ananyeva); o.a.lashinin@tinkoff.ru (O. Lashinin); ext.vvivanova@tinkoff.ru (V. Ivanova); s.s.kolesnikov@tinkoff.ru (S. Kolesnikov); dignatov@hse.ru (D. I. Ignatov).

1. Introduction

Recommender systems are widely used in various online services, such as social networks, e-commerce, and entertainment platforms. These services gather large amounts of sequential data, including the history of interactions between users and items. Some sequential models require learning ID-based latent user vectors, which are supposed to represent both short-term and long-term preferences based on user-specific information and the previous history of interactions. However, this approach has several drawbacks.

Firstly, transductive models can recommend items only to users from the training set. Predictions cannot be obtained solely from the previous interactions of out-of-sample users, because the model's user embeddings depend on users' IDs and additional features (if provided). The problem of making recommendations for new users is solved either by fully retraining the model on the updated data or by iterative training on new batches [1]. For industrial purposes, the retraining process on large-scale data is time- and space-consuming, constantly affecting user coverage with recommendations and the quality of service for new users.

Secondly, storing trainable user vectors may occupy a lot of memory, since the required space is usually O(n), where n is the number of users. This leads to issues with model exploitation and storage for the large user bases that are typical of online services. Without user-specific vectors, we do not need to keep a look-up ID-dependent user embedding matrix: the space complexity is reduced to O(1) by inferring the user embedding on the fly from the input interactions, which greatly simplifies the operating process.

In this research, we present a method for constructing user vectors in real time that overcomes the limitations mentioned above. The contributions of this work are summarized as follows:

• We propose a method of composing user embeddings based purely on interaction sequences, which can be employed in the architectures of existing sequential recommender models instead of ID-based user-specific embeddings. This approach avoids the need to retrain recommendation models as new interactions emerge. In addition, it does not require storage of per-user embeddings and is therefore more storage-efficient and scalable.

• We comprehensively reviewed existing works in three A- and B-ranked conference series (RecSys, CIKM, and SIGIR) from 2019 to 2021 that use identity-based user embeddings in their architectures. This review shows that a third of the existing models could be improved using our approach.

The experiments can be reproduced using our open-source repository: https://github.com/tinkoff-ai/use_rs.

Figure 1: Illustration of the proposed approach. We create the user embedding from the user's interaction history, so we do not need to learn ID-based user-specific embeddings. The last known vector of the interaction-history embeddings is used as the user embedding.

Table 1: Full and short papers on sequential recommender models per conference series from 2019 to 2021.

Conference | Articles                            | Models with user vectors | Explained motivation for using user vectors
RecSys'21  | [2]                                 | 0/1 (0%)                 | -
RecSys'20  | [3, 4, 5, 6, 7, 8]                  | 2/6 (33%): [3, 5]        | Long-term preferences, model personalization
RecSys'19  | [9]                                 | 0/1 (0%)                 | -
CIKM'21    | [10, 11, 12, 13, 14, 15, 16, 17, 18] | 3/9 (33%): [11, 12, 18] | Long-term preferences (2 works), friends' impact
CIKM'20    | [19, 20, 21, 22]                    | 1/4 (25%): [19]          | Short- and long-term preferences
CIKM'19    | [23, 24, 25, 26]                    | 3/4 (75%): [24, 25, 26]  | Short- and long-term preferences
SIGIR'21   | [27, 28, 29, 30]                    | 0/4 (0%)                 | -
SIGIR'20   | [31, 32, 33]                        | 1/3 (33%): [33]          | For ranking score in BPR and GMF
SIGIR'19   | [34, 35]                            | 2/2 (100%): [34, 35]     | Lifelong user behavior, user-specific representations

2. Related work

Sequence-based recommender models are commonly used for recommendation tasks on serial data. Most of them are based on recurrent neural networks (RNNs), for instance, GRU4Rec [36], SASRec [37], and SHAN [38]. Additionally, Transformers4Rec [39] is gaining popularity for sequential and session-based tasks.

Some architectures attempt to model temporal decay effects in the user interaction history in order to improve the relevance of recommendations. Customer needs, as well as both short-term and long-term preferences, change over time, which should be taken into account in the predictions. Intuitively, the most recent interactions should have greater weight than older ones when deciding on the next item. Additionally, users may require substitutes or complements for an already acquired item. These assumptions have been incorporated into the design of the SLRC [40], Chorus [41], and KDA [42] models.

The approaches mentioned above have serious limitations for industrial applications: item recommendations are made based on user-specific embeddings, which can be trained only for users that were included in the training set. In contrast, inductive learning models can provide recommendations for out-of-sample users, who have interactions but are not included in the training process. For instance, Mult-VAE [43] and CF-LGCN-E [44], which is a modified version of LightGCN [45] for the inductive learning mode, can provide predictions for users outside of the training sample. Nevertheless, the quality of inductive models is often lower than that of transductive ones.

Thus, one of the open challenges for transductive models with high performance is to overcome the problem of making predictions for out-of-sample users. Additionally, the effect of user-specific embeddings on the quality of recommendations is not yet sufficiently studied.
In this research, we study whether we really need user-specific embeddings or whether it is better to train ID- and feature-free user vectors based solely on previous item interactions.

3. Methodology

3.1. The Rationale for Using User-specific Vectors

To determine how frequently trainable user-specific vectors are used in existing sequential recommender models, and to systematize the reasons for their use, we examined the proceedings of scientific conferences with relevant articles. As summarized in Table 1, our analysis covers articles presented between 2019 and 2021 in three conference series: RecSys, CIKM, and SIGIR. A paper was considered relevant if it proposed a sequential recommendation model, including session-based and POI recommendation tasks, and if its performance was compared against sequential recommender baselines. As a result, we compiled a list of 34 relevant studies, 12 of which apply user embeddings. According to the authors, the primary purpose of including user vector processing in the proposed methods, stated in five studies, was to represent long-term preferences. Other objectives included modeling both short-term and long-term preferences, learning user-specific vectors from mixed representations of all users sharing the same account, and modeling the impact of friends' behavior.

3.2. Initialization of User Vectors

Each sequential model processes the history of a user's interactions in order to represent relationships between interactions and then model the user's behavioral patterns [36, 37]. In our work, we investigate the feasibility of employing this mechanism to obtain vector representations that reflect users' interests, as well as how it influences the quality of sequential models. All of the selected models transform the user ID into a low-dimensional, real-valued dense vector representation u ∈ ℝ^d, where d is the dimension of the user embedding. The embedding is then processed in accordance with the architecture of each model.

Instead of this technique, we propose to replace the ID-based user embedding initialization with an interaction-based one, discarding the user ID as limiting information, for efficiency and scalability.

Let S_u ∈ ℝ^{L×d} be the input representations of previous interactions, where L is the maximum history length. First, we apply a Dropout layer [46] to the matrix S_u. The sequence representation is then processed by GRU layers. Note that our goal is to show that our approach is effective even with a simple recurrent layer such as the GRU; the use of more advanced layers is left for future improvements. The final step is a linear layer that reduces the embedding dimension back to its initial size d, after which we take the last known vector of the interaction history. As shown in Figure 1, we obtain the user embedding u ∈ ℝ^d as the output of successively applied layers (a Dropout layer, two GRU layers, and a Dense layer) whose input is the sequence of each user's historical data.

This approach has several significant advantages. First, the space complexity is reduced from O(n) to O(1), where n is the number of users: there is no need to store pre-computed embeddings in a look-up matrix keyed by user IDs, since each vector can be derived on the fly from the input interaction sequence using the learned network weights. This addresses the scalability issues of commercial applications. Second, it can be regarded as a step toward user privacy and confidentiality, because the user identifier becomes redundant information, and without it the embedding cannot be mapped back to personal data. Lastly, our approach allows adapting previously introduced sequential recommender models to inductive learning scenarios, in which we can infer recommendations for users who were not included in the training sample.
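The following minimal PyTorch sketch illustrates this pipeline. It assumes the interaction sequence has already been mapped to item embeddings; the class name InteractionUserEncoder and its arguments are our illustrative choices, not identifiers from the released repository.

```python
import torch
import torch.nn as nn


class InteractionUserEncoder(nn.Module):
    """Interaction-based user embedding: Dropout -> two GRU layers -> Linear.

    A minimal sketch of the pipeline in Section 3.2; names and defaults
    are illustrative assumptions, not taken from the released code.
    """

    def __init__(self, d: int, dropout_p: float = 0.2):
        super().__init__()
        self.dropout = nn.Dropout(dropout_p)
        # two stacked GRU layers over the (batch, L, d) interaction matrix S_u
        self.gru = nn.GRU(input_size=d, hidden_size=d, num_layers=2,
                          batch_first=True)
        # linear layer projecting back to the initial embedding size d
        self.dense = nn.Linear(d, d)

    def forward(self, item_seq_emb: torch.Tensor,
                seq_lengths: torch.Tensor) -> torch.Tensor:
        # item_seq_emb: (batch, L, d) embeddings of up to L past interactions
        out, _ = self.gru(self.dropout(item_seq_emb))  # (batch, L, d)
        out = self.dense(out)                          # (batch, L, d)
        # keep the vector at the last known (non-padded) position
        idx = (seq_lengths.long() - 1).clamp(min=0)
        idx = idx.view(-1, 1, 1).expand(-1, 1, out.size(-1))
        return out.gather(1, idx).squeeze(1)           # (batch, d)
```

Because the resulting user vector is a pure function of the interaction sequence, an out-of-sample user with any recorded history can be embedded at inference time, and no O(n) per-user look-up table has to be stored or retrained.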
3.3. Models

In our experiments, we use two of the most popular frameworks for sequential recommendation models, ReChorus (https://github.com/THUwangcy/ReChorus) and RecBole (https://recbole.io/). For the experimental setup, we selected state-of-the-art models that have proven themselves in many recent research papers as reliable baselines for comparison with new models. Thus, we took three models from the ReChorus framework (KDA, Chorus, and SLRC) and two models from RecBole (SHAN and HGN) in order to study how different user vector initialization techniques affect model performance on three open-source datasets. The two RecBole models were re-implemented in ReChorus to ensure a fair comparison.

• Sequential Hierarchical Attention Network (SHAN) [38] is a two-layer hierarchical attention network. The attention mechanism assigns adjusted item weights per user to capture dynamic properties, while the hierarchical structure integrates the user's long- and short-term preferences. The user embedding vector serves as context information to obtain different weights for different users.

• Hierarchical Gating Network (HGN) [47] consists of three parts: feature gating, instance gating, and item-item product modules. The feature-gating module adaptively selects effective latent features based on user interests. The instance-gating module selects the items that reflect short-term user preferences and passes them down to lower layers along with the item features. The user embedding is used in both the feature-gating and instance-gating modules.

• Chorus [41] incorporates the representation of different sequence contexts via knowledge- and time-aware item modeling. The constructed temporal kernel functions modify the temporal dynamics of relations by representing two sorts of items (substitutes and complements), allowing relational representations to contribute differentially to the final item embedding. User embeddings are used in both the BPR and GMF approaches for making predictions.

• Short-Term and Life-Time Repeat Consumption (SLRC) [40] combines the Hawkes process with collaborative filtering, which requires learning user embeddings to distinguish between user interests and to help explore new items. Considering the lack of recurrent interactions in the Amazon and MovieLens datasets, we use the variant implemented in Chorus, which derives substitutive and complementary types of relations between items.

• Knowledge-aware Dynamic Attention (KDA) [42] takes both item relations and their temporal evolution into account. Its core idea is to aggregate the sequence of interactions into multiple relation-specific embeddings via an attention mechanism. A Fourier transform with trainable frequency-domain embeddings is used in a novel way to simulate the diverse temporal effects of various relational interactions. User vectors, together with item vectors and interaction representations, enter the final ranking score.

Overall, we compared each original architecture with a counterpart in which the user-specific vectors are removed and the user embedding is learned only from interaction sequences, following our approach; the sketch below illustrates the substitution point.
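As a rough illustration, the hypothetical wrapper below contrasts the original ID-based look-up with the interaction-based encoder from the previous sketch. ExampleSeqModel is an assumption of ours; the actual ReChorus and RecBole models differ in their details, but the place where the user vector enters is the same.

```python
class ExampleSeqModel(nn.Module):
    """Hypothetical host model showing the single substitution point.

    `use_ids=True` mimics the original ID-based look-up table;
    `use_ids=False` plugs in our interaction-based replacement.
    Reuses InteractionUserEncoder from the previous sketch.
    """

    def __init__(self, n_users: int, n_items: int, d: int,
                 use_ids: bool = False):
        super().__init__()
        self.item_embedding = nn.Embedding(n_items + 1, d, padding_idx=0)
        self.use_ids = use_ids
        if use_ids:
            # original: O(n) trainable table keyed by user ID (transductive)
            self.user_embedding = nn.Embedding(n_users, d)
        else:
            # ours: user vector inferred on the fly from interactions
            self.user_encoder = InteractionUserEncoder(d)

    def user_vector(self, user_ids, item_seq, seq_lengths):
        # downstream attention/gating/ranking layers consume the returned
        # vector exactly as they consumed the ID-based embedding before
        if self.use_ids:
            return self.user_embedding(user_ids)
        return self.user_encoder(self.item_embedding(item_seq), seq_lengths)
```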
4. Experiments

In this section, we introduce our experimental setup and compare the performance of the original models with the modified ones. Our experiments are designed to answer the following research questions:

RQ1: Does the proposed method have a positive effect on the quality of existing sequential recommender models?
RQ2: How does the maximum sequence length affect the models' performance?

4.1. Datasets

We chose the three datasets most commonly used for sequential recommendation: MovieLens-1M (https://grouplens.org/datasets/movielens/1m/), Amazon Grocery and Gourmet Food, and Amazon Electronics (http://jmcauley.ucsd.edu/data/amazon/). These open-source datasets differ in domain, size, and sparsity. They contain user interaction sequences with timestamps and item metadata, including the lists of "also view" and "also buy" relations in the Amazon datasets and the list of genres in the MovieLens dataset. We use the common leave-one-out strategy with 99 negative items, similar to [42]. For SHAN, HGN, and SLRC, we only need the user interaction sequences, while Chorus and KDA are based on knowledge graphs, so we use the metadata to build them. In the Amazon datasets, we simply use the "also view" and "also buy" relations provided in the metadata, as was done in [41]. For the MovieLens dataset, we chose the most popular movies of the same genre as the equivalent of "also view" items, and the most popular items among the movies that users watched right after the ground-truth item as the equivalent of "also buy" items.

Table 2: Descriptive statistics of the datasets.

Dataset         | #users  | #items | #actions | Density
MovieLens-1M    | 6,040   | 3,416  | 1M       | 4.84%
Grocery&Gourmet | 127,496 | 41,280 | 1.1M     | 0.022%
Electronics     | 192,403 | 63,001 | 1.7M     | 0.014%

4.2. Evaluation Metrics

Hit Ratio (HR@k) and Normalized Discounted Cumulative Gain (NDCG@k) were used as evaluation metrics, with k ∈ {5, 10, 20, 50}. HR@k measures whether at least one ground-truth item appears in the top-k recommendation list, whereas NDCG@k considers both the position and the relevance of the item in the recommendation list. The values of NDCG@10 and HR@10 for the five original and five modified models are presented in Table 3.
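Under this protocol each test case has exactly one ground-truth item ranked against 99 sampled negatives, so NDCG@k reduces to 1/log2(rank + 1) for a hit and 0 otherwise. A small sketch follows; the function name and the assumption that the ground-truth score sits in column 0 are ours.

```python
import numpy as np


def rank_based_metrics(scores: np.ndarray, k: int = 10):
    """HR@k and NDCG@k under leave-one-out with 99 sampled negatives.

    `scores` has shape (n_cases, 100): column 0 holds the model score of
    the single ground-truth item, columns 1..99 the negatives.
    """
    # rank of the ground-truth item among the 100 candidates (1 = top)
    ranks = (scores > scores[:, [0]]).sum(axis=1) + 1
    hit = ranks <= k
    hr = hit.mean()  # fraction of cases with the ground truth in the top k
    # with a single relevant item, NDCG@k = 1 / log2(rank + 1) for a hit
    ndcg = np.where(hit, 1.0 / np.log2(ranks + 1), 0.0).mean()
    return hr, ndcg
```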
4.3. Experiment Settings

All models were implemented using the PyTorch framework [48]. For a fair comparison, we set the embedding size to 64, the batch size to 256, and the maximum history length to 20 for all models and datasets, similar to the experiments in [41]. Additionally, we report results for other values of the maximum history length: 10, 30, and 50. The remaining hyperparameters are model-dependent and are set to the default values of the original implementations. Tuning the hyperparameters across all methods and datasets is left for future work.

4.4. Baselines

We include two baselines in order to obtain the relative performance of non-sequential methods. Specifically, we include the POP method [49], a common non-personalized baseline that recommends the most popular items, and BPR-MF [50], which is often adopted as a classic matrix factorization-based method.

4.5. Performance Comparison

Table 3: Results of the pairwise comparison of the original and modified models. The best result in each original/modified pair of sequential models is marked with *.

Model       | ML-1M             | Grocery&Gourmet   | Electronics
            | NDCG@10   HR@10   | NDCG@10   HR@10   | NDCG@10   HR@10
POP         | 0.2513    0.4575  | 0.2628    0.4350  | 0.3087    0.4849
BPR-MF      | 0.4074    0.6844  | 0.3690    0.5516  | 0.3444    0.5348
SHAN        | 0.3137    0.5661  | 0.2480    0.4067  | 0.2887    0.4418
HGN         | 0.5233    0.7846  | 0.3898*   0.5670* | 0.3845*   0.5875*
SLRC        | 0.3226    0.5778  | 0.3334    0.4982  | 0.3914    0.5569
Chorus      | 0.4309    0.7161  | 0.4046    0.5862  | 0.4063    0.5994
KDA         | 0.6041*   0.8386* | 0.4442    0.6279  | 0.4605*   0.6733*
SHAN_our    | 0.3209*   0.5700* | 0.2565*   0.4306* | 0.3137*   0.4950*
HGN_our     | 0.5812*   0.8086* | 0.3268    0.4989  | 0.3662    0.5666
SLRC_our    | 0.5822*   0.8091* | 0.3673*   0.5376* | 0.4171*   0.6040*
Chorus_our  | 0.5976*   0.8258* | 0.4089*   0.5958* | 0.4523*   0.6527*
KDA_our     | 0.6011    0.8257  | 0.4456*   0.6291* | 0.4544    0.6685

Table 3 shows the recommendation performance of the original architectures and the modified models on the three datasets (RQ1). The proposed strategy has a significant impact on model quality across all datasets. For instance, on MovieLens-1M we observe increases in both NDCG@10 and HR@10 for four modified models compared to the originals (SHAN_our, HGN_our, SLRC_our, and Chorus_our), while the quality of KDA_our remains nearly the same. The quality improvement varies widely, ranging from 1% for SHAN_our to 81% for SLRC_our (for SLRC on ML-1M: (0.5822 - 0.3226) / 0.3226 ≈ 80.5% in NDCG@10). We even observe a slight boost in evaluation metrics for the strongest baseline, KDA, on Amazon Grocery&Gourmet Food. However, on that dataset the quality of HGN_our deteriorated: NDCG@10 decreased by 16%. As the authors of HGN observed, the predictions of this model are highly dependent on the last items; when our approach constructs a user vector from long sequences, the impact of the last items may be reduced. On Amazon Electronics, metric values decrease by 1% for KDA_our and by 5% for HGN_our, while for SLRC_our, SHAN_our, and Chorus_our the quality improves in the range of 7% to 12%. Overall, the performance of four of the models improved dramatically, but replacing the user-specific vectors had almost no influence on KDA_our. One possible explanation, consistent with the KDA paper [42], is that the KDA architecture is not highly sensitive to the presence of user vectors at all. SLRC_our demonstrates a significant improvement in quality on all datasets. This is explained by the fact that the SLRC algorithm's core component is collaborative filtering (CF), which is good at modeling long-term user preferences; our technique adds an assessment of short-term preferences to CF, which the original model may have overlooked.
If we consider each of the model-dataset pairs as a separate experiment, our approach increases the quality metrics in 11 out of 15 cases.

Summing up, comparative experiments on three real-world datasets show the effectiveness of our approach and a significant improvement in quality for the majority of the examined models. Replacing the user-specific embeddings provides a substantial relative gain in performance (e.g., 0.6%-12.1% for SHAN [38], 1.1%-38.7% for Chorus [41], 6.6%-80% for SLRC [40]).

Figure 2: Relative change in NDCG@10 of the five models. Panels: (a) MovieLens-1M, (b) Amazon Grocery & Gourmet Food, (c) Amazon Electronics.

Figure 2 shows how the maximum history length influences the quality improvement when our approach is applied (RQ2). The smaller the maximum sequence length, the better the model captures short-term user preferences, while for larger lengths long-term effects outweigh short-term ones. When the length of the sequence shrinks, the long-term influence of modeling, which is the primary reason for using user embeddings in a model, disappears. As a result, replacing user-specific vectors works effectively for both short (l = 10) and long (l = 50) sequences.

5. Conclusion

In this research, we proposed a method of composing user vectors based purely on interaction sequences, which can be employed in the architectures of existing sequential recommender models instead of user-specific embeddings. Our method does not require constant retraining of the model as the number of users grows, and it is memory-efficient. Extensive experiments on three real-world datasets reveal that the majority of the evaluated models improved in quality. Additionally, we studied the relationship between a model's relative improvement and the item sequence length when our method is applied. We therefore suggest that researchers experiment with our approach in studies that currently rely on ID-based user-specific embeddings. Our results can open up a new research area for ablation studies on the use of user-specific embeddings in recommender systems. In the future, we are going to apply our approach to more modern models and try more complex architectures than the GRU. In addition, it is essential to investigate how accurate and stable this approach is with an extremely small number of user interactions.

Acknowledgements

This research was supported by the Tinkoff Laboratory and the Laboratory for Models and Methods of Computational Pragmatics at the National Research University Higher School of Economics (HSE). The contribution of Dmitry I. Ignatov to the article was done within the framework of the HSE University Basic Research Program.

References

[1] Y. Zhang, F. Feng, C. Wang, X. He, M. Wang, Y. Li, Y. Zhang, How to retrain recommender system? A sequential meta-learning method, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1479-1488.
[2] W. Song, S. Wang, Y. Wang, S. Wang, Next-item recommendations in short sessions, in: Fifteenth ACM Conference on Recommender Systems, 2021, pp. 282-291.
[3] C. Hansen, C. Hansen, L. Maystre, R. Mehrotra, B. Brost, F. Tomasi, M. Lalmas, Contextual and sequential user embeddings for large-scale music recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 53-62.
[4] J. Lin, W. Pan, Z. Ming, FISSA: fusing item similarity models with self-attention networks for sequential recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 130-139.
[5] L. Wu, S. Li, C.-J. Hsieh, J. Sharpnack, SSE-PT: Sequential recommendation via personalized transformer, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 328-337.
[6] F. Mi, X. Lin, B. Faltings, ADER: Adaptively distilled exemplar replay towards continual learning for session-based recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 408-413.
[7] S. Liu, Y. Zheng, Long-tail session-based recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 509-514.
[8] S. M. Cho, E. Park, S. Yoo, MEANTIME: Mixture of attention mechanisms with multi-temporal embeddings for sequential recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 515-520.
[9] Z. Zhao, L. Hong, L. Wei, J. Chen, A. Nath, S. Andrews, A. Kumthekar, M. Sathiamoorthy, X. Yi, E. Chi, Recommending what video to watch next: a multitask ranking system, in: Proceedings of the 13th ACM Conference on Recommender Systems, 2019, pp. 43-51.
[10] Q. Wu, C. Yang, S. Yu, X. Gao, G. Chen, Seq2Bubbles: Region-based embedding learning for user behaviors in sequential recommenders, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 2160-2169.
[11] Z. Fan, Z. Liu, J. Zhang, Y. Xiong, L. Zheng, P. S. Yu, Continuous-time sequential recommendation with temporal graph collaborative transformer, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 433-442.
[12] Y. Li, Y. Ding, B. Chen, X. Xin, Y. Wang, Y. Shi, R. Tang, D. Wang, Extracting attentive social temporal excitation for sequential recommendation, arXiv preprint arXiv:2109.13539 (2021).
[13] Y. Li, T. Chen, P.-F. Zhang, H. Yin, Lightweight self-attentive sequential recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 967-977.
[14] K. Hu, L. Li, Q. Xie, J. Liu, X. Tao, What is next when sequential prediction meets implicitly hard interaction?, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 710-719.
[15] Z. Fan, Z. Liu, S. Wang, L. Zheng, P. S. Yu, Modeling sequences as distributions with uncertainty for sequential recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 3019-3023.
[16] Z. He, H. Zhao, Z. Lin, Z. Wang, A. Kale, J. McAuley, Locker: Locally constrained self-attentive sequential recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 3088-3092.
[17] Q. Cui, C. Zhang, Y. Zhang, J. Wang, M. Cai, ST-PIL: Spatial-temporal periodic interest learning for next point-of-interest recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 2960-2964.
[18] Z. Chen, W. Zhang, J. Yan, G. Wang, J. Wang, Learning dual dynamic representations on time-sliced user-item interaction graphs for sequential recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 231-240.
[19] W. Ji, K. Wang, X. Wang, T. Chen, A. Cristea, Sequential recommender via time-aware attentive memory network, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 565-574.
[20] W. Chen, P. Ren, F. Cai, F. Sun, M. de Rijke, Improving end-to-end sequential recommendations with intent-aware diversification, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 175-184.
[21] K. Zhou, H. Wang, W. X. Zhao, Y. Zhu, S. Wang, F. Zhang, Z. Wang, J.-R. Wen, S3-Rec: Self-supervised learning for sequential recommendation with mutual information maximization, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 1893-1902.
[22] M. M. Tanjim, DynamicRec: A dynamic convolutional network for next item recommendation, in: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM-2020), 2020.
[23] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, P. Jiang, BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1441-1450.
[24] A. Yan, S. Cheng, W.-C. Kang, M. Wan, J. McAuley, CosRec: 2D convolutional neural networks for sequential recommendation, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2173-2176.
[25] Y. Wu, K. Li, G. Zhao, X. Qian, Long- and short-term preference learning for next POI recommendation, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2301-2304.
[26] F. Lv, T. Jin, C. Yu, F. Sun, Q. Lin, K. Yang, W. Ng, SDM: Sequential deep matching model for online large-scale recommender system, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2635-2643.
[27] R. Cai, J. Wu, A. San, C. Wang, H. Wang, Category-aware collaborative sequential recommendation, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 388-397.
[28] Z. Liu, Z. Fan, Y. Wang, P. S. Yu, Augmenting sequential recommendation with pseudo-prior items via reversely pre-training transformer, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1608-1612.
[29] X. Yuan, D. Duan, L. Tong, L. Shi, C. Zhang, ICAI-SR: Item categorical attribute integrated sequential recommendation, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1687-1691.
[30] X. Fan, Z. Liu, J. Lian, W. X. Zhao, X. Xie, J.-R. Wen, Lighter and better: low-rank decomposed self-attention networks for next-item recommendation, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1733-1737.
[31] R. Ren, Z. Liu, Y. Li, W. X. Zhao, H. Wang, B. Ding, J.-R. Wen, Sequential recommendation with self-attentive multi-adversarial network, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 89-98.
[32] L. Zheng, N. Guo, W. Chen, J. Yu, D. Jiang, Sentiment-guided sequential recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1957-1960.
[33] C. Wang, M. Zhang, W. Ma, Y. Liu, S. Ma, Make it a chorus: knowledge- and time-aware item modeling for sequential recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 109-118.
[34] K. Ren, J. Qin, Y. Fang, W. Zhang, L. Zheng, W. Bian, G. Zhou, J. Xu, Y. Yu, X. Zhu, et al., Lifelong sequential modeling with personalized memorization for user response prediction, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 565-574.
[35] M. Ma, P. Ren, Y. Lin, Z. Chen, J. Ma, M. de Rijke, π-Net: A parallel information-sharing network for shared-account cross-domain sequential recommendations, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 685-694.
[36] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, arXiv preprint arXiv:1511.06939 (2015).
[37] W.-C. Kang, J. McAuley, Self-attentive sequential recommendation, in: 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018, pp. 197-206.
[38] H. Ying, F. Zhuang, F. Zhang, Y. Liu, G. Xu, X. Xie, H. Xiong, J. Wu, Sequential recommender system based on hierarchical attention network, in: IJCAI International Joint Conference on Artificial Intelligence, 2018.
[39] G. de Souza Pereira Moreira, S. Rabhi, J. M. Lee, R. Ak, E. Oldridge, Transformers4Rec: Bridging the gap between NLP and sequential/session-based recommendation, in: Fifteenth ACM Conference on Recommender Systems, 2021, pp. 143-153.
[40] C. Wang, M. Zhang, W. Ma, Y. Liu, S. Ma, Modeling item-specific temporal dynamics of repeat consumption for recommender systems, in: The World Wide Web Conference, 2019, pp. 1977-1987.
[41] C. Wang, M. Zhang, W. Ma, Y. Liu, S. Ma, Make it a chorus: knowledge- and time-aware item modeling for sequential recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 109-118.
[42] C. Wang, W. Ma, M. Zhang, C. Chen, Y. Liu, S. Ma, Toward dynamic user intention: Temporal evolutionary effects of item relations in sequential recommendation, ACM Transactions on Information Systems (TOIS) 39 (2020) 1-33.
[43] D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, Variational autoencoders for collaborative filtering, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 689-698.
[44] R. Ragesh, S. Sellamanickam, V. Lingam, A. Iyer, R. Bairi, User embedding based neighborhood aggregation method for inductive recommendation, arXiv preprint arXiv:2102.07575 (2021).
[45] Y. Shen, Y. Wu, Y. Zhang, C. Shan, J. Zhang, B. K. Letaief, D. Li, How powerful is graph convolution for recommendation?, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1619-1629.
[46] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research 15 (2014) 1929-1958.
[47] C. Ma, P. Kang, X. Liu, Hierarchical gating networks for sequential recommendation, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 825-833.
[48] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024-8035. URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
[49] N. Neophytou, B. Mitra, C. Stinson, Revisiting popularity and demographic biases in recommender evaluation and effectiveness, in: European Conference on Information Retrieval, Springer, 2022, pp. 641-654.
[50] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, arXiv preprint arXiv:1205.2618 (2012).