RecBaselines2023: a new dataset for choosing baselines for recommender models

Veronika Ivanova¹, Oleg Lashinin², Marina Ananyeva¹٬² and Sergey Kolesnikov²
¹ National Research University Higher School of Economics, Myasnitskaya Ulitsa, 20, Moscow, 101000, Russian Federation
² Tinkoff, 2-Ya Khutorskaya Ulitsa, 38A, bld. 26, Moscow, 117198, Russian Federation

Abstract
The number of proposed recommender algorithms continues to grow. Authors propose new approaches and compare them with existing models, called baselines. Because there are so many recommender models, it is difficult to decide which algorithms to compare against in a paper. To address this problem, we have collected and published a dataset containing information about the recommender models used in 903 papers, both as baselines and as proposed approaches. The dataset can be viewed as a typical dataset of interactions between papers and previously proposed models. In addition, we provide a descriptive analysis of the dataset and highlight challenges that can be investigated with the data. Furthermore, we have conducted extensive experiments, following a well-established methodology, to build a good recommender algorithm on this dataset. Our experiments show that selecting baselines for new recommender approaches can be framed as a recommendation problem and successfully solved by existing state-of-the-art collaborative filtering models. Finally, we discuss limitations and future work.

Keywords
recommender systems, dataset, baselines

BIR 2023: 13th International Workshop on Bibliometric-enhanced Information Retrieval at ECIR 2023, April 2, 2023
veronika.ivanova88@yandex.ru (V. Ivanova); o.a.lashinin@tinkoff.ru (O. Lashinin); m.ananyeva@tinkoff.ru (M. Ananyeva); scitator@gmail.com (S. Kolesnikov)
0000-0001-8894-9592 (O. Lashinin); 0000-0002-9885-2230 (M. Ananyeva); 0000-0002-4820-987X (S. Kolesnikov)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

There is an increasing number of publications in the field of recommender systems. Authors need to evaluate the performance of a proposed model against reference models to demonstrate its efficiency. Reference models are usually referred to as baselines. However, there are no rigid guidelines that define a comprehensive list of essential baselines. An inaccurate selection of baselines can lead to incorrect conclusions about the performance of the proposed model. Subsequent studies on the reproducibility of and progress in existing work have demonstrated this fact [1]. For example, in two recent papers [2, 3], the authors report that for a particular information retrieval task, some non-neural methods outperform recent neural methods. In 2016, Kharazmi et al. [4] examined previous work on IR and found a tendency to select weak baselines for comparative experiments. In the field of recommender systems, empirical analyses of session-based recommendation papers showed that almost trivial methods can sometimes outperform the latest neural methods [5, 6]. One of the reasons for such disappointing performance of novel models is the poor choice of baselines, which gives the illusion of better results. Another consequence of not choosing appropriate baselines for a new algorithm is that the proposed paper may be rejected [7].
Thus, the choice of baselines is currently one of the major issues in recommender systems research [8, 9]. With accurate baseline selection, the development of recommender systems can progress more quickly. Both researchers and practitioners face an ever-growing number of models from which to select relevant baselines for their experiments. However, the number of baselines included in a paper is limited for the following reasons. First, more baselines require more computational time. The recent success of deep learning forces the inclusion of complicated algorithms as baselines, so some researchers cannot afford to tune their hyperparameters sufficiently [8]. Second, some papers with new recommender algorithms do not publish the source code of the implementation [8], which may lead to poor performance of third-party implementations [10, 8]. Finally, due to space limitations, a paper cannot include too many baselines. Thus, it is common practice to study the performance of only 3-7 baselines against the newly proposed method.

The problem of selecting a few relevant items from a large set is well known and can be solved by recommender systems [11]. To the best of our knowledge, there is no open source dataset that can serve as a basis for developing a system that recommends baselines. It is important to note that baseline recommendation can also be applied to other areas of machine learning, such as natural language processing, computer vision, and time series prediction. However, in this paper we focus only on recommender systems.

In this paper we describe the process of collecting a novel dataset called RecBaselines2023. It can be considered a classical dataset with interactions between papers and baselines. In addition, we present the results of experiments performed on RecBaselines2023. Our results demonstrate the potential of this approach and open new research directions. Specifically, the main contributions of this paper can be listed as follows:

• We have created a new open source dataset called RecBaselines2023¹ for selecting baselines for experiments on recommender models. We examined 1009 papers during the collection process. After preprocessing, RecBaselines2023 contains information on 363 baselines used in 903 articles published between 2010 and 2022. We also describe the data collection procedure and provide descriptive statistics.
• We argue that the problem of baseline selection can be solved by collaborative filtering approaches. We then compare the baseline ranking quality of seven state-of-the-art top-N recommender models on RecBaselines2023. The results show that this problem can be effectively solved by the selected algorithms.
• We describe a scenario in which a partial list of baselines needs to be completed. The list is given to collaborative filtering approaches, which recommend baselines based on the methods already used. Some other possible use cases of RecBaselines2023 are also mentioned.

¹ We are releasing an online version of the dataset: https://github.com/fotol1/recbaselines2023.

Table 1
Highly cited baselines for each of the three recommender tasks. We use these as a starting point for our data collection. Citation counts are valid as of 27 March 2023.
Recommender task | Mandatory baselines (citation counts)
conventional top-N | BPR-MF (5297) [18], WMF (3688) [19], MultVAE (844) [20], LightGCN (1245) [21]
next-item | GRU4Rec (2152) [22], SASRec (1146) [23], BERT4Rec (836) [24]
next-basket | TIFUKNN (56) [25], RepeatNet (189) [26]

2. Related work

The problem of choosing baselines for research experiments in machine learning is not well studied. A related problem is citation recommendation, which aims at suggesting other papers to cite. The two main classes of citation recommendation methods are content-based and collaborative filtering [9]. Content-based methods use textual elements such as the abstract and title, or metadata elements such as the authors. In [12], the authors proposed a content-based approach requiring only textual features and collected the OpenCorpus dataset of 7 million articles. A literature graph was created in [13] using nodes for articles, authors and scientific concepts. Collaborative filtering methods are based on comparing similarities between articles. Liu et al. [14] measured the cosine similarity of article vectors created from co-occurrence in the same citation lists. The same concept was used by Haruna et al. [15]; however, they considered the references and citations of the target paper and mined the hidden associations between them using paper-citation relationships. Later, by improving the similarity calculation, this approach was further developed in [16]. However, even if we know what to cite, it is not clear whether a recommended paper should be used as a baseline.

Therefore, researchers are also working on more specialised tasks such as tag or baseline recommendation. The task of tag recommendation has been successfully studied by Wang et al. [17], who used a collaborative topic regression model. The authors sampled items from the CiteULike dataset, including the abstract, title and tags of each article. Bedi et al. [7] introduced the task of identifying the papers used as baselines in a given scientific article. The authors formulated it as a reference classification problem on a dataset built from the ACL Anthology corpus, in which about 2000 papers were selected and manually annotated. However, existing research article datasets are not specifically designed for the task of selecting baselines for recommender system experiments. We hope that our dataset will help to fill this gap and provide researchers with a practical approach to selecting baseline models for their research.

3. Dataset

Collection. We included several common recommendation tasks in our dataset: traditional top-N, next-item and next-basket recommendation. These tasks were used as the basis for the data collection. For each task there are well-established and highly cited baselines, some of which are listed in Table 1. Note that there are no strict guidelines in recommender systems research as to which baselines should be used for each of these tasks. Therefore, we cannot guarantee that other algorithms could not complement the list of commonly used algorithms. However, the approaches listed in Table 1 have many citations, which makes them an appropriate starting point for data collection.
Table 2
An example of a row in the RecBaselines2023 dataset. This file is called before_preprocessing.csv and is available online.

Column | Description | Example
Paper_id | Unique paper identifier | 12
URL | Web link to the paper | https://arxiv.org/pdf/1809.07053.pdf
Title | Paper title | NAIS: Neural Attentive Item Similarity Model for Recommendation
Year | Year of publication | 2018
Baselines | List of used recommender algorithms | MF;MLP;FISM;NAIS

Table 3
Examples of different names for the same algorithms

Approach | Occurring names
POP | MostPop, Popular, TopPopular
BPR-MF [18] | BPR, BPR-MF, MF-BPR
NCF [27] | NeuCF, NeuMF, NCF
Mult-VAE [28] | Mult-VAE, Multi-VAE, VAE-CF

Table 4
Descriptive statistics of the RecBaselines2023 dataset.

Stage | number of papers | number of models | number of interactions | density
before preprocessing | 1009 | 2188 | 7748 | 0.3%
after preprocessing | 903 | 363 | 5467 | 1.6%

To collect our dataset, we took the following steps:

1. For each model in Table 1, we obtained the list of papers that cite it from Google Scholar [29]. If a paper included experiments with the model, we added it to the dataset. We did not include papers with experiments on related problems (such as link prediction, matrix completion or explanation generation), nor papers without experiments. Note that a paper could cite more than one baseline model from Table 1; duplicate papers were later filtered out during preprocessing. Once we had gone through all the citations of the models in Table 1, we continued to process the citations of the papers that had already been added. This was all done manually by the authors of this paper over a period of one month.
2. The information collected about each paper is presented in Table 2. Each row contains a paper id, URL, paper title, year of publication and a list of the recommender models used. The URL and year of publication are taken from the Google Scholar page, while the paper title and list of baselines are taken from the paper itself.

Figure 1: Dataset statistics. (a) Distribution of the number of papers by year of publication. (b) Distribution of the number of papers over the number of algorithms per paper.

Figure 2: Distribution of the number of papers for the top-10 most popular baselines included in our dataset.

After removing duplicates, we obtained a dataset with 1009 papers and 2187 baselines. A large number of baseline models were included in only one or two papers.
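For illustration only, the raw file can be turned into paper-baseline interactions with a minimal sketch like the following. It assumes pandas and the column names from Table 2 (Paper_id, Year, Baselines); the exact header spelling should be checked against the published CSV.

```python
import pandas as pd

# Illustrative only: load the raw dump and explode the semicolon-separated
# "Baselines" column into one (paper, baseline) interaction per row.
# Column names follow Table 2; verify them against the published file.
papers = pd.read_csv("before_preprocessing.csv")

interactions = (
    papers.assign(Baselines=papers["Baselines"].str.split(";"))
          .explode("Baselines")
          .rename(columns={"Baselines": "Baseline"})
)

print("papers:", papers["Paper_id"].nunique())
print("distinct baselines:", interactions["Baseline"].nunique())
print("interactions:", len(interactions))
```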
Preprocessing. A number of steps were taken to preprocess the data for future research:

1. In some papers, popular models are presented under different names. This is most likely due to space limitations, or because several names for an algorithm were already used in the original paper. For example, the authors of Neural Collaborative Filtering (NCF) [27] used a different name, NeuMF, in their experiments; as a result, citing articles include both NCF and NeuMF. We condense common cases and list them in Table 3. To resolve this inconsistency, we replaced the multiple names of a model with a single canonical name.
2. Some papers modify methods slightly and report different variations of the same methods. For example, the authors of [30] introduce three new loss functions and apply them to different methods such as NeuMF [27], CML [31] and LightGCN [21]. Counting three losses for each of the three models would make our dataset more sparse. To avoid this problem, the preprocessed version of RecBaselines2023 contains only the main algorithms without any specific modifications.
3. To remove rare baselines and papers with extremely few baselines, we then iteratively filtered the dataset until only papers with three or more baselines remained and each baseline was present in three or more papers (a minimal code sketch of this filtering is given at the end of this section). The resulting statistics for the filtered dataset can be found in Table 4.

Statistics. We briefly present some statistics of the collected dataset. The main characteristics, such as the number of papers, number of models, number of interactions and density, are presented in Table 4. Figure 1 and Figure 2 show three distributions of the dataset: the distribution of the number of papers over the year of publication, the distribution of the number of papers over the number of algorithms included in a paper, and the distribution of the number of papers for the top-10 most popular baselines in our dataset. The earliest paper in the dataset was published in 2009; the number of papers per year remains relatively small and only exceeds 10 in 2017, after which it increases significantly from year to year. As can be seen in Figure 1, a typical number of included baselines is between 3 and 8. Therefore, algorithms that recommend baselines for recommender systems have to work with a small number of available interactions. Figure 3 shows the distribution of the number of papers over the years for each of the top-15 most popular baselines in the RecBaselines2023 dataset. The most popular models are BPR, GRU4Rec, LightGCN, NeuMF and others. These models were used as starting points for the collection of other papers; therefore, they are represented in the dataset in large numbers.

Figure 3: Distribution over the years of the number of papers in which each of the top-15 most popular baselines was included. Panels: (a) BPRMF, (b) GRU4REC, (c) FPMC, (d) POP, (e) SASREC, (f) NEUMF, (g) NARM, (h) CASER, (i) LIGHTGCN, (j) ITEMKNN, (k) SRGNN, (l) NGCF, (m) STAMP, (n) BERT4REC, (o) MF.
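The iterative filtering described in preprocessing step 3 can be sketched as follows. This is a minimal illustration, reusing the hypothetical interactions frame from the previous sketch rather than the exact preprocessing code used for the dataset.

```python
def iterative_core_filter(interactions, min_baselines=3, min_papers=3):
    """Repeatedly drop papers with fewer than `min_baselines` baselines and
    baselines occurring in fewer than `min_papers` papers, until stable."""
    current = interactions
    while True:
        papers_per_baseline = current.groupby("Baseline")["Paper_id"].transform("nunique")
        baselines_per_paper = current.groupby("Paper_id")["Baseline"].transform("nunique")
        keep = (baselines_per_paper >= min_baselines) & (papers_per_baseline >= min_papers)
        if keep.all():
            return current
        current = current[keep]

filtered = iterative_core_filter(interactions)
print(filtered["Paper_id"].nunique(), "papers,",
      filtered["Baseline"].nunique(), "baselines,",
      len(filtered), "interactions")
```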
4. Collaborative Filtering for Baseline Selection

Baseline selection can be solved by collaborative filtering (CF) algorithms. For example, the following definition was given in [32].

Definition 4.1. Collaborative filtering is the process of filtering or ranking items using the opinions of other people.

We can replace the word "items" with "baselines" and the word "people" with "researchers". This definition then provides a justification for the use of the technique. In addition, researchers' "opinions" are often motivated by several reasons. The first is the desire to compare the new algorithm with the best-known or best-performing approaches. The second is to include models based on the same idea. For example, the authors of [33] compare their graph-based model with four baselines, three of which are also graph-based. These or other reasons explain the choice of models from a large number of options. Therefore, researchers and practitioners may be interested in baseline recommendations based on a partial list of algorithms already in use. Hopefully, this can be done by applying approaches designed for the inductive scenario [34]. Such approaches do not rely on ID-based user embeddings [33, 20, 35, 36]; instead, they infer user interests from the set of interactions. Therefore, we can easily adapt such techniques to suggest baselines based on a partial list of already included methods.

5. Experiments

We have experimented with collaborative filtering approaches on the top-N recommendation task on the RecBaselines2023 dataset. Our experiments aim to answer the following question: "What is the performance of different state-of-the-art collaborative filtering approaches on the RecBaselines2023 dataset?"

Models. We included popular approaches of different types: simple Random and MostPop; matrix factorisation based BPRMF [18] and MF2020 [41]; item-based EASE [37] and SLIM [38]; graph-based RP3β [39]; and VAE-based MultiVAE [20]. According to [40, 8], such models are very strong CF-based baselines.

Metrics. Standard ranking quality metrics are chosen, namely Recall@K, NDCG@K and MAP@K.

Experiment settings. To provide reproducible experiments, we use Elliot [42], similarly to [40]. This framework allows experiments to be fully described in a configuration file. This file is available online², and the hyperparameter ranges can be found there. The total number of hyperparameter configurations evaluated for each model is 20.

² https://github.com/fotol1/recbaselines2023

Evaluation protocol. All interactions are divided into train/validation/test splits. The validation split is used for early stopping and hyperparameter selection, and the final quality is estimated on the test split. All papers published before 2021 are used for training. In addition, 80% of the interactions of papers published in 2021 and 2022 are used for training, and the remaining 20% are used for validation and testing, respectively.

Results. To investigate our question, we report quality metrics for the different approaches in Table 5. As we can see, the best model, RP3β, is two times better than MostPop's recommendations. This shows that there are not many universal baselines in recommender systems research, and that researchers choose baselines carefully. Surprisingly, the best model, RP3β, has a Recall@20 of 0.6, which means that more than half of the hidden baselines can be found in recommendation lists of length 20.

Table 5
Performance comparison on RecBaselines2023. The best value is in bold.

Model | R@10 | R@20 | N@10 | N@20 | M@10 | M@20
Random | 0.045 | 0.08 | 0.029 | 0.029 | 0.008 | 0.008
BPRMF [18] | 0.2281 | 0.353 | 0.134 | 0.134 | 0.035 | 0.035
MostPop | 0.312 | 0.339 | 0.138 | 0.138 | 0.035 | 0.035
MF2020 [41] | 0.348 | 0.446 | 0.227 | 0.227 | 0.067 | 0.067
EASER [37] | 0.397 | 0.549 | 0.243 | 0.243 | 0.069 | 0.069
NeuMF [27] | 0.42 | 0.513 | 0.252 | 0.252 | 0.073 | 0.073
Slim [38] | 0.446 | 0.576 | 0.264 | 0.264 | 0.078 | 0.078
VAECF [28] | 0.455 | 0.603 | 0.264 | 0.264 | 0.075 | 0.075
RP3β [39] | 0.473 | 0.607 | 0.303 | 0.303 | 0.088 | 0.088
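The experiments themselves are run with Elliot; purely for illustration, the ranking metrics reported in Table 5 can be computed per paper as in the following sketch, which assumes binary relevance and a single ranked list of suggested baselines.

```python
import math

def recall_at_k(ranked, relevant, k):
    # Fraction of the held-out baselines that appear in the top-k suggestions.
    hits = sum(1 for b in ranked[:k] if b in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    # Binary-relevance NDCG for a single ranked list.
    dcg = sum(1.0 / math.log2(i + 2) for i, b in enumerate(ranked[:k]) if b in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: two baselines were hidden from a paper; the model suggests five.
relevant = {"SASREC", "BERT4REC"}
ranked = ["GRU4REC", "SASREC", "CASER", "BERT4REC", "NGCF"]
print(recall_at_k(ranked, relevant, 5), round(ndcg_at_k(ranked, relevant, 5), 3))
```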
Figure 4: A researcher chooses baselines for their experiments. Three baselines have already been chosen. He or she can now pass these three approaches to one of the collaborative filtering algorithms under consideration. As a result, two more baselines are suggested as additional approaches to include in the paper. Based on historical data, these recommended baselines may also be chosen by other researchers.

6. Selecting baselines for partial lists

This section describes one possible way of using RecBaselines2023; Figure 4 illustrates the main idea. A researcher has invented a new recommendation algorithm and wants to compare it with other work. For example, the new approach was inspired by two methods, a and b, so they are automatically included in the experiments of the new paper. In addition, the researcher knows that the current state-of-the-art algorithm is a model c, so it should also be considered. Given the set of three baselines {a, b, c}, he or she can feed this set into one of the adapted collaborative filtering approaches, which returns a list of recommended baselines. The choice of these baselines is consistent with the historical data represented in RecBaselines2023.

In Table 6 we demonstrate recommendations for several partial lists of baselines. We use SLIM, EASE and RP3β as recommender models because they are item-based models that can make predictions from any input list of items. The first three examples emulate iterative updates to a set of next-item recommendation baselines. The remaining examples demonstrate recommendations based on a single baseline from different frameworks. As we can see, SLIM and RP3β flexibly change their recommendations as new next-item models are added. When we provide only one model from a particular framework, our models recommend baselines from similar frameworks. For example, RippleNet [43] is a knowledge-based model: if someone includes RippleNet in their experiments, our models suggest including other knowledge-based approaches such as PER [44] and CKE [45].

Table 6
Examples of recommendations based on partial lists of items.

Input items | SLIM | EASE | RP3β
GRU4REC | CASER, SASREC | SASREC, CASER | MIND, DIEN
GRU4REC, SASREC, BERT4REC | CASER, TISASREC | BPR, CASER | TISASREC, JODIE
GRU4REC, SASREC, BERT4REC, TISASREC | CASER, S3REC | BPR, CASER | LESSR, CHORUS
PINSAGE | GCMC, CMN | NGCF, GCMC | CMN, GCMC
VAECF | TRANSCF, LRML | LIGHTGCN, TRANSCF | SGL, TRANSCF
RIPPLENET | PER, CKE | CKE, PER | PER, LIBFM
LIGHTGCN, SGL, VAECF | TRANSCF, LRML | NGCF, BPR | TRANSCF, SBPR
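To make the scenario concrete, the following sketch implements a simple item-based scorer over the paper-baseline matrix. It is not SLIM, EASE or RP3β, but a plain cosine item-item model that illustrates how a partial list of baselines can be turned into scores for the remaining candidates; the matrix and indices are illustrative.

```python
import numpy as np

def item_item_scores(interaction_matrix, partial_list):
    """Score candidate baselines for a partial list with cosine item-item similarity.

    interaction_matrix: binary papers x baselines matrix.
    partial_list: column indices of the baselines already chosen.
    """
    norms = np.linalg.norm(interaction_matrix, axis=0, keepdims=True)
    normalized = interaction_matrix / np.maximum(norms, 1e-12)
    similarity = normalized.T @ normalized            # baselines x baselines
    scores = similarity[:, partial_list].sum(axis=1)  # aggregate over the partial list
    scores[partial_list] = -np.inf                    # never re-suggest the inputs
    return scores

# Toy example: 4 papers x 5 baselines; suggest two more baselines for a paper
# that already uses baselines 0 and 2.
X = np.array([[1, 1, 0, 0, 1],
              [1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1],
              [1, 1, 1, 0, 0]], dtype=float)
print(np.argsort(-item_item_scores(X, [0, 2]))[:2])
```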
7. Limitations and future work

Our work has some limitations. In this section we discuss them and show possible ways to overcome them. Firstly, the published version of the dataset will become obsolete over time. We will publish regular updates; in addition, if the authors of newly proposed methods want to add their work, we can do this quickly in the repository via a pull request. Secondly, the dataset may contain misspellings or other errors, such as the erroneous presence or absence of some baselines in the included works. We have done our best and have double-checked the interactions several times. If you find any errors, please contact us via issues on GitHub. Finally, there are some challenges in producing baseline recommendations. For example, some of the baselines in the dataset were used in previous work but have since been superseded by the latest state-of-the-art approaches; the models we consider are not sensitive to this fact. We argue that this problem exists for other datasets as well: it has been shown in [46] that recommending the most recent films can improve quality even for the simple MostPop method. Nevertheless, the practical application can be modified so that the most recent baselines with high relevance scores are treated as more suitable. We leave this as future work.

8. Conclusion

This paper investigates the problem of recommending baselines for experiments. We have collected an open source dataset, RecBaselines2023, which describes the baseline models used for comparative experiments in papers on different types of recommender systems. It consists of 903 papers and 363 baseline models, with 5467 interactions between them. The dataset includes interactions between papers and baseline models, as well as additional data about each paper, such as the web link, paper title, and year of publication. RecBaselines2023 can be used by researchers to properly compile the baseline list for their experiments. The dataset will be updated as new papers are published. We have used collaborative filtering techniques to recommend algorithms based on incomplete lists of previously included baselines. Our experiments on predicting held-out baselines show that state-of-the-art collaborative filtering techniques can successfully perform this task. We hope that our dataset can open up new lines of research.

References

[1] M. Ferrari Dacrema, S. Boglio, P. Cremonesi, D. Jannach, A troubling analysis of reproducibility and progress in recommender systems research, ACM Transactions on Information Systems (TOIS) 39 (2021) 1–49.
[2] J. Lin, The neural hype and comparisons against weak baselines, in: ACM SIGIR Forum, 2, ACM New York, NY, USA, 2019, pp. 40–51.
[3] W. Yang, K. Lu, P. Yang, J. Lin, Critically examining the "neural hype": weak baselines and the additivity of effectiveness gains from neural ranking models, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 1129–1132.
[4] S. Kharazmi, F. Scholer, D. Vallet, M. Sanderson, Examining additivity and weak baselines, ACM Transactions on Information Systems (TOIS) 34 (2016) 1–18.
[5] M. Ludewig, D. Jannach, Evaluation of session-based recommendation algorithms, User Modeling and User-Adapted Interaction 28 (2018) 331–390.
[6] M. Ludewig, N. Mauro, S. Latifi, D. Jannach, Performance comparison of neural and non-neural approaches to session-based recommendation, in: Proceedings of the 13th ACM Conference on Recommender Systems, 2019, pp. 462–466.
[7] M. Bedi, T. Pandey, S. Bhatia, T. Chakraborty, Why did you not compare with that? Identifying papers for use as baselines, in: European Conference on Information Retrieval, Springer, 2022, pp. 51–64.
[8] M. Ferrari Dacrema, P. Cremonesi, D. Jannach, Are we really making much progress? A worrying analysis of recent neural recommendation approaches, in: Proceedings of the 13th ACM Conference on Recommender Systems, 2019, pp. 101–109.
[9] J. Beel, B. Gipp, S. Langer, C. Breitinger, Paper recommender systems: a literature survey, International Journal on Digital Libraries 17 (2016) 305–338.
[10] A. Petrov, C. Macdonald, A systematic review and replicability study of BERT4Rec for sequential recommendation, in: Proceedings of the 16th ACM Conference on Recommender Systems, 2022, pp. 436–447.
[11] D. Bollen, B. P. Knijnenburg, M. C. Willemsen, M. Graus, Understanding choice overload in recommender systems, in: Proceedings of the Fourth ACM Conference on Recommender Systems, 2010, pp. 63–70.
[12] C. Bhagavatula, S. Feldman, R. Power, W. Ammar, Content-based citation recommendation, arXiv preprint arXiv:1802.08301 (2018).
[13] W. Ammar, D. Groeneveld, C. Bhagavatula, I. Beltagy, M. Crawford, D. Downey, J. Dunkelberger, A. Elgohary, S. Feldman, V. Ha, et al., Construction of the literature graph in Semantic Scholar, arXiv preprint arXiv:1805.02262 (2018).
[14] H. Liu, X. Kong, X. Bai, W. Wang, T. M. Bekele, F. Xia, Context-based collaborative filtering for citation recommendation, IEEE Access 3 (2015) 1695–1703.
[15] K. Haruna, M. Akmar Ismail, D. Damiasih, J. Sutopo, T. Herawan, A collaborative approach for research paper recommender system, PloS One 12 (2017) e0184516.
[16] N. Sakib, R. B. Ahmad, K. Haruna, A collaborative approach toward scientific paper recommendation using citation context, IEEE Access 8 (2020) 51246–51255.
[17] H. Wang, B. Chen, W. Li, Collaborative topic regression with social regularization for tag recommendation, in: F. Rossi (Ed.), IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013, IJCAI/AAAI, 2013, pp. 2719–2725. URL: http://www.aaai.org/ocs/index.php/IJCAI/IJCAI13/paper/view/7006.
[18] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, arXiv preprint arXiv:1205.2618 (2012).
[19] Y. Hu, Y. Koren, C. Volinsky, Collaborative filtering for implicit feedback datasets, in: 2008 Eighth IEEE International Conference on Data Mining, IEEE, 2008, pp. 263–272.
[20] J. Xu, Y. Ren, H. Tang, X. Pu, X. Zhu, M. Zeng, L. He, Multi-VAE: Learning disentangled view-common and view-peculiar visual representations for multi-view clustering, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9234–9243.
[21] Y. Shen, Y. Wu, Y. Zhang, C. Shan, J. Zhang, B. K. Letaief, D. Li, How powerful is graph convolution for recommendation?, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1619–1629.
[22] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, arXiv preprint arXiv:1511.06939 (2015).
[23] W.-C. Kang, J. McAuley, Self-attentive sequential recommendation, in: 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018, pp. 197–206.
[24] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, P. Jiang, BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1441–1450.
[25] H. Hu, X. He, J. Gao, Z.-L. Zhang, Modeling personalized item frequency information for next-basket recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1071–1080.
[26] P. Ren, Z. Chen, J. Li, Z. Ren, J. Ma, M. De Rijke, RepeatNet: A repeat aware neural recommendation machine for session-based recommendation, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 4806–4813.
[27] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, T.-S. Chua, Neural collaborative filtering, in: Proceedings of the 26th International Conference on World Wide Web, 2017, pp. 173–182.
[28] D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, Variational autoencoders for collaborative filtering, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 689–698.
[29] P. Jacsó, Google Scholar: the pros and the cons, Online Information Review 29 (2005) 208–214.
[30] Z. Gao, Z. Cheng, F. Pérez, J. Sun, M. Volkovs, MCL: Mixed-centric loss for collaborative filtering, in: Proceedings of the ACM Web Conference 2022, 2022, pp. 2339–2347.
[31] C.-K. Hsieh, L. Yang, Y. Cui, T.-Y. Lin, S. Belongie, D. Estrin, Collaborative metric learning, in: Proceedings of the 26th International Conference on World Wide Web, 2017, pp. 193–201.
[32] J. B. Schafer, D. Frankowski, J. Herlocker, S. Sen, Collaborative filtering recommender systems, in: The Adaptive Web, Springer, 2007, pp. 291–324.
[33] Y. Shen, Y. Wu, Y. Zhang, C. Shan, J. Zhang, B. K. Letaief, D. Li, How powerful is graph convolution for recommendation?, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1619–1629.
[34] T. Schnabel, M. Wan, L. Yang, Situating recommender systems in practice: Towards inductive learning and incremental updates, arXiv preprint arXiv:2211.06365 (2022).
[35] Y. Wu, Q. Cao, H. Shen, S. Tao, X. Cheng, INMO: A model-agnostic and scalable module for inductive collaborative filtering, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022, pp. 91–101.
[36] M. Ananyeva, O. Lashinin, V. Ivanova, S. Kolesnikov, D. I. Ignatov, Towards interaction-based user embeddings in sequential recommender models, in: J. Vinagre, M. Al-Ghossein, A. M. Jorge, A. Bifet, L. Peska (Eds.), Proceedings of the 5th Workshop on Online Recommender Systems and User Modeling co-located with the 16th ACM Conference on Recommender Systems, ORSUM@RecSys 2022, Seattle, WA, USA, September 23rd, 2022, volume 3303 of CEUR Workshop Proceedings, CEUR-WS.org, 2022. URL: https://ceur-ws.org/Vol-3303/paper10.pdf.
[37] H. Steck, Embarrassingly shallow autoencoders for sparse data, in: The World Wide Web Conference, 2019, pp. 3251–3257.
[38] X. Ning, G. Karypis, SLIM: Sparse linear methods for top-N recommender systems, in: 2011 IEEE 11th International Conference on Data Mining, IEEE, 2011, pp. 497–506.
[39] B. Paudel, F. Christoffel, C. Newell, A. Bernstein, Updatable, accurate, diverse, and scalable recommendations for interactive applications, ACM Transactions on Interactive Intelligent Systems (TiiS) 7 (2016) 1–34.
[40] V. W. Anelli, A. Bellogín, T. Di Noia, D. Jannach, C. Pomo, Top-N recommendation algorithms: A quest for the state-of-the-art, arXiv preprint arXiv:2203.01155 (2022).
[41] S. Rendle, W. Krichene, L. Zhang, J. Anderson, Neural collaborative filtering vs. matrix factorization revisited, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 240–248.
[42] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D. Noia, Elliot: A comprehensive and rigorous framework for reproducible recommender systems evaluation, in: F. Diaz, C. Shah, T. Suel, P. Castells, R. Jones, T. Sakai (Eds.), SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, ACM, 2021, pp. 2405–2414. URL: https://doi.org/10.1145/3404835.3463245. doi:10.1145/3404835.3463245.
[43] H. Wang, F. Zhang, J. Wang, M. Zhao, W. Li, X. Xie, M. Guo, RippleNet: Propagating user preferences on the knowledge graph for recommender systems, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 417–426.
[44] X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, J. Han, Personalized entity recommendation: A heterogeneous information network approach, in: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, 2014, pp. 283–292.
[45] H. Wang, M. Zhao, X. Xie, W. Li, M. Guo, Knowledge graph convolutional networks for recommender systems, in: The World Wide Web Conference, 2019, pp. 3307–3313.
[46] N. Neophytou, B. Mitra, C. Stinson, Revisiting popularity and demographic biases in recommender evaluation and effectiveness, in: European Conference on Information Retrieval, Springer, 2022, pp. 641–654.