Session-based Recommendation with Dual Graph Networks
Tajuddeen Rabiu Gwadabe1,*, Mohammed Ali Mohammed Al-hababi1 and Ying Liu1
1 School of Computer Science and Technology, University of Chinese Academy of Sciences (UCAS), China


Abstract
The session-based recommendation task aims at predicting the next item an anonymous user might click. Recently, graph neural networks have gained a lot of attention in this task. Existing models construct either a directed graph or a hypergraph for each session and learn item embeddings using some form of graph neural network. We argue that constructing both a directed and an undirected graph for each session may outperform either method alone, since the sequence of interactions may be relevant for some sessions but not for others. In this paper, we propose a novel Session-based Recommendation model with Dual Graph Networks (SR-DGN). SR-DGN constructs a directed and an undirected graph from each session and learns sequential and non-sequential item representations using sequential and non-sequential graph neural network models respectively. Using shared learnable parameters, SR-DGN learns global and local user preferences for each network and aggregates the scores of the two networks for recommendation. Experiments conducted on three real-world datasets showed its superiority over state-of-the-art models.

Keywords
session-based recommendation, graph neural networks, directed and undirected graphs



1. Introduction

Recommender systems have become an essential component of the internet user experience, as they help consumers sift through the ever-increasing volume of information. Some online sites allow non-login users; for these users, the recommender system has to rely exclusively on the current anonymous session for making recommendations. Session-based recommender systems aim at providing relevant recommendations to such anonymous users.

Recent developments in deep learning architectures have led researchers to apply these architectures to the session-based recommendation task. Recurrent neural networks [1] were first proposed to learn the sequential interactions between items in a session. More recently, Graph Neural Networks (GNN) have been proposed for session-based recommendation [2]. These models construct a directed graph for each session and learn item representations using the sequential Gated Graph Neural Network (GGNN). On the other hand, memory network models like STAMP [3] have shown that the order of the sequence may not be important in session-based recommendation and proposed a model that does not depend on the sequence of interactions. Similarly, hypergraph models like DHCN [4] proposed constructing a hypergraph for each session and learning item representations with a hypergraph convolutional network that also neglects the sequence of interactions between items.

This has led to two schools of thought: either consider the sequence of interactions between items, since users interact with items sequentially, or neglect the sequence, since item order may not be relevant when users interact with items in an online setting. Both views have merit. For example, on an e-commerce site, buying a particular brand of phone might influence buying a screen guard, so the sequence might be relevant. However, buying household supplies such as tissue might not influence buying any other particular item, so the sequence might be irrelevant. We argue that the two schools of thought are complementary: for some sessions, considering the sequence is relevant, while for others it is not.

To this end, we propose a Session-based Recommendation model with Dual Graph Networks, SR-DGN. SR-DGN first constructs two graph networks - a directed and an undirected graph - for each session and learns item representations using sequential and non-sequential GNN models respectively. From the individual item representations, SR-DGN learns local and global user preferences. Each network then produces a score for each item, and the scores are aggregated for making the recommendation. Our main contributions are summarized as follows:

• SR-DGN uses two graph networks - a directed and an undirected graph - for each session and learns item representations using sequential and non-sequential GNN models. For learning sequential item representations, SR-DGN uses GGNN [5], while for learning non-sequential item representations, SR-DGN uses SGC [6] with a gating highway connection.
• SR-DGN learns local and global user preferences from each graph network using learnable parameters shared between the networks. Each network then provides scores for each item, and the scores are aggregated for making the recommendation.
• Experimental results on three benchmark datasets demonstrate the effectiveness of SR-DGN. Further analysis showed that while the sequential network performs better on some datasets, the non-sequential network performs better on others. This further supports using both networks for the session-based recommendation task.

DL4SR'22: Workshop on Deep Learning for Search and Recommendation, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA
* Corresponding author.
tgwadabe@mails.ucas.ac.cn (T. R. Gwadabe); mohammed_al-hababi@mails.ucas.ac.cn (M. A. M. Al-hababi); yingliu@ucas.ac.cn (Y. Liu)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
2. Related Works

Session-based recommendation models use the implicit temporal feedback of users, such as clicks, obtained by tracking user activities. Traditional machine learning models such as Markov Chain (MC) models have been used for sequential recommendation. Zimdars et al. [7] proposed extracting sequential patterns from sessions and predicting the next click using decision tree models. FPMC [8] generalizes the MC method and matrix factorization to model short-term and long-term user preferences respectively. However, MC models suffer from the assumption of independence between the states in a sequence and from an unmanageable state space when considering all the possible sequences.

Recently, deep learning models have achieved state-of-the-art performance in session-based recommendation. Hidasi et al. [1] first proposed GRU4Rec, a recurrent neural network for session-based recommendation. The model uses session-parallel mini-batches and a pairwise ranking loss. Li et al. [9] proposed NARM, which uses a recurrent neural network with an attention mechanism to learn both the local and the global user preference. Liu et al. [3] proposed using memory networks and showed that modelling the sequential nature may not be necessary. Wu et al. [2] constructed directed graphs for each session and learned the local and global user preferences using GGNN. Wang et al. [10] construct a hypergraph for each session and proposed a hypergraph attention network for recommendation.

3. SR-DGN

3.1. Problem Statement and Graph Construction

Session-based recommendation aims to predict the next click of an anonymous user session. For a dataset with a distinct item set V = {v_1, v_2, ..., v_n}, let an anonymous session s be represented by the ordered list s = [v_{s,1}, v_{s,2}, ..., v_{s,t-1}], where v_{s,i} ∈ V is an item clicked within the session s. Session-based recommendation aims to recommend the next item to be clicked, v_{s,t}. The output of SR-DGN is a ranked probability score over all candidate items, where the top-K items by probability ŷ are recommended as the potential next clicks.

For each session s, our model constructs a directed graph G_s = (V_s, E_s) and an undirected graph G_n = (V_n, E_n). For both graphs, v_i ∈ V_s and v_i ∈ V_n if v_i is clicked within the current session. A directed edge (v_{i-1}, v_i) ∈ E_s exists from v_{i-1} to v_i if item v_i was clicked immediately after v_{i-1}. An undirected edge (v_{i-1}, v_i) ∈ E_n exists between v_{i-1} and v_i if item v_i was clicked before or after item v_{i-1}. For the directed graph, we normalize the outgoing and incoming adjacency matrices by the degree of the outgoing node. The overview of the SR-DGN model is given in Figure 1.
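As a concrete illustration, the NumPy sketch below builds both adjacency structures for a toy session. The function names and the exact normalization (row-normalizing the outgoing matrix and its transpose as the incoming view) are our assumptions, not the authors' released code.

```python
import numpy as np

def row_normalize(a):
    """Divide each row by its row sum (node degree), guarding empty rows."""
    deg = a.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    return a / deg

def build_session_graphs(session):
    """Build the directed and undirected session-graph adjacency matrices
    described in Section 3.1 (an illustrative sketch)."""
    nodes = sorted(set(session))                  # unique items in the session
    idx = {v: i for i, v in enumerate(nodes)}     # item id -> node index
    n = len(nodes)
    a_out = np.zeros((n, n))
    a_und = np.zeros((n, n))
    for prev, cur in zip(session, session[1:]):
        a_out[idx[prev], idx[cur]] = 1.0          # directed: prev -> cur
        a_und[idx[prev], idx[cur]] = 1.0          # undirected co-click edge
        a_und[idx[cur], idx[prev]] = 1.0
    # One reading of "normalized by the degree of the outgoing node":
    # row-normalize the outgoing matrix and its transpose (incoming view).
    return nodes, row_normalize(a_out), row_normalize(a_out.T), a_und

# Toy session with a repeated click on item 2.
nodes, A_out, A_in, A_und = build_session_graphs([1, 2, 3, 2, 4])
```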
3.2. Learning Sequential and Non-Sequential Item Embedding

We first transform all items v_i ∈ V into a unified embedding space v_i ∈ R^d, where d is the dimension size. Using this initial embedding, we learn sequential and non-sequential item embeddings, v_i^s and v_i^n respectively.

3.2.1. Learning Sequential Item Embedding

We use GGNN [5], similar to SR-GNN [2], for learning the sequential item representation. Given the incoming and outgoing adjacency matrices and the initial item embedding, GGNN updates the item embedding as follows:

    a_{i,s}^t = A_{i,s:} [v_1^{t-1}, ..., v_n^{t-1}]^T H_1 + b_1,    (1)
    z_i^t = σ(W_z a_{i,s}^t + U_z v_i^{t-1}),    (2)
    r_i^t = σ(W_r a_{i,s}^t + U_r v_i^{t-1}),    (3)
    v̂_i^t = tanh(W_o a_{i,s}^t + U_o (r_i^t ⊙ v_i^{t-1})),    (4)
    v_i^s = (1 - z_i^t) ⊙ v_i^{t-1} + z_i^t ⊙ v̂_i^t,    (5)

where A_{i,s:} ∈ R^{1×2n} is the i-th row of the concatenated incoming and outgoing adjacency matrices, H_1 ∈ R^{d×2d} and b_1 ∈ R^d are weight and bias parameters respectively, and z_i^t ∈ R^d and r_i^t ∈ R^d are the update and reset gates respectively.
Figure 1: Overview of SR-DGN model. For each session 𝑠, directed and undirected graphs are constructed and sequential and
non-sequential item representations are learned using sequential and non-sequential GNN models respectively. Using shared
learnable parameters, the final sequential and non-sequential session representation is learned. Finally, the best prediction is
selected for making recommendations.



3.2.2. Learning Non-Sequential Item Embedding

To learn the non-sequential item representation v_i^n, we use SGC [6] with a proposed highway connection. Formally, the update is given by:

    a_{i,n}^t = A_{i,n:} [v_1^{t-1}, ..., v_n^{t-1}]^T H_2 + b_2,    (6)
    g_1 = σ(W_{g1} [a_{i,n}^t ; v_i]),    (7)
    v_i^n = g_1 ⊙ a_{i,n}^t + (1 - g_1) ⊙ v_i,    (8)

where A_{i,n:} ∈ R^{1×n} is the i-th row of the undirected graph adjacency matrix, and H_2 ∈ R^{d×d} and b_2 ∈ R^d are weight and bias parameters respectively. g_1 is the gating mechanism used to improve the performance of the non-sequential item representation.
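Below is a minimal sketch of the non-sequential branch under our reading of Eqs. (6)-(8), in which the highway gate mixes the SGC output with the initial embedding; the propagation depth k and all names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgc_highway(v, a_und, H2, b2, Wg1, k=1):
    """Non-sequential item embedding: k rounds of SGC smoothing (Eq. 6)
    mixed with the initial embedding via a highway gate (Eqs. 7-8).
    A sketch under our reading of the equations, not the authors' code."""
    deg = a_und.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    a_norm = a_und / deg                      # row-normalized undirected adjacency
    h = v
    for _ in range(k):                        # SGC: repeated linear smoothing
        h = a_norm @ h
    a = h @ H2 + b2                           # Eq. (6)
    g1 = sigmoid(np.concatenate([a, v], axis=1) @ Wg1)  # Eq. (7): gate on [a; v]
    return g1 * a + (1 - g1) * v              # Eq. (8): highway mixing

# Shape check on a fully connected toy graph.
n, d = 4, 8
rng = np.random.default_rng(1)
v = rng.normal(size=(n, d))
a_und = np.ones((n, n)) - np.eye(n)
H2 = rng.normal(size=(d, d)); b2 = np.zeros(d)
Wg1 = rng.normal(size=(2 * d, d))
v_n = sgc_highway(v, a_und, H2, b2, Wg1)
```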

3.3. Learning Session Embedding

From the sequential and non-sequential item embeddings, we learn the local and the global user preferences for each network using shared learnable parameters. Considering the sequential item embedding, to obtain the final session embedding we use a gating mechanism that aggregates the global and the local user preferences. The sequential local user preference s_{ls} is obtained from the sequential embedding of the last clicked item, while the sequential global preference s_{gs} is obtained from the sequential embeddings of all clicked items in a session using an additive attention mechanism. Formally, the sequential global preference s_{gs} is given by:

    s_{gs} = Σ_{i=1}^{t} α_i h_{s,i},    (9)

where h_{s,i} is the i-th sequential item embedding and α_i is the attention weight of the i-th timestamp, given by:

    α_i = q^T σ(W_1 h_{s,t} + W_2 h_{s,i} + b),    (10)

where the parameters q, W_1 and W_2 are learnable and control the additive attention. The final sequential session representation s_{fs} is obtained by aggregating the sequential local and global preferences using a gating mechanism. Formally, s_{fs} is obtained as follows:

    s_{fs} = g_2 ⊙ s_{gs} + (1 - g_2) ⊙ s_{ls},    (11)

where g_2 is the gating function obtained by:

    g_2 = W_{g2} [s_{gs} ; s_{ls}],    (12)

where W_{g2} ∈ R^{2d×d} is a trainable transformation matrix and [;] is a concatenation operation. From the non-sequential item embeddings, using the same learnable parameters, the final non-sequential session representation s_{fn} is obtained.
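The attention and gating computations of Eqs. (9)-(12) fit in a few lines; the NumPy sketch below is illustrative (we apply a sigmoid to the gate in Eq. (12) to keep it in (0, 1), a nonlinearity the equation leaves implicit).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def session_embedding(h, q, W1, W2, b, Wg2):
    """Session representation from item embeddings h (t, d): additive
    attention for the global preference (Eqs. 9-10), last click as the
    local preference, gated fusion (Eqs. 11-12). Illustrative sketch."""
    s_l = h[-1]                                    # local preference: last click
    alpha = sigmoid(h[-1] @ W1 + h @ W2 + b) @ q   # Eq. (10): one weight per step
    s_g = alpha @ h                                # Eq. (9): weighted sum
    g2 = sigmoid(np.concatenate([s_g, s_l]) @ Wg2) # Eq. (12): fusion gate
    return g2 * s_g + (1 - g2) * s_l               # Eq. (11)

# Shape check on a toy 5-step session.
t, d = 5, 8
rng = np.random.default_rng(2)
h = rng.normal(size=(t, d))
q = rng.normal(size=d)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b = np.zeros(d)
Wg2 = rng.normal(size=(2 * d, d))
s_f = session_embedding(h, q, W1, W2, b, Wg2)
```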
3.4. Making Recommendation

From the sequential and non-sequential final session representations, the sequential and non-sequential unnormalized scores of each candidate item v_i ∈ V are obtained by multiplying the item embedding v_i with the corresponding final session representation. The sequential unnormalized score ẑ_{i,s} is defined as:

    ẑ_{i,s} = s_{fs}^T v_i.    (13)

The non-sequential unnormalized score ẑ_{i,n} is obtained in a similar way. For the recommendation, we use the sum of the two unnormalized scores. A softmax is then applied to calculate the normalized probability output vector of the model ŷ as follows:

    ŷ = softmax(ẑ),    (14)

where ẑ ∈ R^n is the sum of the sequential and non-sequential unnormalized scores and ŷ ∈ R^n gives the probability of each item being the next click in session s.

For any given session, the loss function is defined as the cross-entropy between the predicted click and the ground truth:

    L(ŷ) = - Σ_{i=1}^{n} ( y_i log(ŷ_i) + (1 - y_i) log(1 - ŷ_i) ),    (15)

where y is the one-hot encoding of the ground truth item. The Adam optimizer is then used to optimize the cross-entropy loss.
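Putting Eqs. (13)-(15) together, scoring and training reduce to two inner products, a softmax, and a cross-entropy; a minimal NumPy sketch (names and shapes are our own illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def recommend(v_all, s_fs, s_fn, y_true, k=20):
    """Score every candidate item with both networks (Eq. 13), sum the
    unnormalized scores, normalize (Eq. 14), and compute the loss (Eq. 15).
    v_all: (n_items, d) item embeddings; y_true: one-hot ground truth."""
    z_s = v_all @ s_fs                # sequential scores, one per item
    z_n = v_all @ s_fn                # non-sequential scores
    y_hat = softmax(z_s + z_n)        # Eq. (14) over the summed scores
    # Eq. (15): cross-entropy against the one-hot ground truth.
    loss = -np.sum(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))
    top_k = np.argsort(-y_hat)[:k]    # top-K recommendation list
    return y_hat, loss, top_k

n_items, d = 100, 8
rng = np.random.default_rng(3)
v_all = rng.normal(size=(n_items, d))
s_fs, s_fn = rng.normal(size=d), rng.normal(size=d)
y = np.zeros(n_items); y[7] = 1.0
y_hat, loss, top20 = recommend(v_all, s_fs, s_fn, y)
```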

4. Performance Evaluation

In this section, we aim to answer the following questions:
RQ1. How does the proposed SR-DGN model compare against the existing state-of-the-art baseline models?
RQ2. How do the proposed SR-DGN sequential and non-sequential networks compare against each other?

4.1. Experimental Configurations

4.1.1. Datasets

Three popular publicly available datasets, Yoochoose¹, RetailRocket² and Diginetica³, were used to evaluate the performance of the proposed model. The Yoochoose dataset was obtained from the RecSys Challenge 2015. The RetailRocket dataset contains 6 months of personalized transactions from an e-commerce site, available on Kaggle, while the Diginetica dataset is from the CIKM Cup 2016. All datasets consist of transactional data from e-commerce sites. We used pre-processing similar to [2, 10], removing items occurring fewer than 5 times and sessions of length less than 2. We used the last week of transactions for testing in all datasets. Similar to existing models, we augment the training sessions by splitting the input sequence: for example, from the sequence s = [v_{s,1}, v_{s,2}, ..., v_{s,n}] we generate the input sequences ([v_{s,1}], v_{s,2}), ..., ([v_{s,1}, v_{s,2}, ..., v_{s,n-1}], v_{s,n}). We used the most recent 1/64 portion of the Yoochoose dataset.
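A short sketch of the described filtering and prefix-splitting augmentation (function names are ours):

```python
from collections import Counter

def augment_sessions(session):
    """Split one session into (prefix, target) training pairs, e.g.
    [v1, v2, v3] -> ([v1], v2), ([v1, v2], v3)."""
    return [(session[:i], session[i]) for i in range(1, len(session))]

def preprocess(sessions, min_count=5, min_len=2):
    """Drop items occurring fewer than min_count times and sessions shorter
    than min_len, then augment; a sketch of the pipeline described above."""
    counts = Counter(v for s in sessions for v in s)
    kept = [[v for v in s if counts[v] >= min_count] for s in sessions]
    kept = [s for s in kept if len(s) >= min_len]
    return [pair for s in kept for pair in augment_sessions(s)]

# With min_count=1 every item survives; the session [5] is dropped as too short.
pairs = preprocess([[1, 2, 3, 4], [2, 3], [5]], min_count=1)
```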
4.1.2. Baselines

We compare the performance of our proposed SR-DGN model with representative traditional and deep learning baseline models. The traditional baseline used is the Factorized Personalized Markov Chain model (FPMC) [8]. The deep learning baselines include the RNN-based model GRU4Rec [1], the RNN-with-attention model NARM [9], the memory-based attention model STAMP [3], the directed graph model SR-GNN [2], and the hypergraph models DHCN [4] and SHARE [10].

4.1.3. Evaluation Metrics

We used two common accuracy metrics, P@K and MRR@K with K = 10, 20, for evaluation. P@K measures the proportion of cases in which the target item appears among the top-K recommended items regardless of rank, while MRR@K additionally accounts for the rank position of the correctly recommended item.
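Both metrics can be computed as follows (a sketch; ranked_lists holds each test session's candidate items sorted by predicted score):

```python
def precision_and_mrr_at_k(ranked_lists, targets, k=20):
    """P@K: fraction of sessions whose target appears in the top-K list.
    MRR@K: mean reciprocal rank of the target, counted as 0 outside top-K."""
    hits, rr = 0, 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = list(ranked[:k])
        if target in top_k:
            hits += 1
            rr += 1.0 / (top_k.index(target) + 1)
    n = len(targets)
    return hits / n, rr / n

# Target 1 is found at rank 2 in the first session; target 5 is missed.
p20, mrr20 = precision_and_mrr_at_k([[3, 1, 2], [9, 8, 7]], [1, 5], k=20)
```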
4.1.4. Hyperparameter Setup

We used hyperparameter settings similar to previous models [2, 4, 10]. We set the hidden dimension in all experiments to d = 100, and the learning rate for the Adam optimizer to 0.001 with a decay of 0.1 after every 3 training epochs. The l2 penalty and batch size were set to 10⁻⁵ and 100 respectively on all datasets.
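Assuming a PyTorch implementation (our assumption; the paper does not name a framework), this setup corresponds to:

```python
import torch

# Hypothetical stand-in for the SR-DGN parameters.
model = torch.nn.Linear(100, 100)

# Adam with lr 0.001, l2 penalty (weight decay) 1e-5, and a learning-rate
# decay of 0.1 every 3 epochs, matching the stated setup.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(10):
    # ... train over mini-batches of size 100, calling optimizer.step() ...
    scheduler.step()
```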
4.2. Comparison with Baselines

We compare the performance of SR-DGN with the existing baseline models in terms of P@K and MRR@K (K = 10, 20) on the Yoochoose 1/64, RetailRocket and Diginetica datasets. Table 1 shows the results, with the best performance highlighted in boldface. It can be seen that SR-DGN outperforms the best baseline models on all datasets. This is evidence that using both directed and undirected graphs can improve the overall performance of graph neural network models for session-based recommendation.

From Table 1, it can also be seen that all deep learning models except GRU4Rec outperformed the traditional model FPMC. On the RetailRocket dataset, STAMP (a non-sequential model) outperformed NARM (a sequential model), whereas on the Diginetica dataset the reverse is observed. These results support our argument that sequential and non-sequential architectures for learning item representations can be complementary. Despite the simple architecture of our sequential and non-sequential models, SR-DGN was able to outperform more complex models like DHCN, which uses self-supervised learning with both intra- and inter-session information.
¹ http://2015.recsyschallenge.com/challege.html
² https://www.kaggle.com/retailrocket/ecommerce-dataset
³ http://cikm2016.cs.iupui.edu/cikm-cup
Table 1
Performance of SR-DGN compared with other baseline models. Boldface marks the best result over all methods and * denotes a statistically significant difference (t-test). (All values are in percentages.)

Dataset          Metric    FPMC    GRU4Rec   NARM    STAMP   SR-GNN   S²-DHCN   SHARE   GGNN    SGC     SR-DGN
Yoochoose 1/64   P@20      45.62   60.64     68.32   68.74   70.57    70.39     71.17   71.06   71.32   71.70
                 MRR@20    15.01   22.89     28.63   29.67   30.94    29.92     31.06   31.32   31.27   31.51
                 P@10      32.01   52.45     57.50   58.07   60.09    59.18     60.59   60.60   60.71   61.29*
                 MRR@10    14.35   21.53     27.97   28.92   30.69    28.54     30.78   30.69   30.52   30.79
Diginetica       P@20      26.53   29.45     49.70   45.64   50.73    53.18     52.73   51.83   52.68   53.42*
                 MRR@20    6.95    8.33      16.17   14.32   17.59    18.44     18.05   17.99   18.63   18.66
                 P@10      15.43   17.93     35.44   33.98   36.86    39.87     39.52   38.62   39.65   40.20*
                 MRR@10    6.20    7.33      15.13   14.26   15.52    17.53     17.12   17.07   17.73   17.75
RetailRocket     P@20      32.37   44.01     50.22   50.96   50.32    53.66     54.00   54.55   54.72   55.85*
                 MRR@20    13.82   23.67     24.59   25.17   26.57    27.30     27.12   29.39   29.23   29.77
                 P@10      25.99   38.35     42.07   42.95   43.21    46.15     46.21   47.36   47.44   48.20*
                 MRR@10    13.38   23.27     24.88   24.61   26.07    26.85     26.61   28.98   28.73   29.24*



4.3. Comparison with GGNN and SGC

We compare the performance of the GGNN [5] (sequential) and SGC [6] (non-sequential) models with SR-DGN on all three datasets in terms of P@K and MRR@K (K = 10, 20). Table 1 shows that on all the datasets, the combined SR-DGN model outperformed both GGNN and SGC. Generally, SGC outperforms GGNN on the precision metrics while GGNN outperforms SGC on the MRR metrics. To ensure good performance across different datasets, considering both the sequential and non-sequential models, as in our proposed SR-DGN, may be the solution.

Table 2
Performance comparison of aggregation methods in SR-DGN. (All values are in percentages.)

Dataset          Metric    Max     Sum
Yoochoose 1/64   P@20      71.65   71.70
                 MRR@20    31.19   31.51
                 P@10      61.16   61.29
                 MRR@10    30.45   30.79
Diginetica       P@20      52.73   53.42
                 MRR@20    18.45   18.66
                 P@10      39.69   40.20
                 MRR@10    17.54   17.75
RetailRocket     P@20      55.11   55.85
                 MRR@20    29.55   29.77
                 P@10      49.01   48.20
                 MRR@10    29.06   29.24
4.4. Ablation Study

SR-DGN uses summation to aggregate the sequential and non-sequential unnormalized scores. We compare the performance of summation against max aggregation. Table 2 shows the results. On all datasets and nearly all metrics (the exception being P@10 on RetailRocket), the summation method outperforms the max method. This result is intuitive, since with summation, items with the overall highest probabilities are recommended. It further demonstrates the advantage of using both the sequential and non-sequential networks in SR-DGN.
                                                                quential and the reverse is true for some other datasets.
5. Conclusion

In this paper, we proposed SR-DGN, a graph neural network model for session-based recommendation. SR-DGN constructs a directed and an undirected graph for each session and learns sequential and non-sequential item embeddings using sequential and non-sequential GNN models respectively. Using shared learnable parameters, SR-DGN learns the global and local user preferences from each of the learnt item embeddings. For making recommendations, SR-DGN sums the sequential and non-sequential scores. Experimental results showed that SR-DGN outperformed state-of-the-art models on three benchmark datasets. Further analysis revealed that for some datasets the non-sequential model outperforms the sequential model, while the reverse is true for others. SR-DGN takes advantage of both scenarios to achieve better performance.

Acknowledgments

This project was partially supported by Grants from Natural Science Foundation of China 62176247. It was also supported by the Fundamental Research Funds for the Central Universities and CAS/TWAS Presidential Fellowship for International Doctoral Students.
References

[1] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, 2016. arXiv:1511.06939.
[2] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, T. Tan, Session-based recommendation with graph neural networks, Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019) 346–353. URL: http://dx.doi.org/10.1609/aaai.v33i01.3301346. doi:10.1609/aaai.v33i01.3301346.
[3] Q. Liu, Y. Zeng, R. Mokhosi, H. Zhang, STAMP: Short-term attention/memory priority model for session-based recommendation, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 1831–1839. URL: https://doi.org/10.1145/3219819.3219950. doi:10.1145/3219819.3219950.
[4] X. Xia, H. Yin, J. Yu, Q. Wang, L. Cui, X. Zhang, Self-supervised hypergraph convolutional networks for session-based recommendation, in: Proceedings of the AAAI Conference on Artificial Intelligence, AAAI '21, 2021, pp. 4503–4511. URL: https://ojs.aaai.org/index.php/AAAI/article/view/16578.
[5] Y. Li, R. Zemel, M. Brockschmidt, D. Tarlow, Gated graph sequence neural networks, in: Proceedings of ICLR'16, 2016.
[6] F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, K. Weinberger, Simplifying graph convolutional networks, in: K. Chaudhuri, R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, PMLR, 2019, pp. 6861–6871. URL: http://proceedings.mlr.press/v97/wu19e.html.
[7] A. Zimdars, D. M. Chickering, C. Meek, Using temporal data for making recommendations, in: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, UAI'01, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001, pp. 580–588.
[8] S. Rendle, C. Freudenthaler, L. Schmidt-Thieme, Factorizing personalized Markov chains for next-basket recommendation, in: Proceedings of the 19th International Conference on World Wide Web, WWW '10, Association for Computing Machinery, New York, NY, USA, 2010, pp. 811–820. URL: https://doi.org/10.1145/1772690.1772773. doi:10.1145/1772690.1772773.
[9] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, J. Ma, Neural attentive session-based recommendation, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM '17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 1419–1428. URL: https://doi.org/10.1145/3132847.3132926. doi:10.1145/3132847.3132926.
[10] J. Wang, K. Ding, Z. Zhu, J. Caverlee, Session-based recommendation with hypergraph attention networks, Proceedings of the 2021 SIAM International Conference on Data Mining (SDM) (2021) 82–90. URL: http://dx.doi.org/10.1137/1.9781611976700.10. doi:10.1137/1.9781611976700.10.