<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Session-based Recommendation with Dual Graph Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tajuddeen Rabiu Gwadabe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammed Ali Mohammed Al-hababi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ying Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science and Technology, University of Chinese Academy of Sciences (UCAS)</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The session-based recommendation task aims at predicting the next item an anonymous user might click. Recently, graph neural networks have gained considerable attention in this task. Existing models construct either a directed graph or a hypergraph for each session and learn item embeddings using some form of graph neural network. We argue that constructing both a directed and an undirected graph for each session may outperform either method alone, since for some sessions the sequence of interactions is relevant while for others it is not. In this paper, we propose a novel Session-based Recommendation model with Dual Graph Networks (SR-DGN). SR-DGN constructs a directed and an undirected graph from each session and learns sequential and non-sequential item representations using sequential and non-sequential graph neural network models respectively. Using shared learnable parameters, SR-DGN learns global and local user preferences for each network and uses the network with the best scores for recommendation. Experiments conducted on three real-world datasets show its superiority over state-of-the-art models.</p>
      </abstract>
      <kwd-group>
<kwd>session-based recommendation</kwd>
        <kwd>graph neural networks</kwd>
        <kwd>directed and undirected graphs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Recommender systems have become an essential component of the internet user experience, as they assist consumers in sifting through the ever-increasing volume of information. Some online sites allow non-login users; in that case, the recommender system has to rely exclusively on the current anonymous session for making recommendations. Session-based recommender systems aim at providing relevant recommendations to such anonymous users.</p>
      <p>
        Recent developments in deep learning architectures have led researchers to apply these architectures to the session-based recommendation task. Recurrent neural networks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] were first proposed to learn the sequential interactions between items in a session. More recently, Graph Neural Networks (GNN) have been proposed for session-based recommendation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These models construct a directed graph for each session and learn item representations using the sequential Gated Graph Neural Network (GGNN). On the other hand, memory network models like STAMP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] have shown that the order of the sequence may not be important in session-based recommendation and proposed a model that does not depend on the sequence of interactions. Similarly, hypergraph models like DHCN [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposed constructing a hypergraph for each session and learning item representations with a hypergraph convolutional network, which likewise neglects the sequence of interactions between items.
      </p>
      <p>This has led to two schools of thought: either consider the sequence of interactions between items, since users interacted with the items sequentially, or neglect the sequence, since item order may not be relevant when users interact with items in an online setting. Both views have merit. For example, on an e-commerce site, buying a particular brand of phone might influence buying a screen guard, so the sequence might be relevant. However, buying household supplies such as tissue might not influence buying any other particular item, so the sequence might be irrelevant. We argue that the two schools of thought can be complementary: for some sessions, considering the sequence is relevant, while for others it is not.</p>
      <p>To this end, we propose a Session-based Recommendation model with Dual Graph Networks, SR-DGN. SR-DGN first constructs two graphs, a directed and an undirected graph, for each session and learns item representations using sequential and non-sequential GNN models respectively. From the individual item representations, SR-DGN learns local and global user preferences. Each network produces a score for each item, and the network with the best score is used for making the recommendation. Our main contributions are summarized as follows:</p>
      <p>• SR-DGN constructs two graphs, a directed and an undirected graph, for each session and learns item representations using sequential and non-sequential GNN models. For learning the sequential item representation, SR-DGN uses GGNN [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], while for learning the non-sequential item representation, SR-DGN uses an SGC [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] with a gating highway connection.</p>
      <p>• SR-DGN learns local and global user preferences from each graph network using learnable parameters shared between the networks. Each network then provides scores for each item, and the network with the best score is used for making the recommendation.</p>
      <p>• Experimental results on three benchmark datasets demonstrate the effectiveness of SR-DGN. Further analysis shows that while the sequential network performs better on some datasets, the non-sequential network performs better on others, which further demonstrates the effectiveness of using both networks for the session-based recommendation task.</p>
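<p>To make the dual-graph idea concrete, the following is a minimal numpy sketch of building the two adjacency structures from one click sequence. The function and variable names are ours, not from the paper; the out-degree normalization of the directed graph follows the description in Section 3.1.</p>

```python
import numpy as np

def session_graphs(session, n_items):
    """Build a directed (out-degree-normalized) and an undirected adjacency
    matrix for one session, e.g. session = [0, 2, 1]."""
    A_dir = np.zeros((n_items, n_items))
    for u, v in zip(session[:-1], session[1:]):
        A_dir[u, v] = 1.0                           # directed edge u to v for consecutive clicks
    A_und = ((A_dir + A_dir.T) != 0).astype(float)  # undirected graph: ignore click order
    deg = A_dir.sum(axis=1, keepdims=True)          # out-degree of each node
    A_out = np.divide(A_dir, deg, out=np.zeros_like(A_dir), where=(deg != 0))
    return A_out, A_und
```

<p>For the session [0, 2, 1], the directed graph has edges 0 → 2 and 2 → 1 only, while the undirected graph connects the same pairs in both directions.</p>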
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>3. SR-DGN</p>
      <sec id="sec-2-1">
        <title>3.1. Problem Statement and Graph</title>
      </sec>
      <sec id="sec-2-2">
        <title>Construction</title>
        <p>Session-based recommendation aims to predict the next
click of an anonymous user session. For a dataset with
distinct items set  = 1, 2, . . . , , let an
anonymous session  be represented by the ordered list,  =
[,1, ,2, . . . , ,− 1] where , ∈  is a clicked item
within the session , session-based recommendation aims
to recommend the next item to be clicked, ,. The
output of SR-DGN is a ranked probability score for all the
candidate items where the top-K items based on the
probabilities yˆ will be recommended as the potential next
clicks.</p>
        <p>For each session , our model constructs a directed and
an undirected graph  = (, ℰ) and  = (, ℰ)
respectively. For both graphs,  ∈  and  ∈ 
if  is clicked within the current session. A directed
edge (− 1, ) ∈ ℰ exists from − 1 to  if item 
was clicked immediately after − 1. An undirected edge
(− 1, ) ∈ ℰ exists between − 1 to  if item  was
clicked before or after item − 1. For the directed graph,
we normalized the outgoing and incoming adjacency
matrices by the degree of the outgoing node. The overview
of the SR-DGN model is given in Figure 1.</p>
        <p>
          Session-based recommendation models use the implicit
temporal feedbacks of users such as clicks obtained by
tracking user activities. Traditional machine learning
models such as Markov Chain (MC) models have been
used for sequential recommendation. Zimdars et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
proposed extracting sequential patterns from sessions
and predicting the next click using decision tree models. 3.2. Learning Sequential and
FMPC [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] generalizes MC method and matrix
factorization to model short term user preference and long-term Non-Sequential Item Embedding
user preference respectively. However, MC models sufer We first transform all items  ∈  into a unified
emfrom the assumption of an independence relationship bedding space  ∈ R, where  is the dimension size.
between the states in a sequence and an unmanageable Using this initial embedding, we learn sequential and
state space when considering all the possible sequences. non-sequential item embedding,  and  respectively.
        </p>
        <p>
          Recently, deep learning models have achieved the
stateof-the-art performance in session-based recommenda- 3.2.1. Learning Sequential Item Embedding
tion. Hidasi et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] first proposed GRU4Rec, a recurrent
neural network for session-based recommendation. The We use GGNN [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] similar to SR-GNN [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] for learning the
model uses session-parallel mini-batches and pairwise sequential item representation. Given the incoming and
ranking loss. Liu et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] proposed NARM, which uses outgoing adjacency matrices and the initial item
embedrecurrent neural network with attention mechanism to ding, GGNN updates the item embedding as follows:
learn both the local and the global user preference. Li
et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] proposed using memory networks and showed a = A:[v1− 1, . . . , v− 1] H1 + b1, (1)
tsharayt.mWoudeeltlianlg. [t2h]ecsoenqsuterunctitaeldndaitruercetemdagyranpohtsbfeorneecaecsh-  =  (WaUv− 1), (2)
session and learned the local and global user preferences  =  (WaUv− 1), (3)
fuosrinegacGhGsNesNs.ioWnaanngdeptraol.p[o1s0e]dcaonhsytpruecrtgsraaphhyaptetregnrtaiponh vˆ = ℎ(Wa + U( ⊙ v− 1)) (4)
network for recommendation. vˆ (5)

v = (1 −  ) ⊙
v− 1 +  ⊙

where A: ∈ R12 is the -th row of the incoming
and outgoing matrices. H1 ∈ R2 and b1 ∈ R are
weight and bias parameters respectively.  ∈ R and
 ∈ R are the reset and update gates respectively.
        </p>
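<p>As a concrete illustration, the gated update in Eqs. (1)-(5) can be sketched in numpy as follows. The dimension choices (a 2d×d matrix H_1 applied to the concatenated incoming and outgoing messages) and the row-vector convention are our reading of the text, not the authors' released code.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(A_in, A_out, V, H1, b1, Wz, Uz, Wr, Ur, Wo, Uo):
    """One GGNN propagation step over a session graph, Eqs. (1)-(5).

    A_in, A_out: (n, n) normalized incoming/outgoing adjacency matrices.
    V: (n, d) item embeddings from the previous step.
    """
    m = np.concatenate([A_in @ V, A_out @ V], axis=1)  # incoming and outgoing messages, (n, 2d)
    a = m @ H1 + b1                                    # Eq. (1): aggregated neighborhood message
    z = sigmoid(a @ Wz + V @ Uz)                       # Eq. (2): update gate
    r = sigmoid(a @ Wr + V @ Ur)                       # Eq. (3): reset gate
    v_tilde = np.tanh(a @ Wo + (r * V) @ Uo)           # Eq. (4): candidate embedding
    return (1 - z) * V + z * v_tilde                   # Eq. (5): gated combination
```

<p>Each item's new embedding interpolates between its previous embedding and the candidate embedding, with the update gate z controlling how much neighborhood information flows in.</p>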
        <sec id="sec-2-2-1">
          <title>3.2.2. Learning Non-Sequential Item Embedding</title>
          <p>
            To learn the non-sequential item representation,  we
used SGC [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] with a proposed highway connections.
Formally, the update can is given by:
a = A:[v1− 1, . . . , v− 1] H2 + b2,
          </p>
          <p>v = g1 ⊙ a + (1 − g )</p>
          <p>1 ⊙ v,
g1 = Wg1 ([v; v]).</p>
          <p>where h, is the i-th sequential item embedding and  
is the attention weight of the i-th timestamp given by:
  = q  (W1h, + W1h, + ),</p>
          <p>(10)
(6)
(7)
(8)
where the parameters q,W1 and W2 are learnable to
control the additive attention. The final sequential
session representation s is obtained by aggregating the
sequential local and global preferences using a gating
mechanism. Formally, the final session representation
s is obtained as follows;
where A: ∈ R1 is the -th row of the undirected
graph adjacency matrix. H2 ∈ R and b2 ∈ R are
weight and bias parameters respectively. g1 is the gating
mechanism used to improve the performance of the non- g2 is the gating function obtained by;
sequential item representation.
s = g2 ⊙ s + (1 − g )</p>
          <p>2 ⊙ s,
g2 = Wg2 ([s; s])
(11)
(12)
3.3. Learning Session Embedding Wg2 ∈ R2×  is a trainable transformation matrix and
From the sequential and non-sequential item embedding, [;] is a concatenation operation. From the non-sequential
we learn the local and the global user preferences for each item embedding, using the same learnable parameters,
network using shared learnable parameters. Considering the final non-sequential representation, s can be
obthe sequential item embedding, to obtain the final session tained.
embedding, we use a gating mechanism that aggregates
the global and the local user preferences. The sequential 3.4. Making Recommendation
local user preference s is obtained from the sequential
embedding of the last clicked item while the sequential
global preference s is obtained from the sequential
embedding of all clicked items in a session using additive
an attention mechanism. Formally, the sequential global
preference s is given by;
From the sequential and non-sequential final session
representations, the sequential and non-sequential
unnormalized scores of each candidate item  ∈  can be
obtained by multiplying the item embedding v with the
each the corresponding final session representation. The
sequential unnormalized score ˆz, is defined as:

s = ∑︁  h,,

(9)</p>
          <p>ˆz = sv.</p>
          <p>(13)
The non-sequential unnormalized score ˆz is obtained in
similar way. For the recommendation, we use the sum of
the two unnormalized scores. A softmax is then applied
to calculate the normalized probability output vector of
the model yˆ as follows:
yˆ =  (ˆz)</p>
          <p>(14)
where ˆz ∈ R is the sum unnormalized score of the
sequential and non-sequential scores and yˆ = R is the
probability of each item to be the next click in session .</p>
          <p>
            For any given session, the loss function is defined as
the cross-entropy between the predicted click and the
ground truth. The cross-entropy loss function is defined
as follows:
We compare the performance of our proposed SR-DGN
model with traditional and deep learning representative
baseline models. The traditional baseline model used is
Factorized Personalized Markov Chain model (FPMC) [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ].
          </p>
          <p>
            The deep learning baselines include RNN-based models
GRU4Rec [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], RNN with attention model (NARM) [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ],
memory-based with attention model (STAMP) [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ],
directed graph model SR-GNN [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] and hypergraph models
DHCH [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] and SHARE [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]
4.1.3. Evaluation Metrics.
          </p>
          <p>We used two common accuracy metrics,  @ = 20, 10
ℒ(yˆ) = − ∑︁ y(yˆ) + (1 − y)(1 − yˆ) (15) and  @ = 20, 10, for evaluation. P@K
evalu=1 ates the proportion of correctly recommended unranked
where y is the one-hot encoding of the ground truth items, while MRR@K evaluates the position of the
coritems. Adam optimizer is then used to optimize the cross- rectly recommended ranked items.
entropy loss.</p>
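<p>A minimal numpy sketch of the attention, gating, and scoring steps in Eqs. (9)-(13); the shapes and row-vector convention are our assumptions for illustration, and the same routine would be applied to the non-sequential embeddings with the same parameters.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def session_embedding(H, q, W1, W2, c, Wg2):
    """Gated fusion of local and global preferences, Eqs. (9)-(12).

    H: (t, d) sequential embeddings of the session's clicked items, in click order.
    """
    s_l = H[-1]                                       # local preference: last clicked item
    alpha = sigmoid(H @ W1.T + H[-1] @ W2.T + c) @ q  # Eq. (10): additive attention weights, (t,)
    s_g = alpha @ H                                   # Eq. (9): global preference, (d,)
    g2 = sigmoid(np.concatenate([s_l, s_g]) @ Wg2)    # Eq. (12): fusion gate, Wg2 is (2d, d)
    return g2 * s_l + (1 - g2) * s_g                  # Eq. (11): final session representation

def item_scores(s, V):
    """Unnormalized scores of all candidate items, Eq. (13)."""
    return V @ s                                      # inner product with each item embedding, (N,)
```

<p>The sequential and non-sequential scores produced this way are then summed and passed through a softmax, as in Eq. (14).</p>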
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Performance Evaluation</title>
<p>In this section, we aim to answer the following research questions:</p>
      <p>RQ1. How does the proposed SR-DGN model compare
against the existing state-of-the-art baseline models?</p>
<p>RQ2. How do the proposed SR-DGN sequential and
non-sequential networks compare against each other?</p>
      <sec id="sec-3-1">
        <title>4.1. Experimental Configurations</title>
        <sec id="sec-3-1-1">
          <title>4.1.1. Datasets</title>
          <p>
Three popular publicly available datasets, Yoochoose1, RetailRocket2 and Diginetica3, were used to evaluate the performance of the proposed model. The Yoochoose dataset was obtained from the RecSys Challenge 2015. The RetailRocket dataset contains six months of personalized transactions from an e-commerce site and is available on Kaggle, while the Diginetica dataset is from the CIKM Cup 2016. All datasets consist of transactional data from e-commerce sites. We used pre-processing similar to [
            <xref ref-type="bibr" rid="ref10 ref2">2, 10</xref>
            ], removing items occurring fewer than 5 times and sessions of length less than 2. We used the last week of transactions for testing in all datasets. Similar to existing models, we augment the training sessions by splitting the input sequence: from the sequence s = [v_{s,1}, v_{s,2}, . . . , v_{s,n}] we generate the input/label pairs ([v_{s,1}], v_{s,2}), . . . , ([v_{s,1}, v_{s,2}, . . . , v_{s,n-1}], v_{s,n}). We also used only the most recent 1/64 portion of the Yoochoose dataset.
1http://2015.recsyschallenge.com/challege.html
2https://www.kaggle.com/retailrocket/ecommerce-dataset
3http://cikm2016.cs.iupui.edu/cikm-cup
          </p>
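<p>The prefix-splitting augmentation described above can be sketched as follows (a minimal illustration; the function name is ours, not from the paper):</p>

```python
def augment_session(session):
    """Split one session into (prefix, label) training pairs.

    [v1, v2, v3] yields ([v1], v2) and ([v1, v2], v3).
    """
    pairs = []
    for t in range(1, len(session)):
        pairs.append((session[:t], session[t]))  # each prefix predicts the next click
    return pairs
```

<p>A session of length n thus contributes n - 1 training examples, and a session of length 1 contributes none.</p>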
        </sec>
        <sec id="sec-3-1-2">
          <title>4.1.4. Hyperparameter Setup.</title>
          <p>
We used hyperparameters similar to previous models [
            <xref ref-type="bibr" rid="ref10 ref2 ref4">2, 4, 10</xref>
            ]. We set the hidden dimension in all experiments to d = 100 and the learning rate of the Adam optimizer to 0.001, with a decay of 0.1 after every 3 training epochs. The L2 penalty and batch size were set to 10^-5 and 100 respectively on all datasets.
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Comparison with Baseline</title>
<p>We compare the performance of SR-DGN with the existing baseline models in terms of P@K and MRR@K (K = 10, 20) on the Yoochoose 1/64, RetailRocket and Diginetica datasets. Table 1 shows the results, with the best performance highlighted in boldface. SR-DGN outperforms the best baseline models on all datasets. This is evidence that using both directed and undirected graphs can improve the overall performance of graph neural network models for session-based recommendation.</p>
<p>From Table 1, it can also be seen that all deep learning models except GRU4Rec outperformed the traditional model FPMC. On the RetailRocket dataset, STAMP (a non-sequential model) outperformed NARM (a sequential model), while on the Diginetica dataset the reverse is observed. These results support our argument that sequential and non-sequential architectures for learning item representations can be complementary. Despite the simple architecture of its sequential and non-sequential components, SR-DGN was able to outperform more complex models like DHCN, which uses self-supervised learning with both intra- and inter-session information.</p>
</sec>
      <sec id="sec-4-1">
        <title>4.3. Comparison with GGNN and SGC</title>
        <p>
          We compare the performance of the GGNN [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] (sequential) and SGC [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] (non-sequential) models with SR-DGN on all three datasets in terms of P@K and MRR@K (K = 10, 20). Table 1 shows that on all datasets the combined SR-DGN model outperformed both GGNN and SGC. Generally, SGC outperforms GGNN on the precision metrics while GGNN outperforms SGC on the MRR metrics. To ensure good performance across different datasets, considering both the sequential and non-sequential models, as in our proposed SR-DGN, may be the solution.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.4. Ablation Study</title>
        <p>SR-DGN uses summation to aggregate the sequential and non-sequential unnormalized scores. We compare the performance of summation against max aggregation. Table 2 shows the performance of the summation method against the max method. On all datasets and on nearly all metrics, the summation method outperforms the max method. These results are intuitive, since with summation the items with the overall highest probabilities are recommended. They also further demonstrate the advantage of using both the sequential and non-sequential networks in SR-DGN.</p>
        <p>Table 2: Max vs. Sum score aggregation (P@20 / MRR@20 / P@10 / MRR@10). Yoochoose 1/64 - Max: 71.65 / 31.19 / 61.16 / 30.45, Sum: 71.70 / 31.51 / 61.29 / 30.79. Diginetica - Max: 52.73 / 18.45 / 39.69 / 17.54, Sum: 53.42 / 18.66 / 40.20 / 17.75. RetailRocket - Max: 55.11 / 29.55 / 49.01 / 29.06, Sum: 55.85 / 29.77 / 48.20 / 29.24.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we proposed SR-DGN, a graph neural network model for session-based recommendation. SR-DGN constructs a directed and an undirected graph for each session and learns sequential and non-sequential item embeddings using sequential and non-sequential GNN models respectively. Using shared learnable parameters, SR-DGN learns the global and local user preferences from each of the learnt item embeddings. For making recommendations, SR-DGN aggregates the sequential and non-sequential scores. Experimental results showed that SR-DGN outperformed state-of-the-art models on three benchmark datasets. Further analysis revealed that on some datasets the non-sequential model outperforms the sequential one, while the reverse is true on others; SR-DGN takes advantage of both scenarios to achieve better performance.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This project was partially supported by grants from the Natural Science Foundation of China (62176247). It was also supported by the Fundamental Research Funds for the Central Universities and the CAS-TWAS Presidential Fellowship for International Doctoral Students.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hidasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karatzoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Baltrunas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tikk</surname>
          </string-name>
          ,
          <article-title>Session-based recommendations with recurrent neural networks</article-title>
          ,
          <year>2016</year>
. arXiv:1511.06939.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Session-based recommendation with graph neural networks</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>33</volume>
          (
          <year>2019</year>
          )
          <fpage>346</fpage>
          -
          <lpage>353</lpage>
. URL: http://dx.doi.org/10.1609/aaai.v33i01.3301346. doi:10.1609/aaai.v33i01.3301346.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mokhosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Stamp:
          <article-title>Short-term attention/memory priority model for session-based recommendation</article-title>
          ,
          <source>in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , KDD '18,
Association for Computing Machinery, New York, NY, USA,
          <year>2018</year>
          , p.
          <fpage>1831</fpage>
          -
          <lpage>1839</lpage>
. URL: https://doi.org/10.1145/3219819.3219950. doi:10.1145/3219819.3219950.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
X. Zhang,
          <article-title>Selfsupervised hypergraph convolutional networks for session-based recommendation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
, AAAI '21,
          <year>2021</year>
          , pp.
          <fpage>4503</fpage>
          -
          <lpage>4511</lpage>
          . URL: https://ojs.aaai. org/index.php/AAAI/article/view/16578.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brockschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tarlow</surname>
          </string-name>
          ,
          <article-title>Gated graph sequence neural networks</article-title>
          ,
          <source>in: Proceedings of ICLR'16</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , C. Fifty,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <article-title>Simplifying graph convolutional networks</article-title>
          , in: K. Chaudhuri, R. Salakhutdinov (Eds.),
          <source>Proceedings of the 36th International Conference on Machine Learning</source>
          , volume
          <volume>97</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>6861</fpage>
          -
          <lpage>6871</lpage>
          . URL: http://proceedings.mlr.press/v97/wu19e.html.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimdars</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Chickering</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meek</surname>
          </string-name>
          ,
          <article-title>Using temporal data for making recommendations</article-title>
          ,
          <source>in: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence</source>
, UAI'01, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>2001</year>
          , p.
          <fpage>580</fpage>
          -
          <lpage>588</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Freudenthaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          ,
          <article-title>Factorizing personalized markov chains for nextbasket recommendation</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on World Wide Web, WWW '10</source>
          ,
Association for Computing Machinery, New York, NY, USA,
          <year>2010</year>
          , p.
          <fpage>811</fpage>
          -
          <lpage>820</lpage>
. URL: https://doi.org/10.1145/1772690.1772773. doi:10.1145/1772690.1772773.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lian</surname>
          </string-name>
          , J. Ma,
          <article-title>Neural attentive session-based recommendation</article-title>
          ,
          <source>in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management</source>
          , CIKM '17,
Association for Computing Machinery, New York, NY, USA,
          <year>2017</year>
          , p.
          <fpage>1419</fpage>
          -
          <lpage>1428</lpage>
. URL: https://doi.org/10.1145/3132847.3132926. doi:10.1145/3132847.3132926.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Caverlee</surname>
          </string-name>
          ,
          <article-title>Session-based recommendation with hypergraph attention networks</article-title>
          ,
          <source>Proceedings of the 2021 SIAM International Conference on Data Mining (SDM)</source>
          (
          <year>2021</year>
          )
          <fpage>82</fpage>
          -
          <lpage>90</lpage>
. URL: http://dx.doi.org/10.1137/1.9781611976700.10. doi:10.1137/1.9781611976700.10.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>