=Paper=
{{Paper
|id=Vol-3317/paper10
|storemode=property
|title=Session-based Recommendation with Dual Graph Networks
|pdfUrl=https://ceur-ws.org/Vol-3317/Paper10.pdf
|volume=Vol-3317
|authors=Tajuddeen Rabiu Gwadabe,Mohammed Ali Mohammed Al-hababi,Ying Liu
|dblpUrl=https://dblp.org/rec/conf/cikm/GwadabeAL22
}}
==Session-based Recommendation with Dual Graph Networks==
Tajuddeen Rabiu Gwadabe¹,*, Mohammed Ali Mohammed Al-hababi¹ and Ying Liu¹

¹School of Computer Science and Technology, University of Chinese Academy of Sciences (UCAS), China

*Corresponding author: tgwadabe@mails.ucas.ac.cn (T. R. Gwadabe); mohammed_al-hababi@mails.ucas.ac.cn (M. A. M. Al-hababi); yingliu@ucas.ac.cn (Y. Liu)

DL4SR'22: Workshop on Deep Learning for Search and Recommendation, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA

Abstract: The session-based recommendation task aims at predicting the next item an anonymous user might click. Recently, graph neural networks have gained a lot of attention in this task. Existing models construct either a directed graph or a hypergraph for each session and learn item embeddings using some form of graph neural network. We argue that constructing both directed and undirected graphs for each session may outperform either method, since for some sessions the sequence of interactions is relevant while for others it is not. In this paper, we propose a novel Session-based Recommendation model with Dual Graph Networks (SR-DGN). SR-DGN constructs a directed and an undirected graph from each session and learns both sequential and non-sequential item representations using sequential and non-sequential graph neural network models respectively. Using shared learnable parameters, SR-DGN learns global and local user preferences for each network and uses the network with the best scores for recommendation. Experiments conducted on three real-world datasets show its superiority over state-of-the-art models.

Keywords: session-based recommendation, graph neural networks, directed and undirected graphs

===1. Introduction===

Recommender systems have become an essential component of the internet user experience, as they help consumers sift through the ever-increasing volume of information. Some online sites allow non-login users; for these, the recommender system has to rely exclusively on the current anonymous session to make recommendations. Session-based recommender systems aim at providing relevant recommendations to such anonymous users.

Recent developments in deep learning architectures have led researchers to apply these architectures to the session-based recommendation task. Recurrent neural networks [1] were first proposed to learn the sequential interactions between items in a session. More recently, Graph Neural Networks (GNN) have been proposed for session-based recommendation [2]. These models construct a directed graph for each session and learn item representations using sequential Gated Graph Neural Networks (GGNN). On the other hand, memory network models like STAMP [3] have shown that the order of the sequence may not be important in session-based recommendation and proposed a model that does not depend on the sequence of interactions. Similarly, hypergraph models like DHCN [4] construct a hypergraph for each session and learn item representations with a hypergraph convolutional network that also neglects the sequence of interactions between items.

This has led to two schools of thought: either consider the sequence of interactions between items, since users interacted with the items sequentially, or neglect the sequence, since item order may not be relevant when users interact with items in an online setting. Both views have merit. For example, on an e-commerce site, buying a particular brand of phone might influence buying a screen guard, so the sequence is relevant. However, buying household supplies such as tissue might not influence buying any other particular item, so the sequence is irrelevant. We argue that the two schools of thought can be complementary: for some sessions, considering the sequence is relevant, while for others it is not.

To this end, we propose a Session-based Recommendation model with Dual Graph Networks (SR-DGN). SR-DGN first constructs two graphs, a directed and an undirected graph, for each session and learns item representations using sequential and non-sequential GNN models respectively. From the individual item representations, SR-DGN learns local and global user preferences. Each network presents a score for each item, and the network with the best score is used for making the recommendation. Our main contributions are summarized as follows:

* SR-DGN uses two graphs, a directed and an undirected graph, for each session and learns item representations using sequential and non-sequential GNN models. For the sequential item representation, SR-DGN uses GGNN [5]; for the non-sequential item representation, SR-DGN uses SGC [6] with a gating highway connection.
* SR-DGN learns local and global user preferences from each graph network using learnable parameters shared between the networks. Each network then provides scores for each item, and the network with the best score is used for making the recommendation.
* Experimental results on three benchmark datasets demonstrate the effectiveness of SR-DGN. Further analysis shows that while the sequential network performs better on some datasets, the non-sequential network performs better on others, which further supports using both networks for the session-based recommendation task.

===2. Related Works===

Session-based recommendation models use implicit temporal feedback from users, such as clicks obtained by tracking user activities. Traditional machine learning models such as Markov Chain (MC) models have been used for sequential recommendation. Zimdars et al. [7] proposed extracting sequential patterns from sessions and predicting the next click using decision tree models. FPMC [8] generalizes the MC method and matrix factorization to model short-term and long-term user preferences respectively. However, MC models suffer from the assumption of independence between the states in a sequence, and from an unmanageable state space when all possible sequences are considered.

Recently, deep learning models have achieved state-of-the-art performance in session-based recommendation. Hidasi et al. [1] first proposed GRU4Rec, a recurrent neural network for session-based recommendation that uses session-parallel mini-batches and a pairwise ranking loss. Li et al. [9] proposed NARM, which uses a recurrent neural network with an attention mechanism to learn both the local and the global user preference. Liu et al. [3] proposed using memory networks and showed that modelling the sequential nature may not be necessary. Wu et al. [2] constructed a directed graph for each session and learned the local and global user preferences using GGNN. Wang et al. [10] constructed a hypergraph for each session and proposed a hypergraph attention network for recommendation.

===3. SR-DGN===

====3.1. Problem Statement and Graph Construction====

Session-based recommendation aims to predict the next click of an anonymous user session. For a dataset with distinct item set $V = \{v_1, v_2, \ldots, v_m\}$, let an anonymous session $S$ be represented by the ordered list $S = [v_{s,1}, v_{s,2}, \ldots, v_{s,t-1}]$, where $v_{s,i} \in V$ is an item clicked within session $S$. Session-based recommendation aims to recommend the next item to be clicked, $v_{s,t}$. The output of SR-DGN is a ranked probability score for all candidate items, and the top-K items according to the probabilities $\hat{y}$ are recommended as the potential next clicks.

For each session $S$, our model constructs a directed graph $\mathcal{G}_d = (\mathcal{V}_d, \mathcal{E}_d)$ and an undirected graph $\mathcal{G}_u = (\mathcal{V}_u, \mathcal{E}_u)$. For both graphs, $v_i \in \mathcal{V}_d$ and $v_i \in \mathcal{V}_u$ if $v_i$ is clicked within the current session. A directed edge $(v_{i-1}, v_i) \in \mathcal{E}_d$ exists from $v_{i-1}$ to $v_i$ if item $v_i$ was clicked immediately after $v_{i-1}$. An undirected edge $(v_{i-1}, v_i) \in \mathcal{E}_u$ exists between $v_{i-1}$ and $v_i$ if item $v_i$ was clicked before or after item $v_{i-1}$. For the directed graph, we normalize the outgoing and incoming adjacency matrices by the degree of the outgoing node. The overview of the SR-DGN model is given in Figure 1.

Figure 1: Overview of the SR-DGN model. For each session $S$, directed and undirected graphs are constructed, and sequential and non-sequential item representations are learned using sequential and non-sequential GNN models respectively. Using shared learnable parameters, the final sequential and non-sequential session representations are learned. Finally, the best prediction is selected for making recommendations.
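To make the construction concrete, the sketch below builds the adjacency matrices for one session in NumPy. This is a minimal illustration, not the authors' code: the row-normalization follows the SR-GNN convention [2] for the statement above that the matrices are normalized by the degree of the outgoing node, and the function name and return layout are our own.

```python
import numpy as np

def build_session_graphs(session):
    """Build adjacency matrices for one session (a list of item ids)."""
    nodes = list(dict.fromkeys(session))          # unique items in first-click order
    idx = {v: i for i, v in enumerate(nodes)}     # item id -> node index
    n = len(nodes)

    a_out = np.zeros((n, n), dtype=np.float32)
    for prev, curr in zip(session, session[1:]):  # one edge per consecutive click
        a_out[idx[prev], idx[curr]] = 1.0
    a_in = a_out.T.copy()                         # incoming edges mirror outgoing ones
    a_und = ((a_out + a_in) > 0).astype(np.float32)  # clicked before or after

    def row_normalize(a):                         # divide each row by its degree
        deg = np.maximum(a.sum(axis=1, keepdims=True), 1.0)
        return a / deg

    return row_normalize(a_out), row_normalize(a_in), row_normalize(a_und), nodes

# Session [5, 9, 5, 7] gives nodes [5, 9, 7] and directed edges 5->9, 9->5, 5->7.
a_out, a_in, a_und, nodes = build_session_graphs([5, 9, 5, 7])
```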
====3.2. Learning Sequential and Non-Sequential Item Embeddings====

We first transform every item $v_i \in V$ into a unified embedding space $v_i \in \mathbb{R}^d$, where $d$ is the dimension size. Using this initial embedding, we learn sequential and non-sequential item embeddings, $v_i^s$ and $v_i^n$ respectively.

=====3.2.1. Learning Sequential Item Embedding=====

We use GGNN [5], similar to SR-GNN [2], for learning the sequential item representation. Given the incoming and outgoing adjacency matrices and the initial item embedding, GGNN updates the item embedding as follows:

$$a_i^t = A_{i:}\,[v_1^{t-1}, \ldots, v_n^{t-1}]^\top H_1 + b_1, \qquad (1)$$
$$z_i^t = \sigma(W_z a_i^t + U_z v_i^{t-1}), \qquad (2)$$
$$r_i^t = \sigma(W_r a_i^t + U_r v_i^{t-1}), \qquad (3)$$
$$\tilde{v}_i^t = \tanh(W_o a_i^t + U_o (r_i^t \odot v_i^{t-1})), \qquad (4)$$
$$v_i^s = (1 - z_i^t) \odot v_i^{t-1} + z_i^t \odot \tilde{v}_i^t, \qquad (5)$$

where $A_{i:} \in \mathbb{R}^{1 \times 2n}$ is the $i$-th row of the concatenated incoming and outgoing adjacency matrices, $H_1 \in \mathbb{R}^{d \times 2d}$ and $b_1 \in \mathbb{R}^d$ are weight and bias parameters, and $z_i^t$ and $r_i^t$ are the update and reset gates respectively.
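Below is a minimal PyTorch sketch of the propagation step in Eqs. (1)-(5). It reflects our reading of the equations rather than the authors' implementation; in particular, the class name and the choice to realize $H_1$ and the $W$/$U$ matrices as `nn.Linear` layers are assumptions consistent with the stated dimensions.

```python
import torch
import torch.nn as nn

class GGNNLayer(nn.Module):
    """One GGNN update over a session graph, following Eqs. (1)-(5)."""

    def __init__(self, d):
        super().__init__()
        self.H1 = nn.Linear(2 * d, d)                            # H_1, b_1 in Eq. (1)
        self.W_z = nn.Linear(d, d); self.U_z = nn.Linear(d, d)   # update gate
        self.W_r = nn.Linear(d, d); self.U_r = nn.Linear(d, d)   # reset gate
        self.W_o = nn.Linear(d, d); self.U_o = nn.Linear(d, d)   # candidate state

    def forward(self, a_in, a_out, v):
        # a_in, a_out: (n, n) normalized adjacencies; v: (n, d) item embeddings
        a = self.H1(torch.cat([a_in @ v, a_out @ v], dim=-1))    # Eq. (1)
        z = torch.sigmoid(self.W_z(a) + self.U_z(v))             # Eq. (2)
        r = torch.sigmoid(self.W_r(a) + self.U_r(v))             # Eq. (3)
        v_tilde = torch.tanh(self.W_o(a) + self.U_o(r * v))      # Eq. (4)
        return (1 - z) * v + z * v_tilde                         # Eq. (5)
```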
=====3.2.2. Learning Non-Sequential Item Embedding=====

To learn the non-sequential item representation $v_i^n$, we use SGC [6] with a proposed highway connection. Formally, the update is given by:

$$a_i^t = A_{i:}\,[v_1^{t-1}, \ldots, v_n^{t-1}]^\top H_2 + b_2, \qquad (6)$$
$$v_i^n = g_1 \odot a_i^t + (1 - g_1) \odot v_i, \qquad (7)$$
$$g_1 = W_{g_1}([a_i^t\,;\,v_i]), \qquad (8)$$

where $A_{i:} \in \mathbb{R}^{1 \times n}$ is the $i$-th row of the undirected graph adjacency matrix, and $H_2 \in \mathbb{R}^{d \times d}$ and $b_2 \in \mathbb{R}^d$ are weight and bias parameters. $g_1$ is the gating mechanism used to improve the performance of the non-sequential item representation.
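A corresponding sketch of the non-sequential branch, Eqs. (6)-(8), is shown below. The sigmoid on the gate is an assumption (Eq. (8) writes only the linear map), as is reading the gate input as the propagated embedding $a_i^t$ concatenated with the initial embedding $v_i$.

```python
import torch
import torch.nn as nn

class NonSequentialLayer(nn.Module):
    """SGC-style propagation with a gated highway connection, Eqs. (6)-(8)."""

    def __init__(self, d):
        super().__init__()
        self.H2 = nn.Linear(d, d)        # H_2, b_2 in Eq. (6)
        self.W_g1 = nn.Linear(2 * d, d)  # gate over [a_i ; v_i] in Eq. (8)

    def forward(self, a_und, v):
        # a_und: (n, n) undirected adjacency; v: (n, d) initial item embeddings
        a = self.H2(a_und @ v)                                     # Eq. (6)
        g1 = torch.sigmoid(self.W_g1(torch.cat([a, v], dim=-1)))   # Eq. (8)
        return g1 * a + (1 - g1) * v                               # Eq. (7)
```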
The final sequential ses- sion representation sπ π is obtained by aggregating the vππ = g1 β aπ‘ππ + (1 β g1 ) β vπ , (7) sequential local and global preferences using a gating g1 = Wg1 ([vππ ; vπ ]). (8) mechanism. Formally, the final session representation sπ π is obtained as follows; where Aππ: β R 1π₯π is the π-th row of the undirected graph adjacency matrix. H2 β Rππ₯π and b2 β Rπ are sπ π = g2 β sππ + (1 β g2 ) β sππ , (11) weight and bias parameters respectively. g1 is the gating mechanism used to improve the performance of the non- g2 is the gating function obtained by; sequential item representation. g2 = Wg2 ([sππ ; sππ ]) (12) 3.3. Learning Session Embedding Wg2 β R2πΓπ is a trainable transformation matrix and From the sequential and non-sequential item embedding, [;] is a concatenation operation. From the non-sequential we learn the local and the global user preferences for each item embedding, using the same learnable parameters, network using shared learnable parameters. Considering the final non-sequential representation, sπ π can be ob- the sequential item embedding, to obtain the final session tained. embedding, we use a gating mechanism that aggregates the global and the local user preferences. The sequential 3.4. Making Recommendation local user preference sππ is obtained from the sequential From the sequential and non-sequential final session rep- embedding of the last clicked item while the sequential resentations, the sequential and non-sequential unnor- global preference sππ is obtained from the sequential malized scores of each candidate item π£π β π can be embedding of all clicked items in a session using additive obtained by multiplying the item embedding vπ with the an attention mechanism. Formally, the sequential global each the corresponding final session representation. The preference sππ is given by; sequential unnormalized score zΜππ , is defined as: π‘ zΜππ = sπππ vπ . (13) βοΈ sππ = πΌπ hπ ,π , (9) π The non-sequential unnormalized score zΜππ is obtained in similar way. For the recommendation, we use the sum of the two unnormalized scores. A softmax is then applied 4.1.2. Baseline to calculate the normalized probability output vector of We compare the performance of our proposed SR-DGN the model yΜπ as follows: model with traditional and deep learning representative yΜ = π ππ π‘πππ₯(zΜ) (14) baseline models. The traditional baseline model used is Factorized Personalized Markov Chain model (FPMC) [8]. where zΜ β R is the sum unnormalized score of the π The deep learning baselines include RNN-based models sequential and non-sequential scores and yΜ = Rπ is the GRU4Rec [1], RNN with attention model (NARM) [9], probability of each item to be the next click in session π . memory-based with attention model (STAMP) [3], di- For any given session, the loss function is defined as rected graph model SR-GNN [2] and hypergraph models the cross-entropy between the predicted click and the DHCH [4] and SHARE [10] ground truth. The cross-entropy loss function is defined as follows: 4.1.3. Evaluation Metrics. π βοΈ We used two common accuracy metrics, π @πΎ = 20, 10 β(yΜ) = β yπ πππ(yΜπ ) + (1 β yπ )πππ(1 β yΜπ ) (15) and π π π @πΎ = 20, 10, for evaluation. P@K evalu- π=1 ates the proportion of correctly recommended unranked where y is the one-hot encoding of the ground truth items, while MRR@K evaluates the position of the cor- items. Adam optimizer is then used to optimize the cross- rectly recommended ranked items. entropy loss. 4.1.4. Hyperparameter Setup. 4. 
====3.4. Making Recommendation====

From the sequential and non-sequential final session representations, the sequential and non-sequential unnormalized scores of each candidate item $v_i \in V$ are obtained by multiplying the item embedding $v_i$ with the corresponding final session representation. The sequential unnormalized score $\hat{z}_i^s$ is defined as:

$$\hat{z}_i^s = (s_h^s)^\top v_i. \qquad (13)$$

The non-sequential unnormalized score $\hat{z}_i^n$ is obtained in a similar way. For the recommendation, we use the sum of the two unnormalized scores. A softmax is then applied to calculate the normalized probability output vector of the model, $\hat{y}$:

$$\hat{y} = \mathrm{softmax}(\hat{z}), \qquad (14)$$

where $\hat{z} \in \mathbb{R}^m$ is the sum of the sequential and non-sequential unnormalized scores, and $\hat{y} \in \mathbb{R}^m$ contains the probability of each item being the next click in session $S$.

For any given session, the loss function is defined as the cross-entropy between the predicted click and the ground truth:

$$\mathcal{L}(\hat{y}) = -\sum_{i=1}^{m} \big( y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \big), \qquad (15)$$

where $y$ is the one-hot encoding of the ground truth item. The Adam optimizer is then used to optimize the cross-entropy loss.
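A sketch of the scoring and loss computation in Eqs. (13)-(15) for a single session: both session representations score every candidate item, the scores are summed, and the softmax output enters the cross-entropy of Eq. (15). The epsilon inside the logarithms is a numerical-stability assumption, not part of the paper.

```python
import torch

def score_and_loss(s_seq, s_non, item_emb, target, eps=1e-12):
    """s_seq, s_non: (d,) final session representations; item_emb: (m, d)
    candidate item embeddings; target: index of the true next item."""
    z = item_emb @ s_seq + item_emb @ s_non   # summed unnormalized scores, Eq. (13)
    y_hat = torch.softmax(z, dim=-1)          # Eq. (14)
    y = torch.zeros_like(y_hat)
    y[target] = 1.0                           # one-hot ground truth
    loss = -(y * torch.log(y_hat + eps)
             + (1 - y) * torch.log(1 - y_hat + eps)).sum()  # Eq. (15)
    return y_hat, loss
```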
===4. Performance Evaluation===

In this section we aim to answer the following questions:

* RQ1: How does the proposed SR-DGN model compare against the existing state-of-the-art baseline models?
* RQ2: How do the sequential and non-sequential networks of SR-DGN compare against each other?

====4.1. Experimental Configurations====

=====4.1.1. Datasets=====

Three popular publicly available datasets, Yoochoose (http://2015.recsyschallenge.com/challege.html), RetailRocket (https://www.kaggle.com/retailrocket/ecommerce-dataset) and Diginetica (http://cikm2016.cs.iupui.edu/cikm-cup), were used to evaluate the performance of the proposed model. The Yoochoose dataset was obtained from the RecSys Challenge 2015. The RetailRocket dataset contains six months of personalized transactions from an e-commerce site available on Kaggle, while the Diginetica dataset is from the CIKM Cup 2016. All datasets consist of transactional data from e-commerce sites. Following the pre-processing of [2, 10], we remove items occurring fewer than 5 times and sessions of length less than 2, use the last week of transactions for testing in all datasets, and use the most recent 1/64 fraction of the Yoochoose training data (Yoochoose 1/64). Similar to existing models, we augment the training sessions by splitting each input sequence: from the sequence $S = [v_{s,1}, v_{s,2}, \ldots, v_{s,n}]$ we generate the input-label pairs $([v_{s,1}], v_{s,2}), \ldots, ([v_{s,1}, v_{s,2}, \ldots, v_{s,n-1}], v_{s,n})$.
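This prefix-splitting augmentation reduces to a few lines; the helper below is our own sketch, not the authors' preprocessing script.

```python
def augment_session(session):
    """[v1, v2, v3] -> [([v1], v2), ([v1, v2], v3)]: every proper prefix
    becomes a training input whose label is the next click."""
    return [(session[:i], session[i]) for i in range(1, len(session))]
```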
=====4.1.2. Baselines=====

We compare the performance of our proposed SR-DGN model with traditional and deep learning representative baseline models. The traditional baseline is the Factorized Personalized Markov Chain model (FPMC) [8]. The deep learning baselines include the RNN-based model GRU4Rec [1], the RNN-with-attention model NARM [9], the memory-based attention model STAMP [3], the directed graph model SR-GNN [2], and the hypergraph models S2-DHCN [4] and SHARE [10].

=====4.1.3. Evaluation Metrics=====

We used two common accuracy metrics, P@K and MRR@K with K = 20, 10, for evaluation. P@K evaluates the proportion of correctly recommended items without regard to rank, while MRR@K evaluates the position of the correctly recommended items in the ranking.

=====4.1.4. Hyperparameter Setup=====

We used the same hyperparameters as previous models [2, 4, 10]. We set the hidden dimension in all experiments to d = 100 and the learning rate of the Adam optimizer to 0.001, with a decay of 0.1 after every 3 training epochs. The L2 regularization and batch size were set to 10^-5 and 100 respectively on all datasets.

====4.2. Comparison with Baselines====

We compare the performance of SR-DGN with the existing baseline models in terms of P@K and MRR@K (K = 20, 10) on the Yoochoose 1/64, RetailRocket and Diginetica datasets. Table 1 shows the results. SR-DGN outperforms the best baseline models on all datasets. It is evident that using both directed and undirected graphs can potentially improve the overall performance of graph neural network models for session-based recommendation.

From Table 1, it can also be seen that all deep learning models except GRU4Rec outperformed the traditional model FPMC. On the RetailRocket dataset, STAMP (a non-sequential model) outperformed NARM (a sequential model), while on the Diginetica dataset the reverse is observed. These results support our argument that sequential and non-sequential architectures for learning item representations can be complementary. Despite the simple architecture of our sequential and non-sequential models, SR-DGN was able to outperform more complex models like DHCN, which uses self-supervised learning with both intra- and inter-session information.

Table 1: Performance of SR-DGN compared with other baseline models. The best result over all methods (boldface in the original, here the SR-DGN column in every row) and * denotes a significant difference under a t-test. All values are in percentages.

| Dataset | Metric | FPMC | GRU4Rec | NARM | STAMP | SR-GNN | S2-DHCN | SHARE | GGNN | SGC | SR-DGN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Yoochoose 1/64 | P@20 | 45.62 | 60.64 | 68.32 | 68.74 | 70.57 | 70.39 | 71.17 | 71.06 | 71.32 | 71.70 |
| | MRR@20 | 15.01 | 22.89 | 28.63 | 29.67 | 30.94 | 29.92 | 31.06 | 31.32 | 31.27 | 31.51 |
| | P@10 | 32.01 | 52.45 | 57.50 | 58.07 | 60.09 | 59.18 | 60.59 | 60.60 | 60.71 | 61.29* |
| | MRR@10 | 14.35 | 21.53 | 27.97 | 28.92 | 30.69 | 28.54 | 30.78 | 30.69 | 30.52 | 30.79 |
| Diginetica | P@20 | 26.53 | 29.45 | 49.70 | 45.64 | 50.73 | 53.18 | 52.73 | 51.83 | 52.68 | 53.42* |
| | MRR@20 | 6.95 | 8.33 | 16.17 | 14.32 | 17.59 | 18.44 | 18.05 | 17.99 | 18.63 | 18.66 |
| | P@10 | 15.43 | 17.93 | 35.44 | 33.98 | 36.86 | 39.87 | 39.52 | 38.62 | 39.65 | 40.20* |
| | MRR@10 | 6.20 | 7.33 | 15.13 | 14.26 | 15.52 | 17.53 | 17.12 | 17.07 | 17.73 | 17.75 |
| RetailRocket | P@20 | 32.37 | 44.01 | 50.22 | 50.96 | 50.32 | 53.66 | 54.00 | 54.55 | 54.72 | 55.85* |
| | MRR@20 | 13.82 | 23.67 | 24.59 | 25.17 | 26.57 | 27.30 | 27.12 | 29.39 | 29.23 | 29.77 |
| | P@10 | 25.99 | 38.35 | 42.07 | 42.95 | 43.21 | 46.15 | 46.21 | 47.36 | 47.44 | 48.20* |
| | MRR@10 | 13.38 | 23.27 | 24.88 | 24.61 | 26.07 | 26.85 | 26.61 | 28.98 | 28.73 | 29.24* |

====4.3. Comparison with GGNN and SGC====

We compare the performance of the GGNN [5] (sequential) and SGC [6] (non-sequential) models with SR-DGN on all three datasets in terms of P@K and MRR@K (K = 20, 10). Table 1 shows that on all datasets the combined SR-DGN model outperformed both GGNN and SGC. Generally, SGC outperforms GGNN on the Precision metrics while GGNN outperforms SGC on the MRR metrics. To ensure good performance across different datasets, considering both the sequential and the non-sequential model, as in our proposed SR-DGN, may be the solution.

====4.4. Ablation Study====

SR-DGN uses summation to aggregate the sequential and non-sequential unnormalized scores. We compare summation against max aggregation. Table 2 shows that on nearly all datasets and metrics, the summation method outperforms the max method. These results are intuitive, since with summation the items with the highest overall probabilities are recommended. They further demonstrate the advantage of using both the sequential and the non-sequential network in SR-DGN.

Table 2: Performance comparison of aggregation methods in SR-DGN. All values are in percentages.

| Dataset | Metric | Max | Sum |
|---|---|---|---|
| Yoochoose 1/64 | P@20 | 71.65 | 71.70 |
| | MRR@20 | 31.19 | 31.51 |
| | P@10 | 61.16 | 61.29 |
| | MRR@10 | 30.45 | 30.79 |
| Diginetica | P@20 | 52.73 | 53.42 |
| | MRR@20 | 18.45 | 18.66 |
| | P@10 | 39.69 | 40.20 |
| | MRR@10 | 17.54 | 17.75 |
| RetailRocket | P@20 | 55.11 | 55.85 |
| | MRR@20 | 29.55 | 29.77 |
| | P@10 | 49.01 | 48.20 |
| | MRR@10 | 29.06 | 29.24 |

===5. Conclusion===

In this paper, we proposed SR-DGN, a graph neural network model for session-based recommendation. SR-DGN constructs a directed and an undirected graph for each session and learns sequential and non-sequential item embeddings using sequential and non-sequential GNNs respectively. Using shared learnable parameters, SR-DGN learns the global and local user preferences from each of the learnt item embeddings. For making the recommendation, SR-DGN aggregates the sequential and non-sequential scores by summation. Experimental results showed that SR-DGN outperformed state-of-the-art models on three benchmark datasets. Further analysis revealed that on some datasets the non-sequential model outperforms the sequential one, while on others the reverse is true; SR-DGN takes advantage of both scenarios to achieve better performance.

===Acknowledgments===

This project was partially supported by Grants from the Natural Science Foundation of China 62176247. It was also supported by the Fundamental Research Funds for the Central Universities and the CAS/TWAS Presidential Fellowship for International Doctoral Students.

===References===

[1] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, 2016. arXiv:1511.06939.

[2] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, T. Tan, Session-based recommendation with graph neural networks, Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019) 346-353. URL: http://dx.doi.org/10.1609/aaai.v33i01.3301346. doi:10.1609/aaai.v33i01.3301346.

[3] Q. Liu, Y. Zeng, R. Mokhosi, H. Zhang, STAMP: Short-term attention/memory priority model for session-based recommendation, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 1831-1839. URL: https://doi.org/10.1145/3219819.3219950. doi:10.1145/3219819.3219950.

[4] X. Xia, H. Yin, J. Yu, Q. Wang, L. Cui, X. Zhang, Self-supervised hypergraph convolutional networks for session-based recommendation, in: Proceedings of the AAAI Conference on Artificial Intelligence, AAAI '21, 2021, pp. 4503-4511. URL: https://ojs.aaai.org/index.php/AAAI/article/view/16578.

[5] Y. Li, R. Zemel, M. Brockschmidt, D. Tarlow, Gated graph sequence neural networks, in: Proceedings of ICLR '16, 2016.

[6] F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, K. Weinberger, Simplifying graph convolutional networks, in: K. Chaudhuri, R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, PMLR, 2019, pp. 6861-6871. URL: http://proceedings.mlr.press/v97/wu19e.html.

[7] A. Zimdars, D. M. Chickering, C. Meek, Using temporal data for making recommendations, in: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, UAI '01, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001, pp. 580-588.

[8] S. Rendle, C. Freudenthaler, L. Schmidt-Thieme, Factorizing personalized Markov chains for next-basket recommendation, in: Proceedings of the 19th International Conference on World Wide Web, WWW '10, Association for Computing Machinery, New York, NY, USA, 2010, pp. 811-820. URL: https://doi.org/10.1145/1772690.1772773. doi:10.1145/1772690.1772773.

[9] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, J. Ma, Neural attentive session-based recommendation, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM '17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 1419-1428. URL: https://doi.org/10.1145/3132847.3132926. doi:10.1145/3132847.3132926.

[10] J. Wang, K. Ding, Z. Zhu, J. Caverlee, Session-based recommendation with hypergraph attention networks, Proceedings of the 2021 SIAM International Conference on Data Mining (SDM) (2021) 82-90. URL: http://dx.doi.org/10.1137/1.9781611976700.10. doi:10.1137/1.9781611976700.10.