Neural factorization for Offer Recommendation using Knowledge Graph Embeddings

Gourab Chowdhury, Madiraju Srilakshmi, Mainak Chain, Sudeshna Sarkar
Indian Institute of Technology Kharagpur, West Bengal, India
gourab@iitkgp.ac.in, sreelakshmi@iitkgp.ac.in, mainakchain@iitkgp.ac.in, sudeshna@cse.iitkgp.ernet.in

ABSTRACT
Companies send a lot of promotional offers and coupons to customers to attract them to buy more. Offer recommendation systems can help to identify relevant offers for users. In this paper, we present a Neural Factorization (NF) model for the task of offer recommendation. We represent users and offers with Knowledge Graph Embeddings (KGE). Specifically, we model the available data in the form of a Knowledge Graph (KG) and learn embeddings for entities and relations using a standard KGE technique called TransE. We also incorporate user temporal features in the NF model using a Long Short Term Memory (LSTM) with attention framework. We experiment with the Kaggle Acquire Valued Shoppers Challenge dataset and show that the performance of our model is significantly better than that of tree-based methods.

CCS CONCEPTS
• Information systems → Recommender systems; Information retrieval; • Computing methodologies → Neural networks.

KEYWORDS
Neural Factorization, Recommender Systems, Knowledge Graph Embedding, E-commerce offers

ACM Reference Format:
Gourab Chowdhury, Madiraju Srilakshmi, Mainak Chain, and Sudeshna Sarkar. 2019. Neural factorization for Offer Recommendation using Knowledge Graph Embeddings. In Proceedings of the SIGIR 2019 Workshop on eCommerce (SIGIR 2019 eCom), 6 pages.

Copyright © 2019 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, S. Kallumadi, U. Porwal, A. Trotman (eds.): Proceedings of the SIGIR 2019 eCom workshop, July 2019, Paris, France, published at http://ceur-ws.org

1 INTRODUCTION
Marketing and promotions are used to attract customers in the retail domain. Companies spend a lot of money to send promotional offers or discounts to customers. It is therefore important to identify relevant offers that the users are likely to accept.

We consider the case in which an offer is a flat discount for a given brand and category combination. An example of an offer is "Rs 200 off on Polo T-shirts". There can be other types of offers such as coupons/promo codes (use coupon GO15O to get Rs 150 cash-back), combo offers (buy 3 get 1 free, buy 3 get Rs 300 off) and loyalty points.

Xia et al. [13] proposed an approach to the task of offer recommendation based on the features of users and offers, using tree-based methods, namely Random Forest [4] and Gradient Boosted Trees [5]. The features involve attributes of the users such as their email domain and operating system, attributes of the offers such as text description and discount amount, as well as shopping history features such as recent discount information and the number of times the user visited the coupon before.

We propose to use a Neural Factorization (NF) model to learn user-offer interactions. The user and offer representations are given as input to the NF model, whose output is the probability of the offer being accepted by the user. We predict the probabilities of all offers available at the given time and recommend the top k most probable offers to the customer. In this work, we explore different ways of representing users and offers.

In our first model, we represent users and offers with features extracted from the dataset. The user features contain the normalized count of items purchased in a month, the normalized count of items purchased in each category, days since the last visit, the average amount paid per visit, etc. The offer features include the category and brand on which the offer is given, the amount of discount, the minimum quantity of purchase, etc.

In our second model, we explore representing users and offers as embeddings. For this, we construct a Knowledge Graph (KG) with users, categories, brands and price values as nodes and belongs-to, purchase and price as edges between them. We adopt a knowledge graph embedding technique called TransE [3] to generate embeddings of users and offers.

In our third model, we capture the user's sequential behaviour using a Long Short Term Memory (LSTM) with attention framework. The input to the model is the sequence of baskets purchased by the user and the output is the probabilities of all available categories. We incorporate this information as an additional input to the NF model.

We experiment with the Kaggle Acquire Valued Shoppers Challenge dataset, which contains user-offer interactions, user purchase history and offer content information. We apply our models on this data and show that the NF based models achieve better performance than tree-based methods.

2 PRELIMINARIES
In this section, we formally define the task of offer recommendation and present the details of the data available to handle this task.

2.1 Problem Definition
Offer Recommendation is the task of predicting the best offers for a given user. Let U = {u_1, u_2, ..., u_m} be the set of users and O = {o_1, o_2, ..., o_n} be the set of offers. The task is to recommend the top k offers to each user such that they are likely to include the next offer converted by the user.
2.2 Dataset
In our work, we experiment with the Kaggle Acquire Valued Shoppers Challenge dataset (https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data). The dataset contains the transaction history of users from March 2012 to July 2013. A transaction consists of user_id, item_id and date. The set of items purchased by the same user on the same date is termed a basket.

The user-offer interactions are recorded from March 2013 to July 2013. A user-offer interaction consists of user_id, offer_id and date. Each user has availed exactly one offer in this period. Each offer is specified by its category, brand, discount amount and minimum quantity. The overall data statistics are listed in Table 1.

Table 1: Overall data statistics

Statistics              Acquire Valued Shoppers data
# users                 315,411
# brands                35,689
# offers                39
# categories            836
# total transactions    349,655,789
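As an illustration of this grouping, the transaction log can be turned into per-user basket sequences along the following lines. This is a minimal pandas sketch; the file name and column labels (user_id, item_id, date) are assumptions based on the schema described above.

```python
import pandas as pd

# Load the raw transaction log; the columns follow the schema described above.
transactions = pd.read_csv("transactions.csv", parse_dates=["date"])

# A basket is the set of items purchased by the same user on the same date.
baskets = (
    transactions.groupby(["user_id", "date"])["item_id"]
    .apply(set)
    .reset_index(name="basket")
)

# Chronological basket sequences per user, reused later by the LSTM model
# of Section 3.4.
basket_sequences = (
    baskets.sort_values("date").groupby("user_id")["basket"].apply(list)
)
```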
3 NEURAL FACTORIZATION FOR OFFER RECOMMENDATION
We use a Neural Factorization (NF) model for the task of offer recommendation. We have experimented with different methods of representing users and offers. We first explain the basic neural factorization framework and then introduce our methods.

3.1 Architecture of Neural Factorization
Our framework is based on Neural Collaborative Filtering proposed by He et al. [6], which used a Multi Layer Perceptron (MLP) for modelling user-item interactions and is thereby able to capture the non-linear relations between users and items.

In our model, the user vector (v_u) and offer vector (v_o) are given as input to two input layers. Each input layer is followed by a dense layer. The outputs of these dense layers are concatenated and given as input to a neural network. We use past user-offer interactions as positive samples, while the negative samples are generated by random sampling. The final output layer predicts y_uo, the probability of user u accepting an offer o. The architecture is illustrated in Figure 1.

[Figure 1: Neural Factorization Model]

The model is trained by minimizing the loss between the predicted value (ŷ_uo) and the target value y_uo. The target value represents the user action towards an offer: it is 1 when the user availed the offer and 0 otherwise. The loss function is defined as follows:

    L = − Σ_{(u,o) ∈ Y+ ∪ Y−} [ y_uo log(ŷ_uo) + (1 − y_uo) log(1 − ŷ_uo) ]    (1)

where Y+ denotes the set of positive user-offer interactions and Y− denotes the negative instances (sampled from unobserved data).

The user_ids and offer_ids are different in the train and test datasets. Therefore, we cannot use the learned factors for prediction. Instead, we represent the user and offer by their content information, input them to the trained neural model and predict the probability of the user accepting the offer. Similarly, we find the probabilities of all offers available at the given time. The offers are ranked based on the probability values and the top k offers are recommended to the user.
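The following PyTorch sketch illustrates this architecture and the ranking step. It is a minimal rendition; the layer widths and the two-layer MLP depth are illustrative assumptions (the actual hyper-parameters are reported in Section 4.2.2).

```python
import torch
import torch.nn as nn

class NeuralFactorization(nn.Module):
    """Sketch of the NF architecture: two input towers, concatenation,
    an MLP, and a sigmoid output giving the probability y_uo that
    user u accepts offer o."""

    def __init__(self, user_dim, offer_dim, hidden=64):
        super().__init__()
        self.user_tower = nn.Sequential(nn.Linear(user_dim, hidden), nn.LeakyReLU())
        self.offer_tower = nn.Sequential(nn.Linear(offer_dim, hidden), nn.LeakyReLU())
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.LeakyReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, user_vec, offer_vec):
        u = self.user_tower(user_vec)
        o = self.offer_tower(offer_vec)
        return torch.sigmoid(self.mlp(torch.cat([u, o], dim=-1))).squeeze(-1)

# Training minimizes Equation 1, which is exactly binary cross-entropy:
#   loss = nn.BCELoss()(model(users, offers), targets)

def top_k_offers(model, user_vec, offer_vecs, k=5):
    # Score every available offer for one user and return the k most probable.
    with torch.no_grad():
        scores = model(user_vec.expand(len(offer_vecs), -1), offer_vecs)
    return torch.topk(scores, k).indices
```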
3.2 Neural Factorization with features (NF+features)
In this variation of our method, user and offer features obtained from the dataset are given as input to the neural factorization model. We extract user and offer features from the user purchase history, the user-offer interactions and the offer content information. The following features are used to represent users and offers.

• User features:
  – Normalized count of purchases of each category
  – Number of visits in the 30 days before the day of the offer
  – Days since the last visit (from the day of the offer)
  – Average items purchased per visit
  – Average amount paid per visit
• Offer features:
  – Category on which the offer is given
  – Brand on which the offer is given
  – Average price of each brand-category combination
  – Discount amount
  – Quantity to be purchased to avail the discount
  – How cheap the product is compared to other products in the same category (amount)

The numeric value is used as input for the numeric features, while the non-numeric features such as brand and category are one-hot encoded. Since these features have many possible values, the feature dimensions become large. These features are also unable to capture indirect relationships between users and offers.
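As a sketch of the feature extraction, the following computes three of the user features listed above from the transaction log. The column names and the offer_date argument are assumptions; the remaining features follow the same pattern.

```python
import pandas as pd

def user_features(transactions, offer_date):
    """Compute a few of the user features listed above from the
    transaction log (assumed columns: user_id, date, amount)."""
    hist = transactions[transactions["date"] < offer_date]

    # Number of visits in the 30 days before the offer; a visit is a
    # distinct (user, date) pair.
    recent = hist[hist["date"] >= offer_date - pd.Timedelta(days=30)]
    visits_30d = recent.groupby("user_id")["date"].nunique()

    # Days since the last visit, measured from the day of the offer.
    days_since = (offer_date - hist.groupby("user_id")["date"].max()).dt.days

    # Average amount paid per visit.
    per_visit = hist.groupby(["user_id", "date"])["amount"].sum()
    avg_amount = per_visit.groupby(level="user_id").mean()

    return pd.DataFrame({
        "visits_30d": visits_30d,
        "days_since_last_visit": days_since,
        "avg_amount_per_visit": avg_amount,
    }).fillna(0)
```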
3.3 Neural Factorization with Knowledge Graph Embeddings (NF+KGE)
The representation of users and offers plays a significant role in the effectiveness of a recommendation model. We wish to use a representation that is able to capture relevant knowledge of users, items and offers, the attributes of these entities and the interactions between them.

We propose to use Knowledge Graph Embedding (KGE) techniques to learn embeddings for users and offers. These techniques have been found to be effective in capturing complex and indirect relationships among entities in a Knowledge Graph (KG) and have proven successful in many applications such as link prediction and recommendation systems [12].

We construct a Knowledge Graph based on the user purchase history and the offer content information. The nodes of the graph are users, categories, brands and price ranges. We find the average price of items in a brand and categorize it as high or low. We have 3 types of relations in our graph: user nodes are connected to category nodes by the relation purchased, category nodes are connected to brand nodes by the relation belongs_to, and brand nodes are connected to price range nodes by the relation price. The graph is formed as a set of triplets (h, r, t), i.e., head node (h) is connected to tail node (t) by relationship (r). An example graph representation is shown in Figure 2.

[Figure 2: Knowledge graph representation of available data]

The triplets generated for the example graph are as follows: T = {(U1, purchased, C1), (U2, purchased, C1), (C1, belongs_to, B1), (C2, belongs_to, B1), (C3, belongs_to, B2), (B1, price, low), (B2, price, high)}.
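A small sketch of how this triple set can be assembled from the data is given below. The inputs are simplified, and the median split for the high/low price ranges is an assumption, since the text only states that average brand prices are categorized into the two ranges.

```python
def build_triples(purchases, category_brand, brand_avg_price):
    """purchases: iterable of (user, category) pairs;
    category_brand: dict mapping category -> brand;
    brand_avg_price: dict mapping brand -> average item price."""
    triples = set()

    # (user, purchased, category) edges from the purchase history.
    for user, category in purchases:
        triples.add((user, "purchased", category))

    # (category, belongs_to, brand) edges from the content information.
    for category, brand in category_brand.items():
        triples.add((category, "belongs_to", brand))

    # (brand, price, high/low) edges; the median average price is an
    # assumed split point for the two price ranges.
    prices = sorted(brand_avg_price.values())
    median = prices[len(prices) // 2]
    for brand, price in brand_avg_price.items():
        label = "high" if price >= median else "low"
        triples.add((brand, "price", label))

    return triples
```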
We use a standard knowledge graph embedding method called TransE [3] to learn the embeddings for all entities and relationships in the graph. We have chosen TransE because this method is simple and has been found to be efficient in modelling multi-relational data [12].

Given a triplet of the form (h, r, t), which indicates that the head entity (h) is connected to the tail entity (t) by the relationship (r), TransE [3] learns the embedding such that h + r ≈ t (Figure 3). TransE uses the following scoring function:

    f_r(h, t) = −||h + r − t||_{1/2}    (2)

[Figure 3: Illustration of TransE (figure adapted from [12])]

TransE obtains positive triplets from the graph and negative triplets by randomly corrupting the head or tail node of a positive triplet. The TransE training method minimizes the pairwise loss between the positive facts τ+ = (h, r, t) and the negative facts τ− = (h′, r′, t′). The objective function is given as follows:

    min Σ_{τ+ ∈ D+} Σ_{τ− ∈ D−} max(0, γ − f_r(h, t) + f_r′(h′, t′))    (3)

where γ is the margin of separation, D+ is the set of positive facts and D− is the set of negative facts. The steps to learn the embeddings are given in Algorithm 1.

Algorithm 1: TransE method to learn embeddings for entities and relations

Input: training set T = {(h, r, t)}, entity set E, relation set R
Randomly initialize the embeddings for each e ∈ E and r ∈ R
for (h, r, t) in T do
    // Corrupt the triplet by changing h or t and add it to T
    T ← T ∪ {(h′, r, t′)}
    // Update embeddings w.r.t. the loss
    Σ_{((h,r,t),(h′,r,t′)) ∈ T} ∇[γ + d(h + r, t) − d(h′ + r, t′)]_+
    // The dissimilarity measure d can be either the L1 or the L2 norm
end
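A compact PyTorch sketch of the margin loss of Equation 3 and the negative sampling of Algorithm 1 follows. The L2 dissimilarity, the initialization range and the head-or-tail corruption scheme are standard but assumed choices here.

```python
import torch
import torch.nn.functional as F

class TransE(torch.nn.Module):
    def __init__(self, n_entities, n_relations, dim=100, margin=1.0):
        super().__init__()
        self.ent = torch.nn.Embedding(n_entities, dim)
        self.rel = torch.nn.Embedding(n_relations, dim)
        self.margin = margin
        # Random initialization, as in Algorithm 1.
        torch.nn.init.uniform_(self.ent.weight, -0.1, 0.1)
        torch.nn.init.uniform_(self.rel.weight, -0.1, 0.1)

    def distance(self, h, r, t):
        # Dissimilarity d(h + r, t); L2 here, L1 is an equally valid choice.
        return (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=2, dim=-1)

    def loss(self, pos, neg):
        # Margin-based pairwise loss: max(0, gamma + d(pos) - d(neg)),
        # equivalent to Equation 3 since f_r is the negated distance.
        h, r, t = pos
        h_neg, _, t_neg = neg
        return F.relu(
            self.margin + self.distance(h, r, t) - self.distance(h_neg, r, t_neg)
        ).mean()

def corrupt(triple_batch, n_entities):
    # Corrupt the head or the tail at random to create negative triples.
    h, r, t = (x.clone() for x in triple_batch)
    mask = torch.rand(len(h)) < 0.5
    rand = torch.randint(n_entities, (len(h),))
    h[mask], t[~mask] = rand[mask], rand[~mask]
    return h, r, t
```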
We input the learned graph embeddings to the NF model. The users are represented with the user embedding, and offers are represented with the combination of the category embedding, the brand embedding, the discount amount and the minimum quantity to be purchased to avail the offer.

3.4 Neural Factorization with temporal features (NF+KGE+TF)
The above Knowledge Graph Embeddings (KGE) are unable to capture the sequential behaviour of users, since the knowledge graph represents neither the time stamps of the interactions nor the sequentiality of the transactions. Therefore, we enhance our model with a temporal component that captures the user's sequential purchase behaviour using a Long Short Term Memory (LSTM) with attention model.

Since the offers are given on specific categories, we formulate the task of predicting the next category to be purchased by the user. We hypothesize that this information may help to identify suitable offers for a user.

Each user has purchased a set of items per visit, which can be termed a basket. We consider the category of an item (category_id) instead of the exact item (item_id) in each basket. We give the sequence of baskets purchased by the user as input to the LSTM and predict the probabilities of the categories purchased in the next basket. The predicted category probabilities are incorporated as an additional input to the NF model.

Let u = {b_1, b_2, ..., b_t} be the basket sequence of the target user. Each basket is a group of categories: b_k = {c_1, c_2, ..., c_n}, where n is the size of the basket. We represent each category with the embedding learned from the Knowledge Graph discussed earlier and use average pooling to represent the basket. This approach is similar to the basket prediction method proposed by Yu et al. [14]. The user sequence is now denoted as u = {v_1, v_2, ..., v_t}, where v_k is the average of the graph-based category embeddings in the basket b_k.

We input the user sequence up to the current time t into the LSTM model. Let h_t be the LSTM hidden unit and y_t be the output at the t-th time step. The hidden state h_t of each interaction is updated from the previous hidden state h_{t−1} and the current basket embedding v_t:

    h_t = LSTM(h_{t−1}, v_t)    (4)

We apply attention on top of the LSTM layer to weight the baskets at different time steps. Let H = (h_1, h_2, ..., h_t) be the output vectors produced by the LSTM layer. They are the inputs to the attention layer, and a weight for each time step is learned, i.e. A = (a_1, a_2, ..., a_t). The weighted sum of the hidden states (M) is input into a dense layer (D) to find the scores of all categories, and we obtain their probabilities using the sigmoid activation function. This architecture is illustrated in Figure 4.

    A = softmax(w^T tanh(H))    (5)
    M = A^T H    (6)
    P(y_{t+1} | v_1, ..., v_t) = sigmoid(D[M])    (7)

[Figure 4: LSTM model for next category prediction]
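Equations 4-7 map directly onto a small PyTorch module, sketched below under the assumption of a multi-label sigmoid output over all categories; the dropout layer mentioned later in Section 4.2.2 is omitted for brevity.

```python
import torch
import torch.nn as nn

class NextCategoryLSTM(nn.Module):
    """Attention LSTM of Equations 4-7. Input: a sequence of basket
    embeddings v_1..v_t; output: probabilities over all categories
    for the next basket."""

    def __init__(self, emb_dim, hidden, n_categories):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.w = nn.Linear(hidden, 1, bias=False)      # attention vector w
        self.dense = nn.Linear(hidden, n_categories)   # dense layer D

    def forward(self, baskets):                 # baskets: (batch, t, emb_dim)
        H, _ = self.lstm(baskets)               # Eq. 4: hidden states h_1..h_t
        A = torch.softmax(self.w(torch.tanh(H)), dim=1)  # Eq. 5: weights a_i
        M = (A * H).sum(dim=1)                  # Eq. 6: weighted sum of states
        return torch.sigmoid(self.dense(M))     # Eq. 7: category probabilities

# The basket embedding v_k is the average of the graph-based category
# embeddings in basket b_k, e.g.:
#   v_k = torch.stack([category_emb[c] for c in basket]).mean(dim=0)
```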
6 CONCLUSION In this paper, we have presented a neural factorization method for the task of offer recommendation with different representations of users and offers. We have shown that our models perform better than the tree-based methods. We have also shown that the learned graph-based user and offer embeddings capture deeper and indirect connections between users and offers, which helps to improve the quality of recommendation. The incorporation of temporal features involving transaction sequences improves the performance further in some cases. ACKNOWLEDGMENTS This research was supported by Capillary Technologies Interna- tional Pte Limited. We thank Dr. Subrat Panda and Mr. Jyotiska Bhattacharjee from Capillary Technologies International Pte Lim- ited who provided insight and expertise that greatly assisted the research. REFERENCES [1] Qingyao Ai, Vahid Azizi, Xu Chen, and Yongfeng Zhang. Learning heteroge- neous knowledge base embeddings for explainable recommendation. Algorithms, 11(9):137, 2018. [2] Gaurangi Anand, Auon H Kazmi, Pankaj Malhotra, Lovekesh Vig, Puneet Agar- wal, and Gautam Shroff. Deep temporal features to predict repeat buyers. In NIPS 2015 Workshop: Machine Learning for eCommerce, 2015. [3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Ok- sana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, pages 2787–2795, 2013. [4] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001. [5] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001. [6] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, pages 173–182. International World Wide Web Conferences Steering Committee, 2017. [7] Wan-Hsun Hu, Shih-Hsien Tang, Yin-Che Chen, Chia-Hsuan Yu, and Wen-Cheng Hsu. Promotion recommendation method and system based on random forest. In Proceedings of the 5th Multidisciplinary International Social Networks Conference, page 11. ACM, 2018. [8] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 687–696, 2015. [9] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence, 2015. [10] Vladimir Nikulin. Prediction of the shoppers loyalty with aggregated data streams. Journal of Artificial Intelligence and Soft Computing Research, 6(2):69–79, 2016. [11] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. Dkn: Deep knowledge- aware network for news recommendation. arXiv preprint arXiv:1801.08284, 2018. [12] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743, 2017. [13] Yandi Xia, Giuseppe Di Fabbrizio, Shikhar Vaibhav, and Ankur Datta. A content- based recommender system for e-commerce offers and coupons. In eCOM@SIGIR, 2017. [14] Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. A dynamic recurrent model for next basket recommendation. 
4.2.2 Experimental Setup. We use the following parameters to learn the knowledge graph embeddings. The size of the embedding for all entities (user, brand, category, price range) is 100. We use the Adam optimizer and the learning rate is set to 0.001.

The parameters for predicting the user's next preferred categories using the LSTM are as follows. We use one LSTM layer with 100 hidden units. There is a dropout layer between the LSTM layer and the attention layer with 25% dropout. The learning rate is set to 0.001.

The parameters for the neural factorization models are as follows. The activation function for the dense layers is Leaky ReLU and the batch size is set to 512. The learning rate is set to 0.0001. We applied batch normalization at every layer.

4.3 Comparison of Models
We compare the NF based models, namely NF+features, NF+KGE and NF+KGE+TF, against XGBoost and Random Forest Classifier (RFC) methods with features.

The standard item-KNN method cannot be applied to this dataset because each user in the dataset has availed exactly one offer. There is no possibility of finding offers similar to those previously availed by the user and recommending them.

Table 3: Performance comparison of our approach with baseline methods (in %)

Method        Recall@1    Recall@3    Recall@5    MRR@5
RFC           4.93        11.43       22.49       9.23
XGBoost       6.73        12.56       24.05       14.69
NF+features   15.41       22.05       33.19       23.48
NF+KGE        21.77       26.55       34.90       25.57
NF+KGE+TF     15.96       34.00       46.68       27.28

4.4 Result and Analysis
The results for the two baseline methods and the three NF based models discussed in this paper are presented in Table 3. It is evident from the results that the NF based models outperform the XGBoost and RFC methods with features.

The neural factorization model with graph-based embeddings (NF+KGE) performs better than neural factorization with features (NF+features). The final variation of our model with temporal features (NF+KGE+TF) gives a significant improvement over all other models considered in terms of Recall@3, Recall@5 and MRR@5, while NF+KGE performs best in terms of Recall@1.

We hypothesize that the knowledge graph based embedding is effective because it is able to use the connections between different entities and can therefore capture indirect as well as latent relationships. However, a limitation of this model is that it fails to capture the temporal interactions of the user. Our third model (NF+KGE+TF) addresses this by using an additional input from the LSTM based on the user's sequence of baskets.

5 RELATED WORK
Work related to our model is categorized into three parts. Subsection 5.1 reviews the work reported on offer recommendation systems. Subsection 5.2 reviews the methods used for the related task of repeat purchase prediction. In subsection 5.3, we discuss Knowledge Graph Embedding (KGE) based recommendation systems.

5.1 Offer Recommender Systems
There are very few works on offer and coupon recommendation systems.

Xia et al. [13] approached the problem of offer recommendation in the e-commerce domain. They used a private dataset consisting of customers' shopping trips, shopping trip counts, clicked coupons, and the retailers that issued the coupons. The coupons are characterized by their textual descriptions and validity period. The authors curated a number of features from the data to represent users and coupons and ranked the coupons based on scores generated by the XGBoost [5] and Random Forest [4] algorithms.

Similar work was proposed by Hu et al. [7] in the telecom domain, who used the Random Forest method [4] to provide telecom offers to mobile users. The authors extracted user features such as age, gender, voice call duration, SMS count etc. from the customer profile repository and the historical usage repository. These features are given as input to the Random Forest algorithm [4].

5.2 Repeat Purchase Prediction Systems
Repeat purchase prediction is the task set for the Kaggle Acquire Valued Shoppers Challenge. Since we have used the same dataset for our task of offer recommendation, we present a review of the work done on repeat purchase prediction.

Anand et al. [2] proposed a prediction model based on a combination of temporal and aggregate level models. They extracted three types of features capturing different aspects of user behaviour, namely customer-based, product-based and customer-product interaction based features. Customer-based features include the total visits made, total spend, the loyalty of the customer etc. Product-based features include the fraction of repeat customers for the offer-product etc. Customer-product interaction based features include the number of visits, quantity bought, the amount spent etc.

To capture the aggregate level behaviour of the user, the above features are computed over the entire transaction history. To capture the temporal behaviour, the features are split and computed over non-overlapping time windows. The authors used a Long Short Term Memory (LSTM) model as the classifier for the temporal features and a quantile regression (QR) model as the classifier for the aggregate level features. The two models are combined using a mixture of experts (ME).

Nikulin [10] used Random Forest [4] and Gradient Boosted Trees [5] to predict the repeat purchase behaviour of the users. The author curated a number of statistical features from the data and applied the above methods.

5.3 Knowledge Graph Embedding based Recommender Systems
In recent times, knowledge graph embeddings have been shown to be effective for recommendation systems. The basic idea is to represent the available data in the form of a graph, learn embeddings for entities using knowledge graph embedding methods [12] and incorporate them into the recommendation.

Zhang et al. [15] present Collaborative Knowledge base Embedding (CKE), which uses TransR [9] to learn structural representations of items that are combined with visual and textual embeddings. The Deep Knowledge-aware Network (DKN) [11] learns entity embeddings using TransD [8] and designs a CNN framework that combines them with word embeddings for news recommendation. Ai et al. [1] learn embeddings of users and items by the method of TransE [3], and the recommendation is based on the user-item similarity score in the projected space.

6 CONCLUSION
In this paper, we have presented a neural factorization method for the task of offer recommendation with different representations of users and offers. We have shown that our models perform better than tree-based methods. We have also shown that the learned graph-based user and offer embeddings capture deeper and indirect connections between users and offers, which helps to improve the quality of recommendation. The incorporation of temporal features involving transaction sequences improves the performance further in some cases.

ACKNOWLEDGMENTS
This research was supported by Capillary Technologies International Pte Limited. We thank Dr. Subrat Panda and Mr. Jyotiska Bhattacharjee from Capillary Technologies International Pte Limited, who provided insight and expertise that greatly assisted the research.

REFERENCES
[1] Qingyao Ai, Vahid Azizi, Xu Chen, and Yongfeng Zhang. Learning heterogeneous knowledge base embeddings for explainable recommendation. Algorithms, 11(9):137, 2018.
[2] Gaurangi Anand, Auon H Kazmi, Pankaj Malhotra, Lovekesh Vig, Puneet Agarwal, and Gautam Shroff. Deep temporal features to predict repeat buyers. In NIPS 2015 Workshop: Machine Learning for eCommerce, 2015.
[3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795, 2013.
[4] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[5] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
[6] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, pages 173–182. International World Wide Web Conferences Steering Committee, 2017.
[7] Wan-Hsun Hu, Shih-Hsien Tang, Yin-Che Chen, Chia-Hsuan Yu, and Wen-Cheng Hsu. Promotion recommendation method and system based on random forest. In Proceedings of the 5th Multidisciplinary International Social Networks Conference, page 11. ACM, 2018.
[8] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 687–696, 2015.
[9] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[10] Vladimir Nikulin. Prediction of the shoppers loyalty with aggregated data streams. Journal of Artificial Intelligence and Soft Computing Research, 6(2):69–79, 2016.
[11] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. DKN: Deep knowledge-aware network for news recommendation. arXiv preprint arXiv:1801.08284, 2018.
[12] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743, 2017.
[13] Yandi Xia, Giuseppe Di Fabbrizio, Shikhar Vaibhav, and Ankur Datta. A content-based recommender system for e-commerce offers and coupons. In eCOM@SIGIR, 2017.
[14] Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. A dynamic recurrent model for next basket recommendation. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 729–732. ACM, 2016.
[15] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 353–362, 2016.