<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neural Factorization for Offer Recommendation using Knowledge Graph Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gourab Chowdhury</string-name>
          <email>gourab@iitkgp.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mainak Chain</string-name>
          <email>mainakchain@iitkgp.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Madiraju Srilakshmi</string-name>
          <email>sreelakshmi@iitkgp.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sudeshna Sarkar</string-name>
          <email>sudeshna@cse.iitkgp.ernet.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Indian Institute of Technology Kharagpur</institution>
          ,
          <addr-line>West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Companies send many promotional offers and coupons to customers to encourage them to buy more. Offer recommendation systems can help to identify relevant offers for users. In this paper, we present a Neural Factorization (NF) model for the task of offer recommendation. We represent users and offers with Knowledge Graph Embeddings (KGE). Specifically, we model the available data in the form of a Knowledge Graph (KG) and learn embeddings for entities and relations using a standard KGE technique called TransE. We also incorporate temporal user features into the NF model using a Long Short Term Memory (LSTM) with attention framework. We experiment with the Kaggle Acquire Valued Shoppers Challenge dataset and show that the performance of our model is significantly better than that of tree-based methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Neural Factorization</kwd>
        <kwd>Recommender Systems</kwd>
        <kwd>Knowledge Graph Embedding</kwd>
        <kwd>E-commerce offers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Recommender systems; Information retrieval; • Computing methodologies → Neural networks.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Marketing and promotions are used to attract customers in the
retail domain. Companies spend a lot of money to send promotional
offers or discounts to customers. It is therefore important to identify
relevant offers that the users are likely to accept.</p>
      <p>We consider the case in which an offer is a flat discount for a
given brand and category combination. An example of an offer is
"Rs 200 off on Polo T-shirts". There can be other types of offers,
such as coupons/promo codes (use coupon GO15O to get Rs 150
cash-back), combo offers (buy 3 get 1 free, buy 3 get Rs 300 off)
and loyalty points.</p>
      <p>
        Xia et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] proposed an approach for the task of offer
recommendation based on the features of users and offers. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] used
tree-based methods, namely Random Forest [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Gradient Boosted
Trees [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], to handle this task. The features involve attributes of the
users such as their email domain and operating system, attributes of
offers such as text description and discount amount, as well as shopping
history features such as recent discount information and the number
of times the user visited the coupon before.
      </p>
      <p>We propose to use a Neural Factorization (NF) model to learn
user-offer interactions. The user and offer representations are given
as input to the NF model, whose output is the probability of the
offer being accepted by the user. We predict the probabilities of all
offers available at the given time and recommend the top k most probable
offers to the customer. In this work, we explore different ways of
representing users and offers.</p>
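      <p>As a minimal sketch (not the authors' implementation), the ranking step can be written as follows; predict_prob is a hypothetical toy scorer standing in for the trained NF model's output layer:</p>

```python
# Score every available offer for a user and return the k
# highest-probability offers.
# `predict_prob` is a hypothetical stand-in for the trained NF model.
def predict_prob(user, offer):
    # deterministic dummy score for illustration only
    return ((len(user) * 7 + len(offer) * 13) % 10) / 10.0

def recommend_top_k(user, available_offers, k=3):
    ranked = sorted(available_offers,
                    key=lambda o: predict_prob(user, o),
                    reverse=True)
    return ranked[:k]

offers = ["offer_a", "offer_bb", "offer_ccc", "offer_dddd"]
print(recommend_top_k("u1", offers, k=2))  # ['offer_bb', 'offer_a']
```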
      <p>In our first model, we represent users and offers with features
extracted from the dataset. The user features contain the normalized
count of items purchased in a month, the normalized count of items
purchased in each category, days since the last visit, the average
amount paid per visit, etc. The offer features include the category and brand
on which the offer is given, the amount of discount, the minimum quantity of
purchase, etc.</p>
      <p>
        In our second model, we explore representing users and offers as
embeddings. For this, we construct a Knowledge Graph (KG)
involving users, categories, brands and price values as nodes, and belongs-to,
purchase and price as edges between them. We adopt a
knowledge graph embedding technique called TransE [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to generate
embeddings of users and offers.
      </p>
      <p>In our third model, we capture the user's sequential behaviour
using a Long Short Term Memory (LSTM) with attention framework.
The input to the model is the sequence of baskets purchased by the
user and the output is the probabilities over all available categories.
We incorporate this information as an additional input to the NF
model.</p>
      <p>We experiment with the Kaggle Acquire Valued Shoppers Challenge
dataset, which contains user-offer interactions, user purchase history
and offer content information. We apply our models to this data
and show that the NF based models achieve better performance
than tree-based methods.</p>
    </sec>
    <sec id="sec-3">
      <title>PRELIMINARIES</title>
      <p>In this section, we formally define the task of offer recommendation
and present the details of the data available to handle this task.</p>
    </sec>
    <sec id="sec-4">
      <title>Problem Definition</title>
      <p>Offer Recommendation is the task of predicting the best offers
for a given user. Let U = {u1, u2, ..., um} be the set of users and
O = {o1, o2, ..., on} be the set of offers. The task is to recommend
the top k offers to each user such that they are likely to include the next
offer converted by the user.</p>
    </sec>
    <sec id="sec-5">
      <title>Dataset</title>
      <p>In our work, we experiment with Kaggle Acquire Valued Shoppers
Challenge dataset 1.</p>
      <p>The dataset contains the transaction history of users from March
2012 to July 2013. A transaction consists of a user_id, an item_id and
a date. The set of items purchased by the same user on the same date
is termed a basket.</p>
      <p>The user-offer interactions are recorded from March 2013 to July
2013. A user-offer interaction consists of a user_id, an offer_id and a date.
Each user has availed exactly one offer in this period. Each offer
is specified by its category, brand, discount amount and minimum
quantity. The overall data statistics are listed in Table 1.</p>
    </sec>
    <sec id="sec-6">
      <title>NEURAL FACTORIZATION FOR OFFER RECOMMENDATION</title>
      <p>We use a Neural Factorization (NF) model for the task of offer
recommendation. We have experimented with different methods of
representing users and offers. We first explain the basic neural
factorization framework and then introduce our methods.</p>
    </sec>
    <sec id="sec-8">
      <title>Architecture of Neural Factorization</title>
      <p>
        Our framework is based on Neural Collaborative Filtering proposed
by He et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The system [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] used Multi Layer Perceptron (MLP)
for modelling user-item interactions which is able to capture the
non-linear relations between users and items.
      </p>
      <p>In our model, the user vector (v_u) and offer vector (v_o) are given
as input to two input layers. Each input layer is followed by a
dense layer. The outputs of these dense layers are concatenated and
given as input to a neural network. We use past user-offer
interactions as positive samples, while the negative samples are
generated by random sampling. The final output layer predicts y_uo,
the probability of user u accepting an offer o. The architecture is
illustrated in Figure 2.</p>
      <p>1. https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data</p>
      <p>The model is trained by minimizing the loss between the predicted
value (ŷ_uo) and the target value y_uo. The target value represents
the user's action towards an offer: it is 1 when the user availed the
offer and 0 otherwise. The loss function is defined as follows:
L = − Σ_{(u,o) ∈ Y+ ∪ Y−} (y_uo log(ŷ_uo) + (1 − y_uo) log(1 − ŷ_uo))
(1)
where Y+ denotes the set of positive user-offer interactions
and Y− denotes the negative instances (sampled from unobserved
data).</p>
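      <p>The loss in Eq. (1) is standard binary cross-entropy over the observed (positive) and sampled (negative) interactions. A pure-Python sketch, not the authors' code:</p>

```python
# Binary cross-entropy over Y+ (target 1) and sampled Y- (target 0),
# as in Eq. (1). `samples` is a list of (y_true, y_pred) pairs.
import math

def bce_loss(samples):
    total = 0.0
    for y, y_hat in samples:
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return total

# two positives predicted high, two negatives predicted low
samples = [(1, 0.9), (1, 0.8), (0, 0.2), (0, 0.1)]
print(round(bce_loss(samples), 4))  # 0.657
```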
      <p>The user_ids and offer_ids are different in the train and test
datasets. Therefore, we cannot use the learned factors for
prediction. We represent the user and offer by their content information,
input them to the trained neural model and predict the
probability of the user accepting the offer. Similarly, we find the probabilities
of all offers available at the given time. The offers are ranked by
these probability values and the top k offers are recommended to the
user.</p>
      <p>The offer features further include:
– Quantity to be purchased to avail the discount
– How cheap the product is compared to other products in
the same category (Amount).</p>
      <p>The numeric value is used as input for the numeric features, while
the non-numeric features such as brand and category are one-hot
encoded. Since these features have many possible values, the
feature dimensions become large. These features are also unable to
capture indirect relationships between users and offers.</p>
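      <p>A small illustration of the one-hot encoding mentioned above (the brand values are hypothetical; with thousands of brands the vector grows accordingly):</p>

```python
# One-hot encode a categorical feature value against its vocabulary.
def one_hot(value, vocabulary):
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

brands = ["brand_a", "brand_b", "brand_c"]  # hypothetical brand vocabulary
print(one_hot("brand_b", brands))  # [0, 1, 0]
```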
    </sec>
    <sec id="sec-9">
      <title>Neural Factorization with Knowledge Graph Embeddings (NF+KGE)</title>
      <p>The representation of users and offers plays a significant role in
the effectiveness of a recommendation model. We wish to use a
representation that is able to capture relevant knowledge of users,
items and offers, the attributes of these entities and the interactions
between them.</p>
      <p>
        We propose to use Knowledge Graph Embedding (KGE)
techniques to learn embeddings for users and offers. These techniques
have been found to be effective in capturing complex and indirect
relationships among entities in a Knowledge Graph (KG) and have
proven successful in many applications such as link prediction
and recommender systems [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>We construct a Knowledge Graph (KG) based on the user purchase
history and offer content information. The nodes of the graph
are users, categories, brands and price ranges. We find the average
price of items in a brand and categorize it into high and low.
We have 3 types of relations in our graph. The user nodes are
connected to category nodes by the relation purchased, category nodes
are connected to brand nodes by the relation belongs_to, and brand
nodes are connected to price range nodes by the relation price. The
graph is formed as a set of triplets (h, r, t), i.e., head node (h) is
connected to tail node (t) by relationship (r). An example graph
representation is shown in Figure 3.</p>
      <p>The triplets generated for the example graph are as follows: T =
{(U1, purchased, C1), (U2, purchased, C1), (C1, belongs_to, B1),
(C2, belongs_to, B1), (C3, belongs_to, B2), (B1, price, low),
(B2, price, high)}.</p>
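      <p>The triplet set above can also be produced programmatically. A sketch using the example graph's entities (toy data, not the real dataset):</p>

```python
# Build (head, relation, tail) triplets from the example graph.
purchases = {"U1": ["C1"], "U2": ["C1"]}               # user -> categories
category_brand = {"C1": "B1", "C2": "B1", "C3": "B2"}  # category -> brand
brand_price = {"B1": "low", "B2": "high"}              # brand -> price range

triplets = []
for user, cats in purchases.items():
    for c in cats:
        triplets.append((user, "purchased", c))
for c, b in category_brand.items():
    triplets.append((c, "belongs_to", b))
for b, p in brand_price.items():
    triplets.append((b, "price", p))

print(len(triplets))  # 7
```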
      <p>
        We use a standard knowledge graph embedding method called
TransE [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to learn the embeddings for all entities and relationships
in the graph. We have chosen TransE because this method is simple
and has been found to be efficient in modelling multi-relational
data [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Given a triplet of the form &lt;h,r,t&gt;, which indicates that the head
entity (h) is connected to the tail entity (t) by the relationship (r),
TransE [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] learns the embeddings such that h + r ≈ t (Figure 3).
TransE uses the following scoring function:
f_r(h, t) = −||h + r − t||_{1/2}
(2)
where || · ||_{1/2} denotes the L1 or L2 norm. TransE obtains positive
triplets from the graph, while negative triplets are generated by
corrupting the head or tail entity of a positive triplet.
      </p>
      <p>The above Knowledge Graph Embeddings (KGE) are unable to
capture the sequential behaviour of users, since the knowledge graph
does not represent the time stamps of the interactions or the
sequentiality of the transactions. Therefore, we enhance our model
by incorporating a temporal component that captures the user's
sequential purchase behaviour using a Long Short Term Memory
(LSTM) with attention model.</p>
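      <p>The TransE scoring function of Eq. (2) can be sketched with the L2 norm on toy vectors (illustration only; the real embeddings are learned and 100-dimensional):</p>

```python
# f_r(h, t) = -||h + r - t||: a triplet satisfying h + r ≈ t scores
# near 0, while a mismatched triplet scores more negative.
import math

def transe_score(h, r, t):
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    return -math.sqrt(sum(d * d for d in diff))

h, r = [1.0, 0.0], [0.0, 1.0]
print(transe_score(h, r, [1.0, 1.0]))   # h + r == t exactly, score 0
print(transe_score(h, r, [3.0, -2.0]))  # poor fit, strongly negative
```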
      <p>Since the offers are given on specific categories, we formulate
the task of predicting the next category to be purchased by the user.
We hypothesize that this information may help to identify suitable
offers for a user.</p>
      <p>Each user has purchased a set of items per visit. This set of items
is termed a basket. We consider the category of an item
(category_id) instead of the exact item (item_id) in each basket. We
give the sequence of baskets purchased by the user as input to the
LSTM and predict the probabilities of the categories purchased in the
next basket. The predicted category probabilities are incorporated
as an additional input to the NF model.</p>
      <p>
        Let u = {b_1, b_2, . . . , b_t} be the basket sequence of the target user.
Each basket is a group of categories: b_k = {c_1, c_2, . . . , c_n}, where
n is the size of the basket. We represent each category with the
embedding learned from the Knowledge Graph discussed earlier.
We use average pooling to represent the basket. This approach is
similar to the basket prediction method proposed by Yu et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
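      <p>Average pooling of a basket can be sketched as follows (toy 2-dimensional embeddings stand in for the learned 100-dimensional TransE embeddings):</p>

```python
# Represent a basket as the element-wise mean of its category embeddings.
category_emb = {
    "C1": [0.25, 0.5],
    "C2": [0.75, 0.0],
    "C3": [0.5, 0.25],
}

def basket_vector(basket):
    vecs = [category_emb[c] for c in basket]
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

print(basket_vector(["C1", "C2"]))  # [0.5, 0.25]
```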
      <p>The user sequence is now denoted as u = {v_1, v_2, . . . , v_t}, where
v_k is the average of the graph-based category embeddings in the basket
b_k. We input the user sequence up to the current time t into the LSTM
model. Let h_t be the LSTM hidden unit and y_t be the output at the t-th
time step. The hidden state h_t of each interaction is updated from the
previous hidden state h_{t−1} and the current basket embedding v_t.</p>
      <p>h_t = LSTM(h_{t−1}, v_t)
(4)
We apply attention on top of the LSTM layer to give weights to
the baskets at different time-steps. Let H = (h_1, h_2, . . . , h_t) be the
output vectors produced by the LSTM layer. They are the inputs to
the attention layer, and the weights for each time-step are learned,
i.e. A = (a_1, a_2, . . . , a_t). The weighted sum of the hidden states
(M) is input into a dense layer (D) to find the scores of all categories.
We find their probabilities using the sigmoid activation function.
This architecture is illustrated in Figure 4.</p>
      <p>A = softmax(w^T tanh(H))
(5)
M = A^T H
(6)
P(y_{t+1} | v_i, 1 ≤ i ≤ t) = sigmoid(D[M])
(7)</p>
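      <p>The attention pooling of Eqs. (5)-(7) can be sketched in pure Python on toy vectors (here w and the final dense layer are hypothetical fixed parameters, not learned):</p>

```python
# Attention over LSTM hidden states: score each h_t by w . tanh(h_t),
# softmax the scores into weights A, and form the weighted sum M.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(H, w):
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in H]
    A = softmax(scores)
    # M = sum_t a_t * h_t
    return [sum(a * h[i] for a, h in zip(A, H)) for i in range(len(H[0]))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

H = [[0.1, 0.3], [0.5, -0.2], [0.2, 0.4]]  # toy hidden states h_1..h_3
M = attention_pool(H, [1.0, 0.5])
prob = sigmoid(sum(M))  # stand-in for the dense layer D plus sigmoid
print(1.0 > prob > 0.0)  # True
```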
      <p>The learned category probabilities are given as an auxiliary input
to the NF model. The rest of the architecture is similar and is shown
in Figure 5.</p>
      <p>The train and test split are taken as given in the dataset, i.e., the
first 2 months of user-offer interaction data are used as the train set and
the remaining 3 months as the test set. The train and test statistics
are listed in Table 2. To train the Knowledge Graph Embeddings, we
have used the most recent 5 months of user purchase history before offers
are given to the users (01-October-2012 to 28-February-2013). This
has been done on the assumption that the recent purchase history
reflects the current preferences of the user. The same data is used
for predicting user category preferences as well as for feature creation.</p>
      <p>4.2.2 Experimental Setup. We use the following parameters to
learn the knowledge graph embeddings. The size of the embedding for
all entities (user, brand, category, price range) is 100. We use the Adam
optimizer, and the learning rate is set to 0.001.</p>
      <p>The parameters for predicting the user's next preferred categories
using LSTM are as follows. We use one LSTM layer with 100 hidden
units. There is a dropout layer between the LSTM layer and the
attention layer with 25% dropout. The learning rate is set to 0.001.</p>
      <p>The parameters for the neural factorization models are as follows.
The activation function for the dense layers is Leaky ReLU and the batch
size is set to 512. The learning rate is set to 0.0001. We applied batch
normalization at every layer.</p>
    </sec>
    <sec id="sec-11">
      <title>Comparison of Models</title>
      <p>We compare the NF based models, namely NF+features, NF+KGE
and NF+KGE+TF, against the XGBoost and Random Forest Classifier
(RFC) methods with features.</p>
      <p>The standard item-KNN method cannot be applied to this dataset
because each user in the dataset has availed exactly one offer. There
is no way to find offers similar to those previously availed by the
user and recommend them.</p>
    </sec>
    <sec id="sec-12">
      <title>Result and Analysis</title>
      <p>The results for the two baseline methods and the three NF based
models discussed in this paper are presented in Table 1. It is evident
from the results that the NF based models outperform the XGBoost
and RFC methods with features.</p>
      <p>Neural factorization model with graph-based embedding (NF+KGE)
performs better than Neural factorization with features (NF+features).</p>
      <p>The final variation of our model with temporal features (NF+KGE+TF)
gives a significant improvement over all other models considered
in terms of Recall@3, Recall@5 and MRR@5. The neural
factorization model with graph-based embedding (NF+KGE) performs best
in terms of Recall@1.</p>
      <p>We hypothesize that the knowledge graph based embedding
is effective as it is able to use the connections between different
entities and can therefore effectively capture indirect as well as
latent relationships.</p>
      <p>However, a limitation of the above model is that it fails to
capture the temporal interactions of the user. Our third model
(NF+KGE+TF) addresses this by using an additional input from the
LSTM based on the user’s sequence of baskets.</p>
    </sec>
    <sec id="sec-13">
      <title>RELATED WORK</title>
      <p>Related work to our model is categorized into three parts.
Subsection 5.1 reviews the work reported on offer recommendation
systems. Subsection 5.2 reviews the methods used in the related task
of repeat purchase prediction. Subsection 5.3 discusses
Knowledge Graph Embedding (KGE) based recommendation
systems.</p>
    </sec>
    <sec id="sec-14">
      <title>Offer Recommender Systems</title>
      <p>There are very few works on offer and coupon recommendation
systems.</p>
      <p>
        Xia et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] approached the problem of offer recommendation
in the e-commerce domain. They used a private dataset consisting of
customers’ shopping trips, shopping trip counts, clicked coupons,
and the retailers that issued the coupons. The coupons are characterized
based on their textual descriptions and validity period. The authors
curated a number of features from the data to represent users and
coupons and ranked the coupons based on scores generated by the
XGBoost [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Random Forest [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] algorithms.
      </p>
      <p>
        Similar work is proposed by Hu et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in the telecom domain.
Hu et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] used the Random Forest method [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to provide telecom
offers to mobile users. The authors extracted user features such as
age, gender, voice call duration, SMS count, etc. from the customer
profile repository and the historical usage repository. These features
are given as input to the Random Forest algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-15">
      <title>Repeat Purchase Prediction Systems</title>
      <p>Repeat purchase prediction is the task posed in the Kaggle Acquire
Valued Shoppers Challenge. Since we have used the same dataset
for our task of offer recommendation, we present a review
of the work done on repeat purchase prediction.</p>
      <p>
        Anand et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] proposed a prediction model based on a
combination of temporal and aggregate-level models. They extracted
three types of features capturing different aspects of user behaviour,
namely customer-based, product-based and customer-product-interaction
based. Customer-based features include total visits made, total
spend, the loyalty of the customer, etc. Product-based features
include the fraction of repeat customers for the offer product, etc.
Customer-product-interaction based features include the number of
visits, quantity bought, the amount spent, etc.
      </p>
      <p>To capture the aggregate-level behaviour of the user, the above
features are computed over the entire transaction history. To
capture the temporal behaviour, the features are split and computed
over non-overlapping time windows. The authors used a Long Short
Term Memory (LSTM) model as the classifier for temporal features
and a quantile regression (QR) model as the classifier for aggregate-level
features. The two models are combined using a mixture of
experts (ME).</p>
      <p>
        Nikulin [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used Random Forest [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Gradient Boosted
Trees [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to predict the repeat purchase behaviour of the users. The
author curated a number of statistical features from the data and
applied the above methods.
      </p>
    </sec>
    <sec id="sec-16">
      <title>Knowledge Graph Embedding based Recommender Systems</title>
      <p>
        In recent times, knowledge graph embeddings have been shown
to be effective for recommendation systems. The basic idea is to
represent the available data in the form of a graph, learn embeddings
for entities using knowledge graph embedding methods [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and
incorporate them into the recommendation model.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] presents Collaborative Knowledge base Embedding (CKE),
which uses TransR [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to learn structural representations of
items that are combined with visual and textual embeddings. The
Deep Knowledge-aware Network (DKN) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] learns entity
embeddings using TransD [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and designs a CNN framework that
combines them with word embeddings for news recommendation.
Ai et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] learn embeddings of users and items with
TransE [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and base the recommendation on the user-item similarity
score in the projected space.
      </p>
    </sec>
    <sec id="sec-18">
      <title>CONCLUSION</title>
      <p>In this paper, we have presented a neural factorization method for
the task of offer recommendation with different representations of
users and offers. We have shown that our models perform better
than the tree-based methods. We have also shown that the learned
graph-based user and offer embeddings capture deeper and indirect
connections between users and offers, which helps to improve the
quality of recommendation. The incorporation of temporal features
involving transaction sequences improves the performance further
in some cases.</p>
    </sec>
    <sec id="sec-19">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research was supported by Capillary Technologies
International Pte Limited. We thank Dr. Subrat Panda and Mr. Jyotiska
Bhattacharjee from Capillary Technologies International Pte
Limited who provided insight and expertise that greatly assisted the
research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Qingyao</given-names>
            <surname>Ai</surname>
          </string-name>
          , Vahid Azizi, Xu Chen, and
          <string-name>
            <given-names>Yongfeng</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Learning heterogeneous knowledge base embeddings for explainable recommendation</article-title>
          .
          <source>Algorithms</source>
          ,
          <volume>11</volume>
          (
          <issue>9</issue>
          ):
          <fpage>137</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Gaurangi</given-names>
            <surname>Anand</surname>
          </string-name>
          , Auon H Kazmi, Pankaj Malhotra, Lovekesh Vig, Puneet Agarwal, and
          <string-name>
            <given-names>Gautam</given-names>
            <surname>Shroff</surname>
          </string-name>
          .
          <article-title>Deep temporal features to predict repeat buyers</article-title>
          .
          <source>In NIPS 2015 Workshop: Machine Learning for eCommerce</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Antoine</given-names>
            <surname>Bordes</surname>
          </string-name>
          , Nicolas Usunier, Alberto Garcia-Duran,
          <string-name>
            <given-names>Jason</given-names>
            <surname>Weston</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Oksana</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          .
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Leo</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Jerome</surname>
            <given-names>H</given-names>
          </string-name>
          <string-name>
            <surname>Friedman</surname>
          </string-name>
          .
          <article-title>Greedy function approximation: a gradient boosting machine</article-title>
          .
          <source>Annals of statistics</source>
          , pages
          <fpage>1189</fpage>
          -
          <lpage>1232</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Xiangnan</given-names>
            <surname>He</surname>
          </string-name>
          , Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and
          <string-name>
            <surname>Tat-Seng Chua</surname>
          </string-name>
          .
          <article-title>Neural collaborative filtering</article-title>
          .
          <source>In Proceedings of the 26th International Conference on World Wide Web</source>
          , pages
          <fpage>173</fpage>
          -
          <lpage>182</lpage>
          . International World Wide Web Conferences Steering Committee,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Wan-Hsun</surname>
            <given-names>Hu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shih-Hsien</surname>
            <given-names>Tang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin-Che</surname>
            <given-names>Chen</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chia-Hsuan Yu</surname>
          </string-name>
          , and Wen-Cheng Hsu.
          <article-title>Promotion recommendation method and system based on random forest</article-title>
          .
          <source>In Proceedings of the 5th Multidisciplinary International Social Networks Conference, page 11. ACM</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Guoliang</given-names>
            <surname>Ji</surname>
          </string-name>
          , Shizhu He, Liheng Xu, Kang Liu, and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>Knowledge graph embedding via dynamic mapping matrix</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</source>
          , pages
          <fpage>687</fpage>
          -
          <lpage>696</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Yankai</given-names>
            <surname>Lin</surname>
          </string-name>
          , Zhiyuan Liu, Maosong Sun, Yang Liu, and
          <string-name>
            <given-names>Xuan</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>Learning entity and relation embeddings for knowledge graph completion</article-title>
          .
          <source>In Twenty-ninth AAAI conference on artificial intelligence</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Nikulin</surname>
          </string-name>
          .
          <article-title>Prediction of the shoppers loyalty with aggregated data streams</article-title>
          .
          <source>Journal of Artificial Intelligence and Soft Computing Research</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>69</fpage>
          -
          <lpage>79</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Hongwei</given-names>
            <surname>Wang</surname>
          </string-name>
          , Fuzheng Zhang, Xing Xie, and
          <string-name>
            <given-names>Minyi</given-names>
            <surname>Guo</surname>
          </string-name>
          .
          <article-title>DKN: Deep knowledge-aware network for news recommendation</article-title>
          .
          <source>arXiv preprint arXiv:1801.08284</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Quan</given-names>
            <surname>Wang</surname>
          </string-name>
          , Zhendong Mao,
          <string-name>
            <given-names>Bin</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Li</given-names>
            <surname>Guo</surname>
          </string-name>
          .
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>29</volume>
          (
          <issue>12</issue>
          ):
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Yandi</given-names>
            <surname>Xia</surname>
          </string-name>
          , Giuseppe Di Fabbrizio, Shikhar Vaibhav, and
          <string-name>
            <given-names>Ankur</given-names>
            <surname>Datta</surname>
          </string-name>
          .
          <article-title>A content-based recommender system for e-commerce offers and coupons</article-title>
          .
          <source>In eCOM@SIGIR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Feng</given-names>
            <surname>Yu</surname>
          </string-name>
          , Qiang Liu, Shu Wu,
          <string-name>
            <given-names>Liang</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tieniu</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>A dynamic recurrent model for next basket recommendation</article-title>
          .
          <source>In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>729</fpage>
          -
          <lpage>732</lpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Fuzheng</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Nicholas Jing Yuan, Defu Lian, Xing Xie, and
          <string-name>
            <given-names>Wei-Ying</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <article-title>Collaborative knowledge base embedding for recommender systems</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>353</fpage>
          -
          <lpage>362</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>