=Paper= {{Paper |id=Vol-2410/paper38.pdf |storemode=property |title=Neural Factorization for Offer Recommendation using Knowledge Graph Embeddings |pdfUrl=https://ceur-ws.org/Vol-2410/paper38.pdf |volume=Vol-2410 |authors=Gourab Chowdhury,Madiraju Srilakshmi,Mainak Chain,Sudeshna Sarkar |dblpUrl=https://dblp.org/rec/conf/sigir/ChowdhurySCS19 }} ==Neural Factorization for Offer Recommendation using Knowledge Graph Embeddings== https://ceur-ws.org/Vol-2410/paper38.pdf
              Neural factorization for Offer Recommendation using
                         Knowledge Graph Embeddings
                            Gourab Chowdhury                                                           Madiraju Srilakshmi
               Indian Institute of Technology Kharagpur                                      Indian Institute of Technology Kharagpur
                           West Bengal, India                                                            West Bengal, India
                          gourab@iitkgp.ac.in                                                        sreelakshmi@iitkgp.ac.in

                                 Mainak Chain                                                             Sudeshna Sarkar
               Indian Institute of Technology Kharagpur                                      Indian Institute of Technology Kharagpur
                           West Bengal, India                                                            West Bengal, India
                       mainakchain@iitkgp.ac.in                                                    sudeshna@cse.iitkgp.ernet.in

ABSTRACT                                                                                 We consider the case in which an offer is a flat discount for a
Companies send a lot of promotional offers and coupons to cus-                        given brand and category combination. An example of an offer is
tomers to attract them to buy more. Offer recommendation sys-                         "Rs 200 off on Polo T-shirts". There can be other types of offers
tems can help to identify relevant offers to users. In this paper,                    such as coupons/promo codes (Use Coupon GO15O to get Rs 150
we present a Neural Factorization (NF) model for the task of Offer                    cash-back), combo offers (Buy 3 get 1 free, Buy 3 get Rs 300 off)
recommendation. We represent users and offers with Knowledge                          and loyalty points.
Graph Embeddings (KGE). Specifically, we model the available data                        Xia et al. [13] proposed an approach for the task of offer rec-
in the form of a Knowledge Graph (KG) and learn embeddings for                        ommendation based on the features of users and offers. [13] used
entities and relations using a standard KGE technique called TransE.                  tree-based methods namely Random forest [4] and Gradient boosted
We also incorporate the user temporal features in the NF model                        trees [5] to handle this task. The features involve attributes of the
using Long Short Term Memory (LSTM) with attention framework.                         users such as their email domain and OS systems and attributes of
We experiment with Kaggle Acquire Valued Shoppers Challenge                           offers such as text description, discount amount as well as shopping
dataset and show that the performance of our model is significantly                   history features such as recent discount information and number
better than tree-based methods.                                                       of times the user visited the coupon before.
                                                                                         We propose to use a Neural Factorization (NF) model to learn
CCS CONCEPTS                                                                          user-offer interactions. The user and offer representations are given
                                                                                      as input to the NF model whose output is the probability of the
• Information systems → Recommender systems; • Informa-
                                                                                      offer being accepted by the user. We predict the probabilities of all
tion Systems → Information retrieval; Recommender Systems;
                                                                                      offers available at the given time and recommend the top k probable
• Computing methodologies → Neural networks.
                                                                                      offers to the customer. In this work, we explore different ways of
                                                                                      representing users and offers.
KEYWORDS
                                                                                         In our first model, we represent users and offers with features
Neural Factorization, Recommender Systems, Knowledge Graph                            extracted from the dataset. The user features contain the normalized
Embedding, E-commerce offers                                                          count of items purchased in a month, the normalized count of items
ACM Reference Format:                                                                 purchased in each category, days since the last visit, the average
Gourab Chowdhury, Madiraju Srilakshmi, Mainak Chain, and Sudeshna                     amount paid per visit etc. The offer features include category, brand
Sarkar. 2019. Neural factorization for Offer Recommendation using Knowl-              on which offer is given, amount of discount, minimum quantity of
edge Graph Embeddings. In Proceedings of the SIGIR 2019 Workshop on                   purchase etc.
eCommerce (SIGIR 2019 eCom), 6 pages.                                                    In our second model, we explore representing user and offers as
                                                                                      embedding. For this, we construct a Knowledge Graph (KG) involv-
1 INTRODUCTION                                                                        ing users, categories, brands, price values as nodes and belongs-to,
Marketing and promotions are used to attract customers in the                         purchase and price as edges between them. We adopt a knowl-
retail domain. Companies spend a lot of money to send promotional                     edge graph embedding technique called TransE [3] to generate
offers or discounts to customers. It is therefore important to identify               embeddings of users and offers.
relevant offers that the users are likely to accept.                                     In our third model, we capture the user sequential behaviour
                                                                                      using Long Short Term Memory (LSTM) with attention framework.
Copyright © 2019 by the paper’s authors. Copying permitted for private and academic   The input to the model is the sequence of baskets purchased by the
purposes.
In: J. Degenhardt, S. Kallumadi, U. Porwal, A. Trotman (eds.):                        user and the output is the probabilities on all categories available.
Proceedings of the SIGIR 2019 eCom workshop, July 2019, Paris, France, published at   We incorporate this information as an additional input to the NF
http://ceur-ws.org                                                                    model.
                                                                                         We experiment with Kaggle Acquire Valued Shoppers Challenge
                                                                                      dataset which contains user offer interactions, user purchase history
SIGIR 2019 eCom, July 2019, Paris, France                                                                                         Chowdhury, et al.

                     Table 1: Overall data statistics                         interactions as positive samples while the negative samples are
                                                                              generated by random sampling. The final output layer predicts yuo ,
        Statistics                Acquire Valued Shoppers data                the probability of user u accepting an offer o. The architecture is
                                                                              illustrated in Figure 2.
        # users                   315411
                                                                                  The model is trained by minimizing the loss between predicted
        # brands                  35689
                                                                              value (yuo
                                                                                       ˆ ) and the target value yuo . The target value represents
        # offers                  39
                                                                              the user action towards an offer. It is 1 when the user availed the
        # categories              836
                                                                              offer and 0 otherwise. The loss function is defined as follows:
        # total transactions      349,655,789                                              Õ
                                                                                  L=               −(yuo log (yuo
                                                                                                               ˆ ) + (1 − yuo ) log (1 − yuo
                                                                                                                                          ˆ )) (1)
                                                                                    (u,o)∈Y − ∪Y +
and offer content information. We apply our models on this data               where Y + denotes the set of positive interaction of users, offers
and show that the NF based models achieves better performance                 and Y − denotes the negative instances (sampled from unobserved
than tree-based methods.                                                      data).
                                                                                 The user_ids and offer_ids are different in the train and test
2     PRELIMINARIES                                                           datasets. Therefore, we can't use the learned factors for the predic-
In this section, we formally define the task of offer recommendation          tion. We represent the user and offer by their content information
and present the details of the data available to handle this task.            and input them to the trained neural model and predict the probabil-
                                                                              ity of the user accepting the offer. Similarly, we find the probabilities
2.1     Problem Definition                                                    of all offers available at the given time. The offers are ranked based
Offer Recommendation is the task of predicting the best offers                on the probability values and top k offers are recommended to the
for a given user. Let U = {u 1 , u 2 , ..., um } be the set of users and      user.
O = {o 1 , o 2 , o 3 ...on } be the set of offers. The task is to recommend
top k offers to the users so that they are likely to include the next
offer converted by the user.

2.2     Dataset
In our work, we experiment with Kaggle Acquire Valued Shoppers
Challenge dataset 1 .
    The dataset contains the transaction history of users from March
2012 to July 2013. A transaction consists of user_id, item_id and
date. The set of items purchased by the same user in the same date
are termed as a basket.
    The user-offer interactions are recorded from March 2013 to July
2013. A user-offer interaction consists of user_id, offer_id and date.
Each user has availed exactly one offer in this period. Each offer                         Figure 1: Neural Factorization Model
is specified by its category, brand, discount amount and minimum
quantity. The overall data statistics are listed in Table 1.
                                                                              3.2    Neural Factorization with features
3     NEURAL FACTORIZATION FOR OFFER
                                                                                     (NF+features)
      RECOMMENDATION
                                                                              In this variation of our method, user and offer features obtained
We use Neural Factorization (NF) model for the task of offer rec-
                                                                              from the dataset are given as input to the neural factorization model.
ommendation. We have experimented with different methods of
                                                                              We extract user and offer features from user purchase history, user-
representing users and offers. We first explain the basic neural
                                                                              offer interactions and offer content information. The following
factorization framework and then introduce our methods.
                                                                              features are used to represent users and offers.
3.1     Architecture of Neural Factorization                                       • User features:
                                                                                     – Normalized count of purchase of each category
Our framework is based on Neural Collaborative Filtering proposed
                                                                                     – Number of visits in 30 days before the day of offer
by He et al. [6]. The system [6] used Multi Layer Perceptron (MLP)
                                                                                     – Days since the last visit (from the day of offer).
for modelling user-item interactions which is able to capture the
                                                                                     – Average items purchased per visit
non-linear relations between users and items.
                                                                                     – Average amount paid per visit
   In our model, the user vector (vu ) and offer vector (vo ) are given
                                                                                   • Offer features:
as input to two input layers. Each input layer is followed by a
                                                                                     – Category on which offer is given
dense layer. The output of these dense layers are concatenated and
                                                                                     – Brand on which offer is given
are given as input to a Neural network. We use past user-offer
                                                                                     – Average price of the each brand-category combination
1 https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data                    – Discount amount
Neural factorization for Offer Recommendation using Knowledge Graph Embeddings                            SIGIR 2019 eCom, July 2019, Paris, France


       – Quantity to be purchased to avail discount                            Given a triplet of the form  which indicates that the head
       – How cheap the product is compared to other products in              entity (h) is connected to the tail entity (t) by the relationship (r),
         the same category (Amount).                                         TransE [3] learns the embedding such that h + r ≈ t (Figure 3).
   The numeric value is used as input for the numeric features while         TransE uses the following scoring function:
the non-numeric features such as brand and category are one-hot                                      fr (h, t) = −||h + r − t ||1/2                       (2)
encoded. Since these features have multiple possible values, the
                                                                             TransE obtains positive triplets from the graph and negative triplets
feature dimensions become large. These features are also unable to
capture indirect relationships between users and offers.

3.3    Neural Factorization with Knowledge
       Graph Embeddings (NF+KGE)
The representation of users and offers plays a significant role in
the effectiveness of a recommendation model. We wish to use a
representation that is able to capture relevant knowledge of users,
items, offers, the attributes of the above entities and the interactions
between them.
   We propose to use Knowledge Graph Embedding (KGE) tech-                   Figure 3: Illustration of TransE (Figure is adapted from [12]
niques to learn embeddings for users and offers. These techniques
have been found to be effective in capturing complex and indirect            by randomly corrupting the head or tail node of a positive triplet.
relationships among entities in the Knowledge Graph (KG) and are             The TransE training method minimizes the pairwise loss between
proven to be successful in many applications such as link prediction         the positive facts τ + = (h, r, t) and the negative facts τ − = (h ′, r ′, t ′ ).
and recommendation systems etc [12].                                         The objective function is given as follows:
   We construct a Knowledge Graph (KG) based on user purchase                                Õ Õ
history and offer content information. The nodes of the graph                         min                  max(0, γ − fr (h, t) + fr ′ (h ′, t ′ ))     (3)
                                                                                            τ + ∈D + τ − ∈D −
are users, categories, brands and price range. We find the average
price of items in a brand and categorize them into high and low.             γ is the margin of separation, D + is the set of positive facts and D +
We have 3 types of relations in our graph. The user nodes are                is the set of negative facts. The steps to learn the embeddings are
connected to category nodes by relation purchased, category nodes            given in Algorithm 1.
are connected to brand nodes by relation belongs_to and brand
nodes are connected to price range nodes by relation price. The               Algorithm 1: TransE method to learn embeddings for entities
graph is formed as a set of triplets (h, r, t) i.e., head node (h) is         and relations
connected to tail node (t) by relationship (r ). An example graph              Input: Training set T = (h, r, t) , Entity set E, Relation set R
representation is shown in Figure 3.                                           Randomly initialize the embeddings for e ∈ E,r ∈ R
                                                                               for (h, r, t) in T do
                                                                                   // Corrupt the triplet by changing h or t and add it to T
                                                                                   T ← T ∪ (h ′, r , t ′ )
                                                                                   Update embeddings w.r.t the loss L i.e.
                                                                                      (h ′,r,t ′ ),(h,r,t )∈T ∆[γ + d(h + r, t) − d(h + r , t )]
                                                                                   Í                                                 ′       ′
                                                                                   // Dissimilarity measure d can be either the L1 or the L2
                                                                                     norm
                                                                               end


                                                                                We input the learned graph embeddings to the NF model. The
                                                                             users are represented with the user embedding and offers are rep-
Figure 2: Knowledge graph representation of available data                   resented with the combination of category embedding, brand em-
                                                                             bedding, discount amount and minimum quantity to be purchased
    The triplets generated for the example graph are as follows: T =         to avail the offer.
{(U1 , purchased, C 1 ), (U2 , purchased, C 1 ), (C 1 , belonдs_to, B 1 ),
(C 2 , belonдs_to, B 1 ), (C 3 , belonдs_to, B 2 ), (B 1 , price, low),
                                                                             3.4     Neural Factorization with temporal
(B 2 , price, hiдh)}.                                                                features (NF+KGE+TF)
    We use a standard knowledge graph embedding method called                The above Knowledge Graph Embeddings (KGE) is unable to cap-
TransE [3] to learn the embeddings for all entities and relationships        ture the sequential behaviour of users since the knowledge graph
in the graph. We have chosen TransE because this method is simple            does not represent the time stamp of the interactions or the sequen-
and has been found to be efficient in modelling multi-relational             tiality of the transactions. Therefore, we try to enhance our model
data [12].                                                                   by incorporating a temporal component by considering the user
SIGIR 2019 eCom, July 2019, Paris, France                                                                                            Chowdhury, et al.


sequential purchase behaviour using a Long Short Term Memory                                  Table 2: Train and Test data statistics
(LSTM) with attention model.
    Since the offers are given on specific categories, we formulate                         # Users    # Distinct         Start              End
the task of predicting the next category to be purchased by the user.                                    Offers           Date               Date
We hypothesize that this information may help to identify suitable
                                                                                   Train    160057         24       01-March-2013       30-April-2013
offers for a user.
                                                                                   Test     151484         29        01-May-2013        31-July-2013
    Each user has purchased a set of items per visit. The set of items
can be termed as a basket. We consider the category of an item
(category_id) instead of the exact item (item_id) in each basket. We
                                                                                    The learned category probabilities are given as an auxiliary input
give the sequence of baskets purchased by the user as input to the
                                                                                 to the NF model. The rest of the architecture is similar and is shown
LSTM and predict the probabilities of he categories purchased in the
                                                                                 in Figure 5.
next basket. The predicted category probabilities are incorporated
as an additional input to the NF model.
    Let u = {b1 , b2 , . . . , bt } be the basket sequence of the target user.
Each basket is the group of categories: bk = {c 1 , c 2 , . . . , c n }, where
n is the size of the basket. We represent each category with the
embedding learned from the Knowledge Graph discussed earlier.
We use average pooling to represent the basket. This approach is
similar to the basket prediction method proposed by Yu et al.[14].
    The user sequence is now denoted as u = {v 1 , v 2 , . . . , vt }, where
vk is the average of graph-based category embeddings in the basket
bk . We input the user sequence till the current time t into the LSTM
model. Let ht be the LSTM hidden unit and yt be the output at t-th
time step. The hidden state ht of each interaction is updated by the
previous hidden state ht −1 and the current basket embedding vt .                      Figure 5: Incorporating category preferences in NF
                          ht = LSTM(ht −1 , vt )                          (4)
We apply attention on top of the LSTM layer to give weights to                   4 EXPERIMENT
the baskets at different time-steps. Let H = (h 1 , h 2 , . . . , ht ) be the
output vectors that are produced by LSTM layer. They are inputs to               4.1 Train and testset details
the attention layer and the weights for each time-step are learned               The train and test split are considered as given in the dataset i.e., the
i.e. A = (a 1 , a 2 , . . . , at ). The weighted sum of the hidden states        first 2 months of user-offer interaction data are used as train and
(M) is input into a dense layer (D) to find the scores of all categories.        the rest of the 3 months as the test set. The train and test statistics
We find their probabilities using the sigmoid activation function.               are listed in Table 2. To train the Knowledge Graph Embeddings, we
This architecture is illustrated in Figure 4.                                    have used recent 5 months of user purchase history before offers
                                                                                 are given to the users (01-October-2012 to 28-February-2013). This
                          A = softmax(w T ∗ tanh(H ))                     (5)    has been done on the assumption that the recent purchase history
                                                    T
                                            M =A ∗H                       (6)    reflects the current preferences of the user. The same data is used
                               i                                                 for predicting user category preferences as well as feature creation.
                  P (yt +1 | v 1≤i ≤t ) = sigmoid(D[M])                   (7)
                                                                                 4.2    Evaluation Metrics and Experimental Setup
                                                                                 4.2.1 Evaluation Metrics. We use the following metrics to evaluate
                                                                                 our model.
                                                                                      • Recall@1, Recall@3 and Recall@5, where Recall@k is the
                                                                                        proportion of cases when the offer accepted by the user is
                                                                                        among the top-k recommended offers among all test cases.
                                                                                      • Mean Reciprocal Rank@5 (MRR@5): It is the mean of recip-
                                                                                        rocal ranks of correctly recommended offers. If the actual
                                                                                        offer is not in the top-5 recommended offers, then the rank
                                                                                        is set to zero.
                                                                                 4.2.2 Experimental Setup. We use the following parameters to
                                                                                 learn knowledge graph embedding. The size of the embedding for
                                                                                 all entities (user, brand, category, price range) is 100. We use Adam
                                                                                 optimizer, and the learning rate is set to 0.001.
                                                                                     The parameters for predicting user next preferred categories
     Figure 4: LSTM model for next category prediction                           using LSTM are as follows. We use one LSTM layer with 100 hidden
Neural factorization for Offer Recommendation using Knowledge Graph Embeddings                      SIGIR 2019 eCom, July 2019, Paris, France

Table 3: Performance comparison of our approach with base-                  5.1    Offer Recommender Systems
line methods(in %)
                                                                            There are very few works on offers and coupons recommendation
                                                                            systems.
    Method          Recall@1     Recall@3     Recall@5      MRR@5              Xia et al. [13] approached the problem of offer recommendation
    RFC                4.93         11.43        22.49        9.23          in e-commerce domain. They used a private dataset consists of
    XGBoost            6.73         12.56        24.05        14.69         customer’s shopping trips, shopping trip counts, clicked coupons,
                                                                            and retailers that issued the coupon. The coupons are characterized
    NF+features       15.41        22.05         33.19       23.48          based on their textual descriptions and validity period. The authors
    NF+KGE            21.77        26.55         34.90       25.57          curated a number of features from the data to represent users and
    NF+KGE+TF         15.96        34.00         46.68       27.28          coupons and ranked the coupons based on scores generated by
                                                                            XGBoost [5] and Random Forest [4] algorithms.
                                                                               Similar work is proposed by Hu et al. [7] in the telecom domain.
units. There is a dropout layer in between the LSTM layer and the           Hu et al. [7] used Random Forest method [4] to provide telecom
attention layer with 25% dropout. The learning rate is set to 0.001.        offers to mobile users. The authors extracted user features such as
   The parameters for neural factorization models are as follows.           age, gender, voice call duration, SMS count etc. from the customer
The activation function for dense layers is Leaky Relu and the batch        profile repository, and historical usage repository. These features
size is set at 512. The learning rate is set to 0.0001. We applied batch    are given as input to the Random Forest algorithm [4].
normalization at every layer.

4.3    Comparison of Models                                                 5.2    Repeat Purchase Prediction Systems
                                                                       Repeat purchase prediction is the task given for Kaggle Acquire
We compare the NF based models namely NF+features, NF+KGE
                                                                       Valued Shoppers Challenge. Since we have used the same dataset
and NF+KGE+TF against XGBoost and Random Forest Classifier
                                                                       for our task of offer recommendation systems, we present a review
(RFC) methods with features.
                                                                       of the work done on repeat purchase prediction.
   The standard item KNN method can't be applied to this dataset
                                                                          Anand et al. [2] proposed a prediction model based on a com-
because each user in the dataset has availed exactly one offer. There
                                                                       bination of temporal and aggregate level models. They extracted
is no possibility of finding similar offers to the offers that are previ-
                                                                       three types of features capturing different aspects of user behaviour
ously availed by the user and recommend them.
                                                                       namely customer-based, product-based, customer-product interac-
                                                                       tions based. customer-based features include total visits made, total
4.4 Result and Analysis
                                                                       spend, the loyalty of the customer etc. product-based features in-
The results for the two baseline methods and the three NF based        clude the fraction of repeat customers for the offer-product etc.
models discussed in this paper are presented in Table 1. It is evident customer-product interactions based features include the number of
from the results that the NF based models outperform the XGboost       visits, quantity bought, the amount spent etc.
and RFC methods with features.                                            To capture the aggregate level behaviour of the user, the above
   Neural factorization model with graph-based embedding (NF+KGE)      features are computed over the entire transaction history. To cap-
performs better than Neural factorization with features (NF+features). ture the temporal behaviour, the features are split and computed
   The final variation of our model with temporal features (NF+KGE+TF) over non-overlapping time windows. The authors used Long Short
gives the significant improvement over all other models considered     Term Memory (LSTM) model as the classifier for temporal features
in terms of Recall@3 and Recall@5 and MRR@5. Neural factoriza-         and quantile regression (QR) model as the classifier for aggregate
tion model with graph-based embedding (NF+KGE) performs best           level features. The two models are combined using a mixture of
in terms of Recall@1.                                                  experts (ME).
   We hypothesize that the knowledge graph based embedding                Nikulin et al. [10] used Random forest [4] and Gradient boosted
is effective as it is able to use the connections between different    trees [5] to predict the repeat purchase behaviour of the users. The
entities and can therefore effectively capture indirect as well as     authors curated a number of statistical features from the data and
latent relationships.                                                  applied the above methods.
   However the limitation of the above model is that it fails to
capture the temporal interactions of the user. Our third model
(NF+KGE+TF) addresses this by using an additional input from the       5.3 Knowledge Graph Embedding based
LSTM based on the user’s sequence of baskets.                                  Recommender Systems
                                                                            In recent times, knowledge graph embeddings have been shown
5     RELATED WORK                                                          to be effective for recommendation systems. The basic idea is to
Related work to our model is categorized into three parts. Sub-             represent the available data in the form of a graph, learn embeddings
section 5.1 reviews the work reported on offer recommendation               for entities using Knowledge graph embedding methods [12] and
systems. Subsection 5.2 reviews the methods used in a related task          incorporate them into recommendation.
of repeat purchase prediction methods. In subsection 5.2, we discuss           [15] presents Collaborative Knowledge base Embedding (CKE)
about Knowledge Graph Embedding (KGE) based recommendation                  which uses TransR [9] to learn the structural representations of
systems.                                                                    items which are combined with visual and textual embeddings.
SIGIR 2019 eCom, July 2019, Paris, France                                                                                                            Chowdhury, et al.


Deep Knowledge-aware Network (DKN) [11] learns entity embed-                                   ACM SIGIR conference on Research and Development in Information Retrieval,
dings using TransD [8] and and designs a CNN framework by                                      pages 729–732. ACM, 2016.
                                                                                          [15] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. Col-
combining them with word embeddings for news recommendation.                                   laborative knowledge base embedding for recommender systems. In Proceedings
Ai et al. [1] learn embedding of users and items by the method of                              of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
                                                                                               Data Mining, pages 353–362, 2016.
TransE [3] and the recommendation is based on user-item similarity
score in the projected space.

6    CONCLUSION
In this paper, we have presented a neural factorization method for
the task of offer recommendation with different representations of
users and offers. We have shown that our models perform better
than the tree-based methods. We have also shown that the learned
graph-based user and offer embeddings capture deeper and indirect
connections between users and offers, which helps to improve the
quality of recommendation. The incorporation of temporal features
involving transaction sequences improves the performance further
in some cases.

ACKNOWLEDGMENTS
This research was supported by Capillary Technologies Interna-
tional Pte Limited. We thank Dr. Subrat Panda and Mr. Jyotiska
Bhattacharjee from Capillary Technologies International Pte Lim-
ited who provided insight and expertise that greatly assisted the
research.

REFERENCES
 [1] Qingyao Ai, Vahid Azizi, Xu Chen, and Yongfeng Zhang. Learning heteroge-
     neous knowledge base embeddings for explainable recommendation. Algorithms,
     11(9):137, 2018.
 [2] Gaurangi Anand, Auon H Kazmi, Pankaj Malhotra, Lovekesh Vig, Puneet Agar-
     wal, and Gautam Shroff. Deep temporal features to predict repeat buyers. In
     NIPS 2015 Workshop: Machine Learning for eCommerce, 2015.
 [3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Ok-
     sana Yakhnenko. Translating embeddings for modeling multi-relational data. In
     Advances in neural information processing systems, pages 2787–2795, 2013.
 [4] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
 [5] Jerome H Friedman. Greedy function approximation: a gradient boosting machine.
     Annals of statistics, pages 1189–1232, 2001.
 [6] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng
     Chua. Neural collaborative filtering. In Proceedings of the 26th International
     Conference on World Wide Web, pages 173–182. International World Wide Web
     Conferences Steering Committee, 2017.
 [7] Wan-Hsun Hu, Shih-Hsien Tang, Yin-Che Chen, Chia-Hsuan Yu, and Wen-Cheng
     Hsu. Promotion recommendation method and system based on random forest. In
     Proceedings of the 5th Multidisciplinary International Social Networks Conference,
     page 11. ACM, 2018.
 [8] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. Knowledge graph
     embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual
     Meeting of the Association for Computational Linguistics and the 7th International
     Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages
     687–696, 2015.
 [9] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity
     and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI
     conference on artificial intelligence, 2015.
[10] Vladimir Nikulin. Prediction of the shoppers loyalty with aggregated data streams.
     Journal of Artificial Intelligence and Soft Computing Research, 6(2):69–79, 2016.
[11] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. Dkn: Deep knowledge-
     aware network for news recommendation. arXiv preprint arXiv:1801.08284, 2018.
[12] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. Knowledge graph embedding:
     A survey of approaches and applications. IEEE Transactions on Knowledge and
     Data Engineering, 29(12):2724–2743, 2017.
[13] Yandi Xia, Giuseppe Di Fabbrizio, Shikhar Vaibhav, and Ankur Datta. A content-
     based recommender system for e-commerce offers and coupons. In eCOM@SIGIR,
     2017.
[14] Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. A dynamic recurrent
     model for next basket recommendation. In Proceedings of the 39th International