A Content-based Recommender System for E-commerce Offers and Coupons

Yandi Xia, Rakuten Institute of Technology, Boston, Massachusetts - USA 02110 (yandi.xia@rakuten.com)
Giuseppe Di Fabbrizio, Rakuten Institute of Technology, Boston, Massachusetts - USA 02110 (difabbrizio@gmail.com)
Shikhar Vaibhav, Ebates, San Francisco, California - USA 94100 (svaibhav@ebates.com)
Ankur Datta, Rakuten Institute of Technology, Boston, Massachusetts - USA 02110 (ankur.datta@rakuten.com)
ABSTRACT
Reward-based e-commerce companies expose thousands of online offers and coupons every day. Customers who sign up for online coupon services either receive a daily digest email with selected offers or select specific offers on the company website front page. High-quality online discounts are selected and delivered through these two channels by a manual process that involves a team of experts responsible for evaluating recency, product popularity, retailer trends, and other business-related criteria. Such a process is costly, time-consuming, and not customized to users' preferences or shopping history. In this work, we propose a content-based recommender system that streamlines the coupon selection process and personalizes the recommendations to improve the click-through rate and, ultimately, the conversion rate. When compared to the popularity-based baseline, our content-based recommender system improves the F-measure from 0.21 to 0.85 and increases the estimated click-through rate from 1.20% to 7.80%. The experimental system is currently scheduled for A/B testing with real customers.

CCS CONCEPTS
• Computing methodologies → Classification and regression trees; • Applied computing → Online shopping;

KEYWORDS
Recommender Systems; Personalization; E-commerce offers and coupons

ACM Reference format:
Yandi Xia, Giuseppe Di Fabbrizio, Shikhar Vaibhav, and Ankur Datta. 2017. A Content-based Recommender System for E-commerce Offers and Coupons. In Proceedings of SIGIR eCom 2017, Tokyo, Japan, August 2017, 7 pages.

Copyright © 2017 by the paper's authors. Copying permitted for private and academic purposes.
In: J. Degenhardt, S. Kallumadi, M. de Rijke, L. Si, A. Trotman, Y. Xu (eds.): Proceedings of the SIGIR 2017 eCom workshop, August 2017, Tokyo, Japan, published at http://ceur-ws.org

1 INTRODUCTION
Reward-based e-commerce services are a fast-growing online sector that provides cashback to subscribers by leveraging discounted prices from a broad network of affiliated companies.

Typically, a consumer uses the company website to search for specific products or retailers, or clicks on one of the prominently published offers. Shoppers are then redirected to the websites of the respective retailers. A visit to the merchant's website generated by this redirection is defined as a shopping trip.

At the same time, subscribers receive a daily digest email with the most popular discounts. Figure 1 shows a fragment of a daily digest email featuring a list of current offers and coupons. Both website and email contents are manually curated by experts focusing on delivering the highest quality offers to customers.

[Figure 1: Example of email campaign message including offers and coupons.]

This manual process is laborious and does not scale well to millions of online customers, who receive the same offer recommendations regardless of their previous browsing or purchase history. A recommender system would be able to capture the customers' preferences by optimizing the coupon selection based on their previous history and the similarity across users and/or items.
However, compared to traditional recommender systems in other domains, such as movies [6, 10] and music [16], online offers differ in a number of ways: 1) coupons are highly volatile items, only valid for a limited time span and removed after their expiration date; 2) customers using coupons are affected by high turnover; 3) the user population is skewed between a minority of heavy users and a majority of occasional or light users; and finally, 4) online offers are rarely rated, as usually happens in the movie and music domains.

Consequently, the online discount domain is more prone to the cold-start problem [4], where there is insufficient click history for a new item or, conversely, not enough history for a new subscriber. Matrix factorization methods are less robust to the cold-start problem [4] and lack the capability to capture the number of features necessary to model preferences over highly volatile items and large customer populations.

For these reasons, we adopted a content-based filtering (CBF) approach [17] where we exploit coupon and customer attributes to build a stochastic classification model that predicts the posterior probability of a coupon being clicked by the customer. We rely on users' implicit feedback, which indirectly expresses users' preferences through behavior such as clicking on specific offers in a ranked list. This is typically a noisier signal compared to explicit feedback such as ratings and reviews, but it is also abundant and with consistent bias across users' click-through history [15, 23].

However, click information is more challenging to use since it neither directly reflects the users' satisfaction nor provides clues about items that are irrelevant (negative feedback) [18], which is fundamental to building an effective discriminative model. In this paper, we illustrate how to utilize noisy implicit feedback to build a CBF model for offer and coupon recommendations. The main contributions are the following:

(1) We describe a method to generate negative samples from the implicit feedback data;
(2) We perform extensive experiments to compare different learning models;
(3) We demonstrate that our approach is effective with respect to the baseline and the upper-bound baseline.

2 MODELING APPROACHES
We formalize the coupon recommendation problem as content-based filtering [17] with implicit feedback. Let $M$ and $N$ indicate the number of users and coupons, respectively. For each shopping trip interaction included in the matrix $Y \in \mathbb{R}^{M \times N}$, a user $u_i$ is exposed to a number of coupons $L$ and selects a specific coupon $c_j$, generating the implicit feedback $y_{u_i,c_j} = 1$, and $y_{u_i,c_k} = 0$ for all the other items in the list of length $L$, where $k \in \{0, 1, \ldots, L-1\}$ and $k \neq j$. Formally, predicting a user's shopping trip for an online offer requires estimating the function $\hat{y}_{u_i,c_j} = f(u_i, c_j \mid \Theta)$, where $\hat{y}_{u_i,c_j}$ is the predicted score for the online interaction $(u_i, c_j)$; $\Theta$ represents the model parameters to learn, and $f$ is the mapping function between the parameters and the predicted scores. The learning can be done by applying a machine learning classification algorithm that optimizes an objective function such as the logistic or cross-entropy loss [12].

However, taking a closer look at the implicit feedback used for modeling, $y_{u_i,c_k} = 0$ does not necessarily mean that the user $u_i$ dislikes the offer $c_k$, but merely indicates the choice preference in this round of interactions. The user may go back to the offer list and select another item different from $c_j$. This makes the prediction modeling challenging, since the user's implicit feedback is noisy and there is a lack of negative feedback. In our case, we only considered the positive feedback and modeled the shopping trip predictions as a one-class classification (OCC) problem [21], where we only use the positive samples and artificially derive the negative samples from the user's shopping history.

To handle the classification task, we experimented with two non-parametric learning methods that are resilient to noisy features, robust to outliers, and capable of handling missing features (but not necessarily robust to noisy labels): 1) random forests [5][12, Chapter 15]; and 2) gradient boosted trees [9].

2.1 Generating negative samples
When relying on the user's implicit feedback, an accepted convention is that any higher-ranked item that was not clicked on is likely less relevant than the clicked ones [20]. In an industrial setting, it is often the case that neither the coupon rank nor the coupon list exposed to users is available. In this case, it is necessary to derive "negative shopping trips" (i.e., not relevant coupons) by some other means. For the negative training and testing sets, we followed the procedure described below. For each shopping trip, we extracted the triple user, coupon, and click time; then:

• We retrieved the user's shopping trip history and selected all the coupons clicked by them up to the current click time. We call this data set A;
• Given the considered shopping trip click time, we retrieved all the non-expired coupons at that time. We call this data set S;
• We removed all the coupons clicked by the user from the non-expired coupon set and obtained the data set U = S − A;
• We uniformly sampled 2, 4, 8, and 9 coupons from U and built from each sample set a list of negative shopping trips that simulates the missing negative samples in our data set.

Note that we sampled negative sets of different sizes for evaluation purposes, but we only report the results for negative sets of size 9, since this is closest to the real system conditions, where the user is exposed to a list of 10 offers and selects one of them.
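The procedure can be condensed into the following Python sketch. This is a minimal illustration rather than the production pipeline: the record layout and the `trips_by_user` and `coupons` containers are hypothetical stand-ins for the shopping trip logs.

```python
import random

def negative_trips(trip, trips_by_user, coupons, n_negatives=9, seed=42):
    """Derive negative shopping trips for one positive (user, coupon, click time).

    trip:          dict with 'user', 'coupon', and 'click_time' keys
    trips_by_user: hypothetical map user -> list of (coupon, click_time)
    coupons:       hypothetical map coupon -> (valid_from, valid_to)
    """
    user, t = trip["user"], trip["click_time"]

    # A: all coupons clicked by this user up to the current click time.
    clicked = {c for c, ct in trips_by_user.get(user, []) if ct <= t}

    # S: all coupons that are not expired at the click time.
    valid = {c for c, (start, end) in coupons.items() if start <= t <= end}

    # U = S - A: valid coupons that the user did not click.
    candidates = sorted(valid - clicked)

    # Uniformly sample n_negatives coupons from U; with n_negatives = 9,
    # each positive trip yields a 10-item list as in the deployed setting.
    rng = random.Random(seed)
    chosen = rng.sample(candidates, min(n_negatives, len(candidates)))
    return [{"user": user, "coupon": c, "click_time": t, "label": 0} for c in chosen]
```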
2.2 Random forests
Random forest (RF) models [5][12, Chapter 15] use bagging techniques to overcome the decision trees' low-bias, high-variance problem. An RF model averages the results of several noisy and relatively unbiased decision trees trained on different parts of the same training set. The general method to create an RF model follows the steps below [12, Chapter 15].

For $b = 1, \ldots, B$, where $B$ is the total number of bagged trees:

(1) Draw $N$ samples from the training set and build a tree $T_b$ by recursively applying the steps below to each terminal node until reaching the minimum node size $n_{min}$:
    (a) Randomly select $m$ features from the total $p$ features, where $m \leq p$;
    (b) Pick the best node split among the $m$ features;
    (c) Split the selected node into two children nodes;
(2) Save the obtained ensemble of trees $\{T_b\}_1^B$.

To determine the quality of the node split, either the Gini impurity or the cross-entropy measure can be used on the candidate split [12, Chapter 9]. For a classification task, the prediction is the majority vote among the results from the tree ensemble. In other terms, if $\hat{C}_b(x)$ is the class prediction of the $b$-th random-forest tree, then the predicted class is $\hat{C}_{rf}^B(x) = \text{majority vote}\,\{\hat{C}_b(x)\}_1^B$.
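As a concrete illustration, the sketch below trains a random forest on placeholder data with the scikit-learn [19] estimator used later in Section 4; the hyper-parameter values shown here are illustrative, not the tuned ones.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: X holds user/coupon feature vectors (see Section 4);
# y is 1 for clicked trips and 0 for sampled negative trips.
X = np.random.rand(1000, 50)
y = np.random.randint(0, 2, size=1000)

rfc = RandomForestClassifier(
    n_estimators=500,      # B, the number of bagged trees (illustrative)
    max_features="sqrt",   # m, the features examined at each split (m <= p)
    min_samples_leaf=5,    # n_min, the minimum node size (illustrative)
    criterion="gini",      # node-split quality measure
    n_jobs=-1,
)
rfc.fit(X, y)

# The ensemble prediction is the majority vote over the B trees;
# predict_proba averages per-tree votes and serves as an affinity score.
affinity = rfc.predict_proba(X)[:, 1]
```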
2.3 Gradient boosted trees
Gradient boosted trees (GBTs) [9] optimize a loss functional $\mathcal{L} = E_y[L(y, F(x)) \mid X]$, where $F(x)$ can be a function that is mathematically difficult to characterize, such as a decision tree $f(x)$ over $X$. The optimal value of the function is expressed as $F^{*}(x) = \sum_{m=0}^{M} f_m(x, a, w)$, where $f_0(x, a, w)$ is the initial guess and $\{f_m(x, a, w)\}_{m=1}^{M}$ are additive boosts on $x$ defined by the optimization method. The parameter $a_m$ of $f_m(x, a, w)$ denotes the split points of the predictor variables, and $w_m$ denotes the boosting weights on the leaf nodes of the decision trees corresponding to the partitioned training set $X_j$ for region $j$. To compute $F^{*}(x)$, we need to calculate, for each boosting round $m$,

$$\{a_m, w_m\} = \arg\min_{a,w} \sum_{i=1}^{N} L(y_i, F_m(x_i)) \quad (1)$$

with $F_m(x) = F_{m-1}(x) + f_m(x, a_m, w_m)$. This expression is indicative of a gradient descent step:

$$F_m(x) = F_{m-1}(x) + \rho_m (-g_m(x_i)) \quad (2)$$

where $\rho_m$ is the step length and $g_m(x_i) = \left[\frac{\partial L(y, F(x))}{\partial F(x)}\right]_{F(x_i)=F_{m-1}(x_i)}$ is the search direction. To solve for $a_m$ and $w_m$, we make the basis functions $f_m(x_i; a, w)$ correlate most with $-g_m(x_i)$, where the gradients are defined over the training data distribution. In particular, using a Taylor series expansion, we can obtain closed-form solutions for $a_m$ and $w_m$ (see [7] for details). It can be shown that

$$a_m = \arg\min_{a} \sum_{i=1}^{N} (-g_m(x_i) - \rho_m f_m(x_i, a, w_m))^2 \quad (3)$$

and

$$\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L(y_i, F_{m-1}(x_i) + \rho f_m(x_i; a_m, w_m)) \quad (4)$$

which yields

$$F_m(x) = F_{m-1}(x) + \rho_m f_m(x, a_m, w_m) \quad (5)$$

Each boosting round updates the weights $w_{m,j}$ on the leaves and helps create a new tree in the next iteration. The optimal selection of decision tree parameters is based on optimizing $f_m(x, a, w)$ using a logistic loss. For GBTs, each decision tree is resistant to imbalance and outliers [12], and $F(x)$ can approximate arbitrarily complex decision boundaries.
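The following sketch shows how such a GBT model can be fitted with the XGBoost package [7]; the feature matrix is a placeholder, while the maximum depth of 15 and the 1,000 estimators match the grid-search values reported in Section 4.

```python
import numpy as np
from xgboost import XGBClassifier

# Placeholder feature matrix and 0/1 click labels.
X = np.random.rand(1000, 50)
y = np.random.randint(0, 2, size=1000)

# Each boosting round m fits a new tree f_m to the negative gradient of
# the logistic loss, following Equations (1)-(5).
gbt = XGBClassifier(
    n_estimators=1000,            # boosting rounds M (Section 4 grid search)
    max_depth=15,                 # maximum tree depth (Section 4 grid search)
    learning_rate=0.1,            # shrinkage on each step rho_m (illustrative)
    objective="binary:logistic",  # logistic loss
)
gbt.fit(X, y)
affinity = gbt.predict_proba(X)[:, 1]  # scores later used for ranking
```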
2.4 Popularity-based baseline
To evaluate the contributions of our coupon recommender system, it is necessary to establish a baseline measure that correlates with the current recommender system performance. However, the current coupon recommendation process cannot be evaluated offline, since recommendations are made by manually selecting popular retailers and offers. The resulting list of coupons is then exposed to all the users by either daily email or website front-page visits. All users see the same list of coupons regardless of their shopping trip history or preferences. A direct comparison with such a system would require exposing both recommendation techniques through A/B testing with real customers.

[Figure 2: Popularity-based baseline (PBB) coupon selection obtained by considering all the valid coupons prior to the shopping trip date.]

As an alternative, we propose an offline baseline evaluation based on popular coupons, where popularity is derived from shopping trip click-through rates. In addition to the popularity criterion, there could be other business reasons for offer selection in the current setting, but for baseline and simulation purposes we limited the selection to the popularity criterion only.

The assumption is that user-defined popularity correlates with the current recommender system and is likely to represent an upper-bound estimate of its performance. At a high level, we define two baselines that select the most popular coupons based either only on past click data (PBB) or only on future click data (God-View PBB), the latter representing a strong upper bound.

As illustrated in Figure 2, we first divide users' shopping trips into training and testing partitions with a 90% and 10% ratio, respectively. For each shopping trip in the test set, we performed the following steps (a minimal sketch follows the list):

• We extracted the shopping trip click date;
• From the shopping trip train set, we extracted all the shopping trips prior to the click date with a valid coupon (i.e., a coupon unexpired at the click date);
• We sorted the obtained list of coupons by click-through rate and selected the top N popular coupons;
• If the coupon in the test shopping trip is among the top N most popular coupons (with N = 10), then there is a match and the prediction is correct; otherwise the prediction is wrong.
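The sketch below illustrates one way to implement this evaluation loop; the (user, coupon, click date) record layout and the helper names are assumptions, not the production code.

```python
from collections import Counter

def pbb_accuracy(test_trips, train_trips, coupons, top_n=10):
    """Popularity-based baseline: a test trip is a hit when its coupon is
    among the top-N most clicked valid coupons prior to the click date.

    test_trips, train_trips: lists of (user, coupon, click_date) records
    coupons: hypothetical map coupon -> (valid_from, valid_to)
    """
    hits = 0
    for _user, coupon, date in test_trips:
        # Click counts restricted to earlier trips with unexpired coupons.
        counts = Counter(
            c for _u, c, d in train_trips
            if d < date and coupons[c][0] <= date <= coupons[c][1]
        )
        top = [c for c, _n in counts.most_common(top_n)]
        hits += coupon in top
    return hits / len(test_trips)
```

The God-View PBB variant would instead count clicks from the trips after the click date under evaluation.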
The general idea is that the customized popularity list simulates the expert selection by identifying the best offers for each of the days considered in the test set. We also define a stronger baseline that has access to future click-through information after the shopping trip under evaluation. By predicting the future, we can build a baseline that represents the upper bound of the popularity-based baselines when recommending the same list of offers to the whole user population. We call this baseline God-View PBB (GVPBB).

2.5 Run-time
Figure 3 shows the simplified architecture of the run-time system. A daily list of thousands of candidate coupons is submitted for selection to the recommender system. For each member subscribing to the reward-based e-commerce service, each member/offer pair is evaluated by the recommender system and scored for affinity. The resulting score list is used to rank the predictions, which are then truncated to the top 10 candidates and used either for personalized email campaigns or for personalized front-page offers.

[Figure 3: Run-time coupons selection and ranking.]
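A minimal sketch of this scoring and ranking step is shown below, assuming a scikit-learn-style classifier and a hypothetical `build_features` helper; neither name comes from the production system.

```python
import numpy as np

def recommend(member, coupons, model, build_features, top_k=10):
    """Score every member/coupon pair and keep the top-k offers.

    build_features is a hypothetical helper returning the feature vector
    for one (member, coupon) pair; model is a trained classifier with a
    scikit-learn-style predict_proba method.
    """
    X = np.vstack([build_features(member, c) for c in coupons])
    affinity = model.predict_proba(X)[:, 1]      # affinity scores
    order = np.argsort(affinity)[::-1][:top_k]   # rank and truncate to top-k
    return [(coupons[i], float(affinity[i])) for i in order]
```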
3 DATA
We used data collected from customers directly clicking on the published offers, e-shopping at the merchants' websites and consequently adding offers, or auto-applying offers through browser button extensions.

The data includes historical information about customers' click-throughs (i.e., shopping trips) to visit affiliated retailer websites. The data was sampled across a full year of traffic and anonymized for privacy requirements. Table 1 summarizes the distributions of the various data sets, including active members generating shopping trips, shopping trip counts, clicked coupons, and retailers that issued the offers.

Table 1: Shopping trips data distribution across different data sets.

Data set         Members    Shopping trips    Coupons    Retailers
Train            448,055         1,037,030     15,276        1,528
Test              49,789           114,300      7,594        1,192
Train Negative   358,444        11,666,592      9,705        1,453
Test Negative     39,831         1,285,875      8,190        1,413

The details about the negative data sets in Table 1 are explained in Section 2.1.

Coupons are characterized by a textual description (e.g., "25% off Tees Shorts Swim for Women Men Boys and Girls Extra 30% off Sale Styles and Extra 50% off Final Sale Styles") and a date range defining their validity (e.g., from 12/18/2016 to 12/27/2016). Optionally, coupons are classified into three categories: 1) percentage discount; 2) money discount; and/or 3) free shipping offer for specific brands or products. Additionally, they can directly refer to a specific product or retailer. Retailers' information includes a business name (e.g., Stila Cosmetics) and an optional category (e.g., Health & Beauty).

The number of shopping trips per customer follows the typical power-law probability distribution [1], where the heavy-tailed data still holds a substantial amount of probability information. Figure 4 shows the probability density function (PDF) $p(X)$ in blue, the complementary cumulative density function (CCDF) $p(X \geq x)$ in red, and the corresponding power-law distribution fit with parameters $x_{min} = 1$ and $\alpha = 3.27$.

[Figure 4: Probability density function (p(X), blue) and complementary cumulative density function (p(X ≥ x), red) with power-law distribution fit for x_min = 1 and α = 3.27.]

Table 2: Data counts and stats for head, torso, and tail sets of the shopping trips distribution.

Data set    Members    Shopping trips        %    Min      Max      Avg
Head         25,149           317,623    28.7%      8    1,569    15.79
Torso       132,337           441,533    39.8%      3        7     4.17
Tail        333,686           349,262    31.5%      1        2     1.31

Based on the shopping trip frequency (see Table 2), we split the data into three segments: 1) the Head, comprising the most active members with 8 or more shopping trips; 2) the Torso, including members with between 3 and 7 shopping trips; and 3) the Tail, with the less active members with 1 or 2 clicks.
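Both the fit in Figure 4 and the segmentation above can be sketched in a few lines with the powerlaw package [1]; the trip counts below are placeholder data, while the x_min value and the segment thresholds follow Table 2.

```python
import powerlaw

# One entry per member: their total number of shopping trips (placeholder).
trip_counts = [1, 1, 1, 2, 2, 3, 4, 7, 8, 12, 150]

# Fit the heavy-tailed distribution; the paper reports xmin = 1, alpha = 3.27.
fit = powerlaw.Fit(trip_counts, xmin=1)
print(fit.power_law.alpha)

def segment(n_trips):
    """Assign a member to a segment using the Table 2 thresholds."""
    if n_trips >= 8:
        return "head"
    return "torso" if n_trips >= 3 else "tail"
```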

This data partition will be used in Section 4.2 to evaluate the user-based cold-start problem.

4 EXPERIMENTS
To build and evaluate the random forest, XGBoost, and baseline models, we partitioned the data by members, using 90% of the population for training and 10% for testing, and derived the associated shopping trip sets. Partitioning by members rather than by shopping trips increases the chances of capturing the model's generalization capabilities, since the shopping trip history of unseen test users will not bias the training data. For the negative samples, as explained in Section 2.1, we selected 9 negative shopping trips for each positive sample.

To capture the user/coupon affinity, we experimented with a number of features involving specific user and coupon attributes, as well as shopping history features. The following list summarizes them:

• User-based features
    – Devices used
    – OS systems
    – Email domain
• Coupon-based features
    – Text description
    – Originating retailer
    – Retailer's category
    – Discount info (free shipping, dollar off, percent off)
    – Cashback info
• Shopping trip history-based features
    – Times the user has visited this coupon before
    – Recently visited retailers
    – Recent discount information
    – Recent cash back information
    – Recent coupon keywords (from descriptions)
    – Recent retailer's categories
    – Recent luxury purchases
In the data pre-processing stage, we encoded text and members' shopping behavior histories as bags of word frequencies, and categorical features as one-hot vector representations. All the other features are either binary or numeric values. Since both RF and XGBoost models are insensitive to monotonic feature transformations, we avoided numeric standardization and normalization. We carefully handled missing information, replacing missing text with the label unk and missing categorical and numerical features with 0s. After pre-processing, the resulting number of features is 45,329. For the XGBoost model, we ran a partial grid-search optimization to set the most important hyper-parameters, the maximum tree depth and the number of estimators, to 15 and 1,000, respectively. We used the scikit-learn [19] implementations of the random forest model and XGBoost.
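A minimal sketch of this pre-processing, assuming hypothetical column names and using standard scikit-learn [19] transformers:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Placeholder rows with hypothetical column names.
df = pd.DataFrame({
    "description": ["25% off tees", None],
    "retailer_category": ["Health & Beauty", None],
    "cashback": [2.0, None],
})

# Missing text -> 'unk'; missing categorical and numeric features -> 0.
df["description"] = df["description"].fillna("unk")
df["retailer_category"] = df["retailer_category"].fillna("0")
df["cashback"] = df["cashback"].fillna(0)

encoder = ColumnTransformer([
    # Bag of word frequencies for the coupon text.
    ("text", CountVectorizer(), "description"),
    # One-hot vectors for categorical features.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["retailer_category"]),
], remainder="passthrough")  # binary / numeric features pass through unscaled

X = encoder.fit_transform(df)
```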

4.1 Results
Table 3 shows the experimental results for the machine learning models and the baselines. We evaluated performance by truncating the ranked prediction list at 10 items. The RFC model could predict click-throughs for 96.80% of the tested coupons (accuracy), while the XGBoost model followed with 95.85% correct click predictions, but with a substantially higher recall of 0.83. This pushed the XGBoost predicted click-through rate to 8.30%. Both the PBB and GVPBB baselines suffer from low recall, since the popularity lists, once the shopping trip day is selected, are the same for all users.

Table 3: Precision (P), recall (R), F-measure (F) and click-through rate (CTR) for random forest classification (RFC), XGBoost, popularity-based baseline (PBB), and God-View PBB (GVPBB) with 10 maximum coupon recommendations. Baseline methods are marked with (*).

Model      P       R       F       NDCG@10    CTR
PBB*       0.91    0.12    0.21    0.776      1.20%
GVPBB*     0.83    0.34    0.48    0.892      3.40%
XGBoost    0.77    0.83    0.80    0.857      8.30%
RFC        0.93    0.78    0.85    0.969      7.80%

Overall, both RFC and XGBoost outperform the baseline metrics. In addition to precision, recall, and F-measure, we also report the Normalized Discounted Cumulative Gain (NDCG) metric [14] for a list of 10 offers generated by combining 1 positive sample with 9 randomly selected negative samples. NDCG is a ranking quality measure that takes into consideration the position of the clicked item. Generally, positive samples should be ranked higher than the negative ones.
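Since each evaluation list contains exactly one relevant item, NDCG@10 reduces to 1/log2(rank + 1), where rank is the position assigned to the clicked coupon; a small sketch of this computation, with placeholder ranks, follows.

```python
import numpy as np

def ndcg_at_10(rank_of_positive):
    """NDCG@10 for a 10-item list holding a single relevant (clicked) item.

    rank_of_positive is the 1-based rank the model assigns to the clicked
    coupon; the ideal ranking places it first, so IDCG = 1 and
    NDCG = 1 / log2(rank + 1).
    """
    if rank_of_positive > 10:
        return 0.0
    return 1.0 / np.log2(rank_of_positive + 1)

# Average over all test lists of 1 positive + 9 sampled negatives.
ranks = [1, 2, 1, 5]  # placeholder ranks of the clicked coupons
print(np.mean([ndcg_at_10(r) for r in ranks]))
```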
As further validation, we evaluated the quality of the confidence scores estimated by the RFC model. Well-calibrated predicted probabilities are fundamental when the classification model scores are used for ranking [18].

[Figure 5: Probability calibration curve for the RFC model. X-axis: mean predicted value; Y-axis: fraction of positive samples.]

The confidence calibration plot in Figure 5 shows that the RFC model has a minor bias around predicted values of 0.2 and 0.6, but overall the curve follows the ideal behavior by closely mimicking the diagonal.
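A curve like the one in Figure 5 can be produced with scikit-learn's [19] calibration_curve; in this sketch the labels and scores are random placeholders rather than the paper's test data.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Placeholder labels and model scores standing in for the RFC test output.
y_true = np.random.randint(0, 2, size=1000)
y_score = np.random.rand(1000)

# Fraction of positives (y-axis) versus mean predicted value (x-axis).
frac_pos, mean_pred = calibration_curve(y_true, y_score, n_bins=10)

# A well-calibrated model tracks the diagonal frac_pos == mean_pred.
for mp, fp in zip(mean_pred, frac_pos):
    print(f"predicted {mp:.2f} -> observed {fp:.2f}")
```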
4.2 User-based cold-start evaluation
When new members join a reward-based system or new offers are issued by retailers, recommender systems are affected by the sparseness of the new data, which compromises prediction performance [19]. To mitigate this problem, a different selection strategy is typically used for the tail of the distribution (i.e., hybrid methods) [19]. In our case, however, we rely on the user-based and coupon-based features to capture information that may generalize well with minimal shopping trip history.

We evaluated our RFC model on the three segments of the shopping trip distribution (see also Section 3). Table 4 shows the results of this test.

Table 4: Precision (P), recall (R), F-measure (F), click-through rate (CTR), and NDCG@10 for the RFC model when testing users in the head, torso, and tail sections of the shopping trip distribution.

Data set    P       R       F       NDCG@10    CTR
Head        0.95    0.77    0.85    0.980      7.70%
Torso       0.94    0.79    0.86    0.985      7.90%
Tail        0.90    0.78    0.84    0.976      7.80%

On average, the RFC model is quite robust across the three distribution segments. When analyzing the main feature predictors, the number of times the user clicked on the coupon before, and whether the member was referred, are the major contributors to the predictions, but further investigation is necessary to better understand the feature interactions.

5 RELATED WORK
A large body of work on recommender systems focuses on explicit feedback, where users express their preference with ratings [10, 22]. More recently, there is an increasing number of contributions that rely on the more abundant and noisier implicit feedback data [3, 11, 13]. However, most of this work frames the problem as a collaborative filtering (CF) model. CF is notoriously sensitive to the cold-start problem and not flexible enough to handle extremely volatile items such as online offers and coupons. Additionally, it is hard to incorporate features that capture users' and items' properties.

The hybrid recommender system described in [11] is the closest to our work. To address the cold-start problem, they use implicit feedback to recommend online coupons by integrating both user information and item features. In their work, the traditional collaborative filtering matrix, including users and items, is enriched with additional user and item features and factorized with SVD. Nevertheless, the hybrid system requires retraining every time there is a new coupon or user. In our case, there are thousands of new coupons every day and retraining that often is impractical.

Our work is in line with content-based filtering (CBF) approaches such as [2, 8]. CBF methods are able to abstract items and represent user/item interactions by encoding shopping trip history. There is no need to re-train when adding new offers, although including new retailers requires updating the model to capture the new user/item affinity relations expressed in the new shopping trips. In contrast to SVD and CF-based models, the content-based approach can leverage different classification techniques and can be combined with CF.

6 CONCLUSIONS
In this work, we developed a coupon recommender system based on content-based filtering. We experimented with two popularity-based baselines and compared them to both random forest and gradient boosted tree classification models. To address the one-class classification problem, our framework automatically extracts negative samples from customers' shopping trips. The negative samples simulate the real selection scenario, where one coupon out of the ten displayed is actually selected by the member. In this experimental setup, both the random forest and XGBoost classifiers perform substantially better than the two baselines except for the NDCG@10 metric.

In the future, to improve ranking results, we will extend the classification models with pair-based and list-based cost functions. We will also validate our models under A/B testing evaluations with real customers.

ACKNOWLEDGMENTS
The authors would like to thank Dr. David Inouye for providing insightful suggestions and comments.

The authors would also like to thank the anonymous reviewers for their helpful advice and Yiu-Chang Lin for presenting our work at the workshop.

REFERENCES
 [1] Jeff Alstott, Ed Bullmore, and Dietmar Plenz. 2014. powerlaw: A Python Package
     for Analysis of Heavy-Tailed Distributions. PLOS ONE 9, 1 (01 2014), 1–11.
 [2] Ana Belén Barragáns-Martínez, Enrique Costa-Montenegro, Juan C. Burguillo,
     Marta Rey-López, Fernando A. Mikic-Fonte, and Ana Peleteiro. 2010. A Hybrid
     Content-based and Item-based Collaborative Filtering Approach to Recommend
     TV Programs Enhanced with Singular Value Decomposition. Inf. Sci. 180, 22
     (Nov. 2010), 4290–4311.
 [3] Immanuel Bayer, Xiangnan He, Bhargav Kanagal, and Steffen Rendle. 2017. A
     Generic Coordinate Descent Framework for Learning from Implicit Feedback. In
     Proceedings of the 26th International Conference on World Wide Web (WWW '17).
     International World Wide Web Conferences Steering Committee, Republic and
     Canton of Geneva, Switzerland, 1341–1350.
 [4] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. 2013. Recommender
     Systems Survey. Know.-Based Syst. 46 (July 2013), 109–132.
 [5] Leo Breiman. 2001. Random Forests. Mach. Learn. 45, 1 (Oct. 2001), 5–32.
 [6] Walter Carrer-Neto, María Luisa Hernández-Alcaraz, Rafael Valencia-García,
     and Francisco García-Sánchez. 2012. Social Knowledge-based Recommender
     System. Application to the Movies Domain. Expert Syst. Appl. 39, 12 (Sept. 2012),
     10990–11000.
 [7] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting
     System. In Proceedings of the 22nd ACM SIGKDD International Conference on
     Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17,
     2016. 785–794.
 [8] Keunho Choi, Donghee Yoo, Gunwoo Kim, and Yongmoo Suh. 2012. A Hybrid
     Online-product Recommendation System: Combining Implicit Rating-based
     Collaborative Filtering and Sequential Pattern Analysis. Electron. Commer. Rec.
     Appl. 11, 4 (July 2012), 309–317.
 [9] Jerome H. Friedman. 2000. Greedy Function Approximation: A Gradient Boosting
     Machine. Annals of Statistics 29 (2000), 1189–1232.
[10] Carlos A. Gomez-Uribe and Neil Hunt. 2015. The Netflix Recommender System:
     Algorithms, Business Value, and Innovation. ACM Trans. Manage. Inf. Syst. 6, 4,
     Article 13 (Dec. 2015), 19 pages.
[11] J. Grivolla, D. Campo, M. Sonsona, J. M. Pulido, and T. Badia. 2014. A Hybrid
     Recommender Combining User, Item and Interaction Data. In 2014 International
     Conference on Computational Science and Computational Intelligence, Vol. 1. 297–
     301.
[12] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The elements of
     statistical learning: data mining, inference and prediction (2 ed.). Springer.
[13] Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast
     Matrix Factorization for Online Recommendation with Implicit Feedback. In
     Proceedings of the 39th International ACM SIGIR Conference on Research and
     Development in Information Retrieval (SIGIR ’16). ACM, New York, NY, USA,
     549–558.
[14] Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated Gain-based Evaluation
     of IR Techniques. ACM Trans. Inf. Syst. 20, 4 (Oct. 2002), 422–446.
[15] Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased
     Learning-to-Rank with Biased Feedback. In Proceedings of the Tenth ACM Inter-
     national Conference on Web Search and Data Mining (WSDM ’17). ACM, New
     York, NY, USA, 781–789.
[16] Seok Kee Lee, Yoon Ho Cho, and Soung Hie Kim. 2010. Collaborative Filtering
     with Ordinal Scale-based Implicit Ratings for Mobile Music Recommendations.
     Inf. Sci. 180, 11 (June 2010), 2142–2155.
[17] Pasquale Lops, Marco de Gemmis, and Giovanni Semeraro. 2011. Content-based
     Recommender Systems: State of the Art and Trends. Springer US, Boston, MA,
     73–105.
[18] Alexandru Niculescu-Mizil and Rich Caruana. 2005. Predicting Good Probabilities
     with Supervised Learning. In Proceedings of the 22nd International Conference on
     Machine Learning (ICML ’05). ACM, New York, NY, USA, 625–632.
[19] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel,
     Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron
     Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau,
     Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn:
     Machine Learning in Python. J. Mach. Learn. Res. 12 (Nov. 2011), 2825–2830.
[20] Filip Radlinski and Thorsten Joachims. 2005. Query chains: learning to rank
     from implicit feedback. In KDD ’05: Proceeding of the eleventh ACM SIGKDD
     international conference on Knowledge discovery in data mining. ACM Press, New
     York, NY, USA, 239–248.
[21] Oxana Ye. Rodionova, Paolo Oliveri, and Alexey L. Pomerantsev. 2016. Rigor-
     ous and compliant approaches to one-class classification. Chemometrics and
     Intelligent Laboratory Systems 159, Complete (2016), 89–96.
[22] Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. 2007. Restricted
     Boltzmann Machines for Collaborative Filtering. In Proceedings of the 24th Inter-
     national Conference on Machine Learning (ICML ’07). ACM, New York, NY, USA,
     791–798.
[23] Chao Wang, Yiqun Liu, and Shaoping Ma. 2016. Building a click model: From
     idea to practice. CAAI Transactions on Intelligence Technology 1, 4 (2016), 313 –
     322.