<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Proceedings of the SIGIR 2017 eCom workshop, Tokyo, Japan, August 2017</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
<article-title>A Content-based Recommender System for E-commerce Offers and Coupons</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yandi Xia</string-name>
          <email>yandi.xia@rakuten.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shikhar Vaibhav</string-name>
          <email>svaibhav@ebates.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Di Fabbrizio</string-name>
          <email>difabbrizio@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ankur Datta</string-name>
          <email>ankur.datta@rakuten.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ebates</institution>
          ,
          <addr-line>San Francisco, California - USA 94100</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Rakuten Institute of Technology</institution>
          ,
          <addr-line>Boston, Massachusetts - USA 02110</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>7</volume>
      <abstract>
<p>Reward-based e-commerce companies expose thousands of online offers and coupons every day. Customers who signed up for online coupon services either receive a daily digest email with selected offers or select specific offers on the company website front page. High-quality online discounts are selected and delivered through these two channels by a manual process that involves a team of experts who are responsible for evaluating recency, product popularity, retailer trends, and other business-related criteria. Such a process is costly, time-consuming, and not customized to users' preferences or shopping history. In this work, we propose a content-based recommender system that streamlines the coupon selection process and personalizes the recommendations to improve the click-through rate and, ultimately, the conversion rate. When compared to the popularity-based baseline, our content-based recommender system improves the F-measure from 0.21 to 0.85 and increases the estimated click-through rate from 1.20% to 7.80%. The experimental system is currently scheduled for A/B testing with real customers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
<p>• Computing methodologies → Classification and regression trees;
• Applied computing → Online shopping;</p>
      <p>Copyright © 2017 by the paper’s authors. Copying permitted for private and academic
purposes.</p>
      <p>In: J. Degenhardt, S. Kallumadi, M. de Rijke, L. Si, A. Trotman, Y. Xu (eds.):
Proceedings of the SIGIR 2017 eCom workshop, August 2017, Tokyo, Japan, published
at http://ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
<title>1 INTRODUCTION</title>
<p>Reward-based e-commerce services are a fast-growing online sector that provides cashback to subscribers by leveraging discounted prices from a broad network of affiliated companies.</p>
<p>Typically, a consumer would use the company website to search for specific products or retailers, or click on one of the prominently published offers. Shoppers are then redirected to the websites of the respective retailers. A visit to the merchant’s website generated by this redirection is defined as a shopping trip.</p>
<p>At the same time, subscribers receive a daily digest email with the most popular discounts. Figure 1 shows a fragment of a daily digest email featuring a list of current offers and coupons. Both website and email contents are manually curated by experts focusing on delivering the highest quality offers to customers.</p>
<p>[Figure 1: fragment of a daily digest email showing offers and coupons from multiple merchants]</p>
      <p>
This manual process is laborious and does not scale to millions of online customers, who receive the same offer recommendations regardless of their previous browsing or purchase history. A recommender system would be able to capture the customers’ preferences by optimizing the coupon selection based on previous history and the similarity across users and/or items. However, compared to traditional recommender systems in other domains, such as movies [
        <xref ref-type="bibr" rid="ref10 ref6">6, 10</xref>
        ] and music [
        <xref ref-type="bibr" rid="ref16">16</xref>
], online offers differ for a number of reasons: 1) coupons are highly volatile items, only valid for a limited time span and removed after their expiration date; 2) customers using coupons are affected by high turnover; 3) the user population is skewed between a minority of heavy users and a majority of occasional or light users; and, finally, 4) online offers are rarely rated, as usually happens in the movie and music domains.
      </p>
      <p>
        Consequently, the online discount domain is more prone to the
cold-start problem [
        <xref ref-type="bibr" rid="ref4">4</xref>
], where there is insufficient click history for a new item or, conversely, not enough history for a new subscriber. Matrix factorization methods are less robust to the cold-start problem [
        <xref ref-type="bibr" rid="ref4">4</xref>
] and lack the capability to capture the number of features necessary to model the preferences of highly volatile items and large customer populations.
      </p>
      <p>
For these reasons, we adopted a content-based filtering (CBF) approach [
        <xref ref-type="bibr" rid="ref17">17</xref>
] where we exploit coupons’ and customers’ attributes to build a stochastic classification model that predicts the posterior probability that a coupon will be clicked by the customer. We rely on users’ implicit feedback, which indirectly expresses users’ preferences through behavior such as clicking on specific offers in a ranked list. This is typically a noisier signal compared to explicit feedback such as ratings and reviews, but it is also abundant and has consistent bias across a user’s click-through history [
        <xref ref-type="bibr" rid="ref15 ref23">15, 23</xref>
        ].
      </p>
      <p>
However, click information is more challenging to use since it neither directly reflects the users’ satisfaction nor provides clues about items that are irrelevant (negative feedback) [
        <xref ref-type="bibr" rid="ref18">18</xref>
], which is fundamental to building an effective discriminative model. In this paper, we illustrate how to utilize noisy implicit feedback to build a CBF model for offer and coupon recommendations. The main contributions are the following:
(1) We describe a method to generate negative samples from the implicit feedback data;
(2) We perform extensive experiments to compare different learning models;
(3) We demonstrate that our approach is effective with respect to the baseline and the upper-bound baseline.
      </p>
    </sec>
    <sec id="sec-3">
<title>2 MODELING APPROACHES</title>
      <p>
We formalize the coupon recommendation problem as content-based filtering [
        <xref ref-type="bibr" rid="ref17">17</xref>
] with implicit feedback. Let M and N indicate the number of users and coupons, respectively. For each shopping trip interaction included in the matrix Y ∈ ℝ^{M×N}, a user u_i is exposed to a number of coupons L and selects a specific coupon c_j, generating the implicit feedback y_{u_i, c_j} = 1, and y_{u_i, c_k} = 0 for all the other items in the list of length L, where k = {0, 1, …, L − 1} and k ≠ j. Formally, predicting a user’s shopping trip for an online offer requires estimating the function ŷ_{u_i, c_j} = f(u_i, c_j | Θ), where ŷ_{u_i, c_j} is the predicted score for the online interaction (u_i, c_j), Θ represents the model parameters to learn, and f is the mapping function between the parameters and the predicted scores. The learning can be done by applying a machine learning classification algorithm that optimizes an objective function such as the logistic loss or cross-entropy loss [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
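The objective just mentioned can be made concrete with a small sketch; this is an illustrative implementation of the logistic loss for a single implicit-feedback label, not the paper’s actual training code.

```python
import math

def logistic_loss(y, score):
    """Binary cross-entropy for an implicit-feedback label y in {0, 1},
    where `score` is the raw model output f(u_i, c_j | Theta)."""
    p = 1.0 / (1.0 + math.exp(-score))  # sigmoid maps the score to a probability
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```

A score of 0 is maximally uncertain, so the loss equals ln(2) for either label; the loss shrinks as the score agrees with the label.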
      <p>
However, taking a closer look at the implicit feedback used for modeling, y_{u_i, c_k} = 0 does not necessarily mean that the user u_i dislikes the offer c_k; it merely indicates the choice preference in this round of interactions. The user may go back to the offer list and select another item different from c_j. This makes the prediction modeling challenging, since the user’s implicit feedback is noisy and negative feedback is lacking. In our case, we only considered the positive feedback and modeled the shopping trip predictions as a one-class classification (OCC) problem [
        <xref ref-type="bibr" rid="ref21">21</xref>
], where we only use the positive samples and artificially derive the negative samples from the user’s shopping history.
      </p>
      <p>
To handle the classification task, we experimented with two non-parametric learning methods that are resilient to noisy features, robust to outliers, and capable of handling missing features (but not necessarily robust to noisy labels): 1) Random forests [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [12,
Chapter 15]; and 2) Gradient boosted trees [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
<title>2.1 Generating negative samples</title>
      <p>
When relying on the user’s implicit feedback, an accepted convention is that any higher-ranked item that was not clicked is likely less relevant than the clicked ones [
        <xref ref-type="bibr" rid="ref20">20</xref>
]. In an industrial setting, it is often the case that neither the coupon rank nor the coupon list exposed to users is available. In this case, it is necessary to derive “negative shopping trips” (i.e., not-relevant coupons) by some other means. For the negative training and testing sets, we followed the procedure described below:
      </p>
<p>(1) For each shopping trip, we extracted the triple: user, coupon, and click time;</p>
<p>(2) We retrieved the user’s shopping trip history and selected all the coupons clicked by them up to the current click time. We call this data set A;
(3) Given the considered shopping trip click time, we retrieved all the non-expired coupons at that time. We call this data set S;
(4) We removed all the coupons clicked by the user from the non-expired coupon set and obtained the data set U = S \ A;
(5) We uniformly sampled 2, 4, 8, and 9 coupons from U and built, from each sample set, a list of negative shopping trips that simulates the missing negative samples in our data set.</p>
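The procedure above can be sketched in a few lines; the data structures and helper name below are illustrative assumptions, not the production pipeline.

```python
import random

def negative_samples(user, click_time, history, coupons, k=9, rng=None):
    """Derive k negative shopping trips for one positive (user, coupon, click_time).

    history: {user: [(coupon_id, click_time), ...]} -- the user's shopping trips
    coupons: {coupon_id: (valid_from, valid_to)}    -- coupon validity ranges
    """
    rng = rng or random.Random(0)
    # set A: all coupons clicked by this user up to the current click time
    clicked = {c for c, t in history.get(user, []) if t <= click_time}
    # set S: all coupons not yet expired at the click time
    valid = {c for c, (start, end) in coupons.items() if start <= click_time <= end}
    # U = S \ A: valid coupons the user never clicked
    candidates = sorted(valid - clicked)
    return rng.sample(candidates, min(k, len(candidates)))
```

Sampling uniformly from U simulates the unobserved list of offers the user saw but did not select.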
<p>Note that we sampled different sizes of negative sets for evaluation purposes, but we only report the results for negative sets of size 9, since this is the closest to the real system conditions, where the user is exposed to a list of 10 offers and selects one of them.</p>
    </sec>
    <sec id="sec-5">
<title>2.2 Random forests</title>
      <p>
        Random forest (RF) models [
        <xref ref-type="bibr" rid="ref5">5</xref>
][12, Chapter 15] use bagging techniques to overcome decision trees’ low-bias, high-variance problem. An RF model averages the results of several noisy and relatively unbiased decision trees trained on different parts of the same training set. The general method to create an RF model follows the steps below [12, Chapter 15]:
      </p>
<p>For b = 1, …, B, where B is the total number of bagged trees:
(1) Draw N samples from the training set and build a tree T_b by recursively applying the steps below for each terminal node until reaching the minimum node size n_min:
(a) Randomly select m features from the total p features, where m ≤ p;
(b) Pick the best node split among the m features;
(c) Split the selected node into two children nodes;
(2) Save the obtained tree ensemble {T_b}_1^B.</p>
<p>To determine the quality of the node split, either the Gini impurity or the cross-entropy measure can be used for the candidate split [12, Chapter 9]. For a classification task, the prediction is the majority vote among the results from the tree ensemble. In other terms, if Ĉ_b(x) is the class prediction for the b-th random-forest (rf) tree, then the predicted class is Ĉ^B_rf(x) = majority vote {Ĉ_b(x)}_1^B.</p>
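A minimal sketch of the bootstrap-and-vote steps just described, assuming binary class labels; the per-tree training is left abstract and the toy threshold "trees" are illustrative.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # step (1): draw N samples from the training set with replacement
    return [rng.choice(data) for _ in range(len(data))]

def rf_predict(trees, x):
    # classification: majority vote over the B tree predictions C_b(x)
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# toy ensemble of B = 3 "trees", each reduced to a simple threshold rule
trees = [lambda x: int(x > 1), lambda x: int(x > 2), lambda x: int(x > 10)]
```

With these three rules, an input of 5 collects votes [1, 1, 0] and the ensemble predicts class 1.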
    </sec>
    <sec id="sec-6">
<title>2.3 Gradient boosted trees</title>
      <p>
        Gradient boosted trees (GBTs) [
        <xref ref-type="bibr" rid="ref9">9</xref>
] optimize a loss functional L = E_y[L(y, F(x)) | X], where F(x) can be a function that is mathematically difficult to characterize, such as a decision tree f(x) over X. The optimal value of the function is expressed as F*(x) = Σ_{m=0}^{M} f_m(x; a, w), where f_0(x; a, w) is the initial guess and {f_m(x; a, w)}_{m=1}^{M} are additive boosts on x defined by the optimization method. The parameter a_m of f_m(x; a, w) denotes the split points of the predictor variables, and w_m denotes the boosting weights on the leaf nodes of the decision trees corresponding to the partitioned training set X_j for region j. To compute F*(x), we need to calculate, for each boosting round m,
      </p>
<p>{a_m, w_m} = arg min_{a,w} Σ_{i=1}^{N} L(y_i, F_m(x_i))   (1)
with F_m(x) = F_{m−1}(x) + f_m(x; a_m, w_m). This expression is indicative of a gradient descent step:</p>
      <p>
F_m(x) = F_{m−1}(x) + ρ_m (−g_m(x_i))   (2)
where ρ_m is the step length and −g_m(x_i) = −[∂L(y, F(x))/∂F(x)]_{F(x)=F_{m−1}(x_i)} is the search direction. To solve for a_m and w_m, we make the basis functions f_m(x_i; a, w) correlate most with −g_m(x_i), where the gradients are defined over the training data distribution. In particular, using a Taylor series expansion, we can get closed-form solutions for a_m and w_m – see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for details. It can be shown that
      </p>
<p>a_m = arg min_a Σ_{i=1}^{N} (−g_m(x_i) − ρ_m f_m(x_i; a, w_m))²   (3)
and
ρ_m = arg min_ρ Σ_{i=1}^{N} L(y_i, F_{m−1}(x_i) + ρ f_m(x_i; a_m, w_m))   (4)
which yields
F_m(x) = F_{m−1}(x) + ρ_m f_m(x; a_m, w_m)   (5)</p>
      <p>
Each boosting round updates the weights w_{m,j} on the leaves and helps create a new tree in the next iteration. The optimal selection of decision tree parameters is based on optimizing f_m(x; a, w) using a logistic loss. For GBTs, each decision tree is resistant to imbalance and outliers [
        <xref ref-type="bibr" rid="ref12">12</xref>
], and F(x) can approximate arbitrarily complex decision boundaries.
      </p>
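The boosting recursion above can be illustrated with a toy one-dimensional sketch using squared loss (whose negative gradient is simply the residual) and stump base learners; the paper’s actual models use logistic loss and full trees, so this is only a didactic stand-in.

```python
def fit_stump(xs, residuals):
    # base learner f_m: threshold split minimizing squared error on the residuals
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        wl, wr = sum(left) / len(left), sum(right) / len(right)  # leaf weights w_m
        err = sum((r - wl) ** 2 for r in left) + sum((r - wr) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, wl, wr)
    _, t, wl, wr = best
    return lambda x: wl if x <= t else wr

def fit_gbt(xs, ys, rounds=30, lr=0.3):
    f0 = sum(ys) / len(ys)                      # initial guess f_0
    preds, stumps = [f0] * len(xs), []
    for _ in range(rounds):
        # for squared loss, the negative gradient -g_m(x_i) is y_i - F_{m-1}(x_i)
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # F_m = F_{m-1} + (step length) * f_m
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + lr * sum(s(x) for s in stumps)

model = fit_gbt([0, 1, 2, 3, 4, 5, 6, 7], [0, 0, 0, 0, 1, 1, 1, 1])
```

After 30 rounds the additive model closely separates the two halves of this toy step function.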
    </sec>
    <sec id="sec-7">
<title>2.4 Popularity-based baseline</title>
<p>To evaluate the contributions of our coupon recommender system, it is necessary to establish a baseline measure that correlates with the current recommender system performance. However, the current coupon recommendation process cannot be evaluated offline, since recommendations are made by manually selecting popular retailers and offers. The resulting list of coupons is then exposed to all the users by either daily email or website front-page visits. All users see the same list of coupons regardless of their shopping trip history or preferences. A direct comparison with such a system would require exposing both recommendation techniques through A/B testing with real customers.</p>
<p>[Figure 2: timeline of daily shopping trips, where the top N most clicked coupons from the shopping trips history mimic the experts’ selection and are recommended to all members]</p>
<p>As an alternative approach, we propose an offline baseline evaluation based on popular coupons, where popularity is derived from shopping trip click-through rates. In addition to popularity criteria, there could be other business reasons for offer selection in the current setting, but for baseline and simulation purposes we have limited the selection to the popularity criterion only.</p>
<p>The assumption is that user-defined popularity correlates with the current recommender system and is likely to represent an upper-bound estimate of its performance. At a high level, we define two baselines that select the most popular coupons based either only on the past click data (PBB) or only on the future click data (God-View PBB), which represents a strong upper bound.</p>
<p>As illustrated in Figure 2, we first divide users’ shopping trips into training and testing partitions with a 90% and 10% ratio, respectively. For each shopping trip in the test set, we performed the following steps:</p>
<p>(1) We extracted the shopping trip click date;
(2) From the shopping trip training set, we extracted all the shopping trips prior to the click date with a valid coupon (i.e., coupons unexpired at the click date);
(3) We sorted the obtained list of coupons by click-through rate and selected the top N popular coupons;
(4) If the coupon in the test shopping trip is among the top N most popular coupons (with N = 10), then there is a match and the prediction is correct; otherwise the prediction is wrong.</p>
<p>The general idea is that the customized popularity list simulates the expert selection by identifying the best offers for each of the considered days in the test set. We also define a stronger baseline where we have access to future click-through information after the shopping trip under evaluation. By predicting the future, we can build a baseline which is the upper bound of the popularity-based baselines when recommending the same list of offers to the whole user population. We call this baseline God-View PBB (GVPBB).</p>
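The baseline computation just described can be sketched as follows; the function name and data shapes are illustrative assumptions rather than the paper’s evaluation code.

```python
from collections import Counter

def pbb_hit(test_trip, train_trips, coupon_validity, n=10):
    """Popularity-based baseline: does the test trip's coupon appear among the
    top-N most clicked coupons that are still valid at the test click date?

    train_trips: [(user, coupon_id, click_date), ...]
    coupon_validity: {coupon_id: (valid_from, valid_to)}
    """
    _, coupon, click_date = test_trip
    counts = Counter(
        c for _, c, t in train_trips
        if t < click_date  # only clicks prior to the test click date
        and coupon_validity[c][0] <= click_date <= coupon_validity[c][1]  # unexpired
    )
    top_n = {c for c, _ in counts.most_common(n)}
    return coupon in top_n
```

Averaging the hit indicator over the test set yields the baseline’s accuracy figure.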
    </sec>
    <sec id="sec-8">
<title>2.5 Run-time</title>
<p>Figure 3 shows the simplified architecture of the run-time system. A daily list of thousands of candidate coupons is submitted for selection to the recommender system. For each member subscribing to the reward-based e-commerce company, each member/offer pair is evaluated by the recommender system and scored for affinity. The resulting score list is used to rank the predictions, which are then truncated to the top 10 candidates and used either as personalized email campaigns or personalized front-page offers.</p>
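The scoring-and-truncation step can be sketched as below, with `score_fn` standing in for the trained classifier’s posterior probability; the names are illustrative.

```python
def recommend_top_k(member, coupons, score_fn, k=10):
    """Score every member/offer pair and keep the k highest-affinity offers."""
    scored = sorted(((score_fn(member, c), c) for c in coupons), reverse=True)
    return [c for _, c in scored[:k]]

# toy affinity table standing in for the model's predicted scores
scores = {"MerchantC": 0.9, "MerchantF": 0.8, "MerchantA": 0.1}
top = recommend_top_k("m1", list(scores), lambda m, c: scores[c], k=2)
```

For the toy scores above, the top-2 list ranks MerchantC ahead of MerchantF.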
<p>[Figure 3: run-time architecture where the machine learning model scores all the candidate coupons for each member, and the resulting affinity scores are used to rank the merchant offers]</p>
      <sec id="sec-8-4">
<title>3 DATA</title>
<p>We used data collected from customers directly clicking on the published offers, e-shopping at the merchants’ websites and consequently adding offers, or auto-applying offers through browser button extensions.</p>
<p>The data includes historical information about customers’ click-throughs (i.e., shopping trips) to visit affiliated retailer websites. The data was sampled across a full year of traffic and anonymized for privacy requirements. Table 1 summarizes the distributions of the various data sets, including active members generating shopping trips, shopping trip counts, clicked coupons, and retailers that issued the offers.</p>
<p>Coupons are characterized by a textual description (e.g., “25% off Tees Shorts Swim for Women Men Boys and Girls Extra 30% off Sale Styles and Extra 50% off Final Sale Styles”) and a date range defining their validity (e.g., from 12/18/2016 to 12/27/2016). Optionally, coupons are classified into three categories: 1) percentage discount; 2) money discount; and/or 3) free shipping offer for specific brands or products. Additionally, they can directly refer to a specific product or retailer. Retailers’ information includes a business name (e.g., Stila Cosmetics) and an optional category (e.g., Health &amp; Beauty).</p>
<p>The details about the negative data sets in Table 1 are explained in Section 2.1.</p>
        <p>
The number of shopping trips per customer follows the typical power-law probability distribution [
          <xref ref-type="bibr" rid="ref1">1</xref>
], where the heavy-tailed data still holds a substantial amount of probability information. Figure 4 shows the probability density function (PDF) p(X) in blue, the complementary cumulative distribution function (CCDF) p(X ≥ x) in red, and the corresponding power-law distribution fit with parameters x_min = 1 and α = 3.27.
        </p>
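The α parameter above is a maximum-likelihood power-law exponent; the powerlaw package of [1] computes it, but the continuous-case estimator it implements is simple enough to sketch directly (this is the textbook estimator, not the paper’s exact fitting code).

```python
import math
import random

def powerlaw_alpha(samples, xmin=1.0):
    """Continuous MLE for the power-law exponent:
    alpha = 1 + n / sum(ln(x / xmin)) over samples x >= xmin."""
    tail = [x for x in samples if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

# sanity check on synthetic draws from a power law with alpha = 3.27,
# generated by inverse-CDF sampling: x = (1 - u)^(-1 / (alpha - 1))
rng = random.Random(0)
draws = [(1.0 - rng.random()) ** (-1.0 / (3.27 - 1.0)) for _ in range(100000)]
estimate = powerlaw_alpha(draws)
```

With 100,000 synthetic draws the estimate lands very close to the true exponent of 3.27.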
<p>[Figure 4: PDF and CCDF of the shopping trips frequency with the corresponding power-law fits, on log-log axes]</p>
<p>Based on the shopping trips frequency (see Table 2), we split the data into three segments: 1) the Head, comprising the most active members with more than 8 shopping trips; 2) the Torso, including members with a click rate between 2 and 7; and 3) the Tail, for the least active members with 1 or 2 clicks.</p>
<p>This data partition will be used in Section 4.2 to evaluate the user-based cold-start problem.</p>
      </sec>
    </sec>
    <sec id="sec-9">
<title>4 EXPERIMENTS</title>
<p>To build and evaluate the random forest, XGBoost, and baseline models, we partitioned the data by members, using 90% of the population for training and 10% for testing, and derived the associated shopping trip sets. Partitioning by members rather than by shopping trips increases the chances of capturing the model’s generalization capabilities, since the shopping trip history of unseen test users will not bias the training data. For the negative samples, as explained in Section 2.1, we selected 9 negative shopping trips for each positive sample.</p>
<p>To capture the user/coupon affinity, we experimented with a number of features involving specific users’ and coupons’ attributes, as well as shopping history features. The following list summarizes them:</p>
<p>User-based features:
– Devices used
– OS systems
– Email domain
Coupon-based features:
– Text description
– Originating retailer
– Retailer’s category
– Discount info (free shipping, dollar off, percent off)
– Cashback info
Shopping trip history-based features:
– Times the user has visited this coupon before
– Recently visited retailers
– Recent discount information
– Recent cashback information
– Recent coupon keywords (from descriptions)
– Recent retailer’s categories
– Recent luxury purchases</p>
      <p>
In the data pre-processing stage, we encoded text as bag-of-words frequencies, members’ shopping behavior histories as bags of frequencies, and categorical features as one-hot vector representations. All the other features are either binary or numeric values. Since both RF and XGBoost models are insensitive to monotonic feature transformations, we avoided numeric standardization and normalization. We carefully handled missing information and replaced missing text with the label unk and both missing categorical and numerical features with 0s. After pre-processing, the resulting number of features is 45,329. For the XGBoost model, we ran a partial grid-search optimization to set the most important hyper-parameters, the maximum tree depth and the number of estimators, to 15 and 1,000, respectively. We used the scikit-learn
[
        <xref ref-type="bibr" rid="ref19">19</xref>
] implementation of random forests and the XGBoost library.
      </p>
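The pre-processing conventions described above can be sketched with two small helpers; the function names are hypothetical, and the real pipeline and feature vocabulary are not shown in the paper.

```python
from collections import Counter

def encode_text(text):
    """Bag-of-words frequency encoding; missing text becomes the label 'unk'."""
    tokens = (text if text else "unk").lower().split()
    return dict(Counter(tokens))

def encode_categorical(value, vocabulary):
    """One-hot encoding; a missing or unseen category maps to all zeros."""
    return [1 if value == v else 0 for v in vocabulary]
```

No standardization is applied to numeric features, since tree ensembles are insensitive to monotonic transformations.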
    </sec>
    <sec id="sec-10">
<title>4.1 Results</title>
<p>Table 3 shows the experimental results for the machine learning models and the baselines. We evaluated performance by truncating the ranked prediction list at 10 items. The RFC model could predict click-through for 96.80% of the tested coupons (accuracy), while the XGBoost model followed with 95.85% correct click predictions, but with a substantially higher recall of 0.83. This pushed the XGBoost predicted click-through rate to 8.30%. Both the PBB and GVPBB baselines suffer from low recall, since the popularity lists, once the shopping trip day is selected, are the same for all the users.</p>
      <p>
Overall, both RFC and XGBoost outperform the baseline metrics. In addition to precision, recall, and F-measure, we also report the Normalized Discounted Cumulative Gain (NDCG) metric [
        <xref ref-type="bibr" rid="ref14">14</xref>
] for a list of 10 offers generated by combining 1 positive sample with 9 randomly selected negative samples. NDCG is a ranking quality measure that takes the position of the clicked item into consideration. Generally, positive samples should be ranked higher than the negative ones.
      </p>
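For reference, NDCG truncated at 10 over one positive and nine negatives reduces to a simple position discount; a sketch of the general formula:

```python
import math

def dcg(relevances):
    # DCG = sum_i rel_i / log2(i + 2), with 0-indexed positions
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k=10):
    ideal = sorted(ranked_relevances, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / denom if denom > 0 else 0.0
```

With a single clicked item among ten, NDCG@10 is 1.0 when the click is ranked first and 1/log2(p + 2) when it sits at 0-indexed position p.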
      <p>
As further validation, we evaluated the quality of the confidence scores estimated by the RFC model. Well-calibrated predicted probabilities are fundamental when the classification model scores are used for ranking [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
<p>The confidence calibration plot in Figure 5 shows that the RFC model has a minor bias around predicted values of 0.2 and 0.6, but overall the curve follows the ideal behavior by closely mimicking the diagonal.</p>
    </sec>
    <sec id="sec-11">
      <title>4.2 User-based cold-start evaluation</title>
      <p>
When new members join a reward-based system or new offers are issued by retailers, recommender systems are affected by the sparsity of the new data, which compromises prediction performance [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
To mitigate this problem, a different selection strategy is used for the tail of the distribution (i.e., hybrid methods) [
        <xref ref-type="bibr" rid="ref19">19</xref>
]. However, we rely on the user- and coupon-based features to capture information that may generalize well with minimal shopping trip history.
      </p>
      <p>We evaluated our RFC model with the three segments of the
shopping trip distribution (see also Section 3). Table 4 shows the
results of such a test.</p>
<p>On average, the RFC model is quite robust across the three distribution segments. When analyzing the main feature predictors, the number of times the user clicked on the coupon before and being a referred member are the major contributors to the predictions, but further investigation is necessary to better understand the feature interactions.</p>
    </sec>
    <sec id="sec-12">
      <title>5 RELATED WORK</title>
      <p>
        A large body of work on recommender systems focuses on explicit
feedback where users express their preference with ratings [
        <xref ref-type="bibr" rid="ref10 ref22">10,
22</xref>
        ]. More recently, there is an increasing number of contributions
that rely on the more abundant and noisy implicit feedback data
[
        <xref ref-type="bibr" rid="ref11 ref13 ref3">3, 11, 13</xref>
]. However, most of this work frames the problem as a collaborative filtering (CF) model. CF is notoriously sensitive to the cold-start problem and not flexible enough to handle extremely volatile items such as online offers and coupons. Additionally, it is hard to incorporate features that capture users’ and items’ properties.
      </p>
      <p>
        The hybrid recommender system described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
] is the closest to our work. To address the cold-start problem, they use implicit feedback to recommend online coupons by integrating both user information and item features. In their work, the traditional collaborative filtering matrix, including users and items, is enriched with additional user and item features and factorized with SVD. Nevertheless, the hybrid system requires retraining every time there is a new coupon or user. In our case, there are thousands of new coupons every day, and re-training that often is impractical.
      </p>
      <p>
Our work is in line with content-based filtering (CBF) approaches such as [
        <xref ref-type="bibr" rid="ref2 ref8">2, 8</xref>
]. CBF methods are able to abstract items and represent user/item interactions by encoding shopping trip history. There is no need to re-train when adding new offers, although including new retailers requires updating the model to capture the new user/item affinity relations expressed in the new shopping trips. In contrast to SVD and CF-based models, the content-based approach can leverage different classification techniques and can be combined with CF.
      </p>
    </sec>
    <sec id="sec-13">
      <title>6 CONCLUSIONS</title>
<p>In this work, we developed a coupon recommender system based on content-based filtering. We experimented with two popularity-based baselines and compared them to both random forest and gradient boosted tree classification models. To address the one-class classification problem, our framework automatically extracted negative samples from customers’ shopping trips. The negative samples simulate the real selection scenario, where one coupon out of the ten displayed is actually selected by members. In this experimental setup, both the random forest and XGBoost classifiers perform substantially better than the two baselines except for the NDCG@10 metric.</p>
<p>In the future, to improve ranking results, we will extend the classification models with pairwise and listwise cost functions. We will also validate our models through A/B testing with real customers.</p>
    </sec>
    <sec id="sec-14">
      <title>ACKNOWLEDGMENTS</title>
      <p>The authors would like to thank Dr. David Inouye for providing
insightful suggestions and comments.</p>
      <p>The authors would also like to thank the anonymous reviewers
for their helpful advice and Yiu-Chang Lin for presenting our work
at the workshop.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
<given-names>Jeff</given-names>
            <surname>Alstott</surname>
          </string-name>
          , Ed Bullmore, and
          <string-name>
            <given-names>Dietmar</given-names>
            <surname>Plenz</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>powerlaw: A Python Package for Analysis of Heavy-Tailed Distributions</article-title>
          .
          <source>PLOS ONE 9</source>
          ,
          <issue>1</issue>
          (
          <issue>01</issue>
          <year>2014</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ana</given-names>
            <surname>Belén</surname>
          </string-name>
Barragáns-Martínez, Enrique Costa-Montenegro, Juan C. Burguillo, Marta Rey-López,
<string-name>
<given-names>Fernando A.</given-names>
<surname>Mikic-Fonte</surname>
</string-name>
, and
<string-name>
<given-names>Ana</given-names>
<surname>Peleteiro</surname>
</string-name>
          .
          <year>2010</year>
          .
          <article-title>A Hybrid Content-based and Item-based Collaborative Filtering Approach to Recommend TV Programs Enhanced with Singular Value Decomposition</article-title>
          .
          <source>Inf. Sci</source>
          .
          <volume>180</volume>
          ,
          <issue>22</issue>
          (Nov.
          <year>2010</year>
          ),
          <fpage>4290</fpage>
          -
          <lpage>4311</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Immanuel</given-names>
            <surname>Bayer</surname>
          </string-name>
          , Xiangnan He,
          <string-name>
            <given-names>Bhargav</given-names>
            <surname>Kanagal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Rendle</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A Generic Coordinate Descent Framework for Learning from Implicit Feedback</article-title>
          .
          <source>In Proceedings of the 26th International Conference on World Wide Web (WWW '17)</source>
          .
          <source>International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland</source>
          ,
          <fpage>1341</fpage>
          -
          <lpage>1350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bobadilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ortega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hernando</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Recommender Systems Survey</article-title>
          .
          <source>Know.-Based Syst.</source>
          <volume>46</volume>
          (July
          <year>2013</year>
          ),
          <fpage>109</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Leo</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Random Forests</article-title>
          .
          <source>Mach. Learn</source>
          .
          <volume>45</volume>
          ,
          <issue>1</issue>
          (Oct.
          <year>2001</year>
          ),
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Walter</given-names>
            <surname>Carrer-Neto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>María Luisa</given-names>
            <surname>Hernández-Alcaraz</surname>
          </string-name>
          , Rafael Valencia-García, and Francisco García-Sánchez.
          <year>2012</year>
          .
          <article-title>Social Knowledge-based Recommender System. Application to the Movies Domain</article-title>
          .
          <source>Expert Syst. Appl</source>
          .
          <volume>39</volume>
          ,
          <issue>12</issue>
          (Sept.
          <year>2012</year>
          ),
          <fpage>10990</fpage>
          -
          <lpage>11000</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Tianqi</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>XGBoost: A Scalable Tree Boosting System</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , San Francisco, CA, USA, August 13-17,
          <year>2016</year>
          .
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Keunho</given-names>
            <surname>Choi</surname>
          </string-name>
          , Donghee Yoo, Gunwoo Kim, and
          <string-name>
            <given-names>Yongmoo</given-names>
            <surname>Suh</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>A Hybrid Online-product Recommendation System: Combining Implicit Rating-based Collaborative Filtering and Sequential Pattern Analysis</article-title>
          .
          <source>Electron. Commer. Rec. Appl</source>
          .
          <volume>11</volume>
          ,
          <issue>4</issue>
          (
          <year>July 2012</year>
          ),
          <fpage>309</fpage>
          -
          <lpage>317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jerome H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Greedy Function Approximation: A Gradient Boosting Machine</article-title>
          .
          <source>Annals of Statistics</source>
          <volume>29</volume>
          (
          <year>2000</year>
          ),
          <fpage>1189</fpage>
          -
          <lpage>1232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Carlos A.</given-names>
            <surname>Gomez-Uribe</surname>
          </string-name>
          and
          <string-name>
            <given-names>Neil</given-names>
            <surname>Hunt</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>The Netflix Recommender System: Algorithms, Business Value, and Innovation</article-title>
          .
          <source>ACM Trans. Manage. Inf. Syst.</source>
          <volume>6</volume>
          ,
          <issue>4</issue>
          , Article 13 (Dec.
          <year>2015</year>
          ), 19 pages.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Grivolla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sonsona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Pulido</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Badia</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A Hybrid Recommender Combining User, Item and Interaction Data</article-title>
          .
          <source>In 2014 International Conference on Computational Science and Computational Intelligence</source>
          , Vol.
          <volume>1</volume>
          .
          <fpage>297</fpage>
          -
          <lpage>301</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Hastie</surname>
          </string-name>
          , Robert Tibshirani, and
          <string-name>
            <given-names>Jerome</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>The Elements of Statistical Learning: Data Mining, Inference and Prediction</article-title>
          (2nd ed.). Springer.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Xiangnan</given-names>
            <surname>He</surname>
          </string-name>
          , Hanwang Zhang,
          <string-name>
            <given-names>Min-Yen</given-names>
            <surname>Kan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tat-Seng</given-names>
            <surname>Chua</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Fast Matrix Factorization for Online Recommendation with Implicit Feedback</article-title>
          .
          <source>In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16)</source>
          . ACM, New York, NY, USA,
          <fpage>549</fpage>
          -
          <lpage>558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Kalervo</given-names>
            <surname>Järvelin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jaana</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Cumulated Gain-based Evaluation of IR Techniques</article-title>
          .
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>20</volume>
          ,
          <issue>4</issue>
          (Oct.
          <year>2002</year>
          ),
          <fpage>422</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          , Adith Swaminathan, and
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Schnabel</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Unbiased Learning-to-Rank with Biased Feedback</article-title>
          .
          <source>In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17)</source>
          . ACM, New York, NY, USA,
          <fpage>781</fpage>
          -
          <lpage>789</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Seok Kee</given-names>
            <surname>Lee</surname>
          </string-name>
          , Yoon Ho Cho, and Soung Hie Kim.
          <year>2010</year>
          .
          <article-title>Collaborative Filtering with Ordinal Scale-based Implicit Ratings for Mobile Music Recommendations</article-title>
          .
          <source>Inf. Sci</source>
          .
          <volume>180</volume>
          ,
          <issue>11</issue>
          (
          <year>June 2010</year>
          ),
          <fpage>2142</fpage>
          -
          <lpage>2155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Pasquale</given-names>
            <surname>Lops</surname>
          </string-name>
          , Marco de Gemmis, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Content-based Recommender Systems: State of the Art and Trends</article-title>
          . Springer US, Boston, MA,
          <fpage>73</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Alexandru</given-names>
            <surname>Niculescu-Mizil</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rich</given-names>
            <surname>Caruana</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Predicting Good Probabilities with Supervised Learning</article-title>
          .
          <source>In Proceedings of the 22nd International Conference on Machine Learning (ICML '05)</source>
          . ACM, New York, NY, USA,
          <fpage>625</fpage>
          -
          <lpage>632</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Fabian</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          , Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          , Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau,
          <string-name>
            <given-names>Matthieu</given-names>
            <surname>Brucher</surname>
          </string-name>
          , Matthieu Perrot, and
          <string-name>
            <given-names>Édouard</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>12</volume>
          (Nov.
          <year>2011</year>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Filip</given-names>
            <surname>Radlinski</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Query chains: learning to rank from implicit feedback</article-title>
          .
          <source>In KDD '05: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining</source>
          . ACM Press, New York, NY, USA,
          <fpage>239</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Oxana Ye.</given-names>
            <surname>Rodionova</surname>
          </string-name>
          , Paolo Oliveri, and
          <string-name>
            <given-names>Alexey L.</given-names>
            <surname>Pomerantsev</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Rigorous and compliant approaches to one-class classification</article-title>
          .
          <source>Chemometrics and Intelligent Laboratory Systems</source>
          <volume>159</volume>
          ,
          Complete
          (
          <year>2016</year>
          ),
          <fpage>89</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          , Andriy Mnih, and Geoffrey Hinton.
          <year>2007</year>
          .
          <article-title>Restricted Boltzmann Machines for Collaborative Filtering</article-title>
          .
          <source>In Proceedings of the 24th International Conference on Machine Learning (ICML '07)</source>
          . ACM, New York, NY, USA,
          <fpage>791</fpage>
          -
          <lpage>798</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Chao</given-names>
            <surname>Wang</surname>
          </string-name>
          , Yiqun Liu, and Shaoping Ma.
          <year>2016</year>
          .
          <article-title>Building a click model: From idea to practice</article-title>
          .
          <source>CAAI Transactions on Intelligence Technology</source>
          <volume>1</volume>
          ,
          <issue>4</issue>
          (
          <year>2016</year>
          ),
          <fpage>313</fpage>
          -
          <lpage>322</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>