=Paper=
{{Paper
|id=Vol-2311/paper_20
|storemode=property
|title=A Content-based Recommender System for E-commerce Offers and Coupons
|pdfUrl=https://ceur-ws.org/Vol-2311/paper_20.pdf
|volume=Vol-2311
|authors=Yandi Xia,Giuseppe Di Fabbrizio,Shikhar Vaibhav,Ankur Datta
|dblpUrl=https://dblp.org/rec/conf/sigir/XiaFVD17
}}
==A Content-based Recommender System for E-commerce Offers and Coupons==
Yandi Xia, Rakuten Institute of Technology, Boston, Massachusetts, USA 02110, yandi.xia@rakuten.com
Giuseppe Di Fabbrizio, Rakuten Institute of Technology, Boston, Massachusetts, USA 02110, difabbrizio@gmail.com
Shikhar Vaibhav, Ebates, San Francisco, California, USA 94100, svaibhav@ebates.com
Ankur Datta, Rakuten Institute of Technology, Boston, Massachusetts, USA 02110, ankur.datta@rakuten.com

ABSTRACT
Reward-based e-commerce companies expose thousands of online offers and coupons every day. Customers who sign up for online coupon services either receive a daily digest email with selected offers or select specific offers on the company website front page. High-quality online discounts are selected and delivered through these two channels by a manual process that involves a team of experts who are responsible for evaluating recency, product popularity, retailer trends, and other business-related criteria. Such a process is costly, time-consuming, and not customized to users' preferences or shopping history. In this work, we propose a content-based recommender system that streamlines the coupon selection process and personalizes the recommendations to improve the click-through rate and, ultimately, the conversion rate. When compared to the popularity-based baseline, our content-based recommender system improves the F-measure from 0.21 to 0.85 and increases the estimated click-through rate from 1.20% to 7.80%. The experimental system is currently scheduled for A/B testing with real customers.

CCS CONCEPTS
• Computing methodologies → Classification and regression trees; • Applied computing → Online shopping;

KEYWORDS
Recommender Systems; Personalization; E-commerce offers and coupons

ACM Reference format:
Yandi Xia, Giuseppe Di Fabbrizio, Shikhar Vaibhav, and Ankur Datta. 2017. A Content-based Recommender System for E-commerce Offers and Coupons. In Proceedings of SIGIR eCom 2017, Tokyo, Japan, August 2017, 7 pages.

Copyright © 2017 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, S. Kallumadi, M. de Rijke, L. Si, A. Trotman, Y. Xu (eds.): Proceedings of the SIGIR 2017 eCom workshop, August 2017, Tokyo, Japan, published at http://ceur-ws.org

1 INTRODUCTION
Reward-based e-commerce services are a fast-growing online sector that provides cashback to subscribers by leveraging discounted prices from a broad network of affiliated companies. Typically, a consumer would use the company website to search for specific products or retailers, or click on one of the prominently published offers. Shoppers are then redirected to the websites of the respective retailers. A visit to the merchant's website generated by this redirection is defined as a shopping trip.

At the same time, subscribers receive a daily digest email with the most popular discounts. Figure 1 shows a fragment of a daily digest email featuring a list of current offers and coupons. Both website and email contents are manually curated by experts focusing on delivering the highest quality offers to customers.

Figure 1: Example of email campaign message including offers and coupons.

This manual process is laborious and does not scale well to millions of online customers who receive the same offer recommendations regardless of their previous browsing or purchase history.
A recommender system would be able to capture the customers' preferences by optimizing the coupon selection based on previous history and on the similarity across users and/or items. However, compared to traditional recommender systems in other domains, such as movies [6, 10] and music [16], online offers differ for a number of reasons: 1) coupons are highly volatile items, only valid in a limited time span and removed after their expiration date; 2) customers using coupons are affected by high turnover; 3) the user population is skewed between a minority of heavy users and a majority of occasional or light users; and finally, 4) online offers are rarely rated, as usually happens in the movie and music domains. Consequently, the online discount domain is more prone to the cold-start problem [4], where there is insufficient click history for a new item or, conversely, not enough history for a new subscriber. Matrix factorization methods are less robust to the cold-start problem [4] and lack the capability to capture the number of features necessary to model preferences for highly volatile items and large customer populations.

For these reasons, we adopted a content-based filtering (CBF) approach [17] where we exploit coupon and customer attributes to build a stochastic classification model that predicts the posterior probability of a coupon being clicked by the customer. We rely on users' implicit feedback, which indirectly expresses users' preferences through behavior like clicking on specific offers in a ranked list. This is typically a noisier signal compared to explicit feedback such as ratings and reviews, but it is also abundant and with consistent bias across a user's click-through history [15, 23].

However, click information is more challenging to use since it neither directly reflects the users' satisfaction nor provides clues about items that are irrelevant (negative feedback) [18], which is fundamental to build an effective discriminative model. In this paper, we illustrate how to utilize noisy implicit feedback to build a CBF model for offer and coupon recommendations. The main contributions are the following:

(1) We describe a method to generate negative samples from the implicit feedback data;
(2) We perform extensive experiments to compare different learning models;
(3) We demonstrate that our approach is effective with respect to the baseline and the upper-bound baseline.
2 MODELING APPROACHES
We formalize the coupon recommendation problem as content-based filtering [17] with implicit feedback. Let M and N indicate the number of users and coupons, respectively. For each shopping trip interaction included in the matrix Y ∈ R^{M×N}, a user u_i is exposed to a number of coupons L and selects a specific coupon c_j, generating the implicit feedback y_{u_i,c_j} = 1, and y_{u_i,c_k} = 0 for all the other items in the list of length L, where k = {0, 1, ..., L−1} and k ≠ j. Formally, predicting a user's shopping trip for an online offer requires estimating the function ŷ_{u_i,c_j} = f(u_i, c_j | Θ), where ŷ_{u_i,c_j} is the predicted score for the online interaction (u_i, c_j), Θ represents the model parameters to learn, and f is the mapping function between the parameters and the predicted scores. The learning can be done by applying a machine learning classification algorithm that optimizes an objective function such as the logistic loss or the cross-entropy loss [12].

However, taking a closer look at the implicit feedback used for modeling, y_{u_i,c_k} = 0 does not necessarily mean that the user u_i dislikes the offer c_k; it merely indicates the choice preference in this round of interactions. The user may go back to the offer list and select another item different from c_j. This makes the prediction modeling challenging, since the user's implicit feedback is noisy and there is a lack of negative feedback. In our case, we only considered the positive feedback and modeled the shopping trip predictions as a one-class classification (OCC) problem [21], where we only use the positive samples and artificially derive the negative samples from the user's shopping history.

To handle the classification task, we experimented with two non-parametric learning methods that are resilient to noisy features, robust to outliers, and capable of handling missing features (but not necessarily robust to noisy labels): 1) random forests [5][12, Chapter 15]; and 2) gradient boosted trees [9].

2.1 Generating negative samples
When relying on the user's implicit feedback, an accepted convention is that any higher-ranked item that was not clicked on is likely less relevant than the clicked ones [20]. In an industrial setting, it is often the case that neither the coupon rank nor the coupon list exposed to users is available. In this case, it is necessary to derive "negative shopping trips" (i.e., not relevant coupons) by some other means. For the negative training and testing sets, we followed the procedure described below:

• For each shopping trip, we extracted the triple: user, coupon, and click time;
• We retrieved the user's shopping trip history and selected all the coupons clicked by them up to the current click time. We call this data set A;
• Given the considered shopping trip click time, we retrieved all the non-expired coupons at that time. We call this data set S;
• We removed all the coupons clicked by the user from the non-expired coupon set and obtained the data set U = S − A;
• We uniformly sampled 2, 4, 8, and 9 coupons from U, and built from each sample set a list of negative shopping trips that would simulate the missing negative samples in our data set.

Note that we sampled different sizes of negative sets for evaluation purposes, but we only report the results for negative sets of size 9, since it is the closest to the real system conditions where the user is exposed to a list of 10 offers and selects one of them.
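As an illustration only, the following minimal sketch implements this sampling step; the in-memory data structures (a list of positive (user, coupon, click_time) triples and a coupon validity map) are assumptions for the example, not the production pipeline.

```python
import random

def negative_trips(user, click_time, shopping_trips, coupon_validity, k=9, seed=0):
    """Derive k simulated 'negative shopping trips' for one positive click.

    shopping_trips: list of positive (user, coupon, click_time) triples.
    coupon_validity: dict mapping coupon -> (start_date, end_date).
    """
    # A: coupons already clicked by this user up to the current click time.
    A = {c for (u, c, t) in shopping_trips if u == user and t <= click_time}
    # S: all coupons that are not expired at the click time.
    S = {c for c, (start, end) in coupon_validity.items() if start <= click_time <= end}
    # U = S - A: valid coupons the user did not click.
    U = sorted(S - A)
    random.Random(seed).shuffle(U)
    # Each sampled coupon becomes one simulated negative shopping trip.
    return [(user, c, click_time) for c in U[:k]]
```

In the experiments reported below, k would be 2, 4, 8, or 9, with k = 9 matching the ten-offer list shown to users.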
2.2 Random forests
Random forest (RF) models [5][12, Chapter 15] use bagging techniques to reduce the high variance of low-bias decision trees. An RF model averages the results of several noisy and relatively unbiased decision trees trained on different parts of the same training set. The general method to create an RF model follows the steps below [12, Chapter 15]:

For b = 1, ..., B, where B is the total number of bagged trees:
(1) Draw N samples from the training set and build a tree T_b by recursively applying the steps below for each terminal node until reaching the minimum node size n_min:
  (a) Randomly select m features from the total p features, where m ≤ p;
  (b) Pick the best node split among the m features;
  (c) Split the selected node into two children nodes;
(2) Save the obtained ensemble of trees {T_b}_1^B.

To determine the quality of the node split, either the Gini impurity or the cross-entropy measure can be used for the candidate split [12, Chapter 9]. For a classification task, the prediction is the majority vote among the results from the tree ensemble. In other terms, if Ĉ_b(x) is the class prediction for the b-th random-forest tree, then the predicted class is Ĉ_rf^B(x) = majority vote {Ĉ_b(x)}_1^B.
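To make the recipe above concrete, here is a minimal bagging sketch built from scikit-learn decision trees; it only illustrates the listed steps (bootstrap sampling, m random features per split, majority vote) and is not the configuration used in the experiments, where the scikit-learn random forest implementation is used directly.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_forest(X, y, B=100, m="sqrt", n_min=1, seed=0):
    """Grow B trees, each on a bootstrap sample, trying m random features per split."""
    rng = np.random.default_rng(seed)
    trees = []
    for b in range(B):
        idx = rng.integers(0, len(X), size=len(X))       # step 1: bootstrap sample of size N
        tree = DecisionTreeClassifier(max_features=m,    # step 1a: m <= p features per split
                                      min_samples_leaf=n_min,
                                      random_state=b)
        trees.append(tree.fit(X[idx], y[idx]))           # steps 1b-1c: grow tree T_b
    return trees                                         # step 2: the ensemble {T_b}

def predict_majority(trees, X):
    """Majority vote over the per-tree class predictions (binary 0/1 labels assumed)."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # shape (B, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```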
2.3 Gradient boosted trees
Gradient boosted trees (GBTs) [9] optimize a loss functional L = E_y[L(y, F(x)) | X], where F(x) can be a function that is mathematically difficult to characterize, such as a decision tree f(x) over X. The optimal value of the function is expressed as

    F*(x) = Σ_{m=0}^{M} f_m(x, a, w)

where f_0(x, a, w) is the initial guess and {f_m(x, a, w)}_{m=1}^{M} are additive boosts on x defined by the optimization method. The parameter a_m of f_m(x, a, w) denotes the split points of the predictor variables, and w_m denotes the boosting weights on the leaf nodes of the decision trees corresponding to the partitioned training set X_j for region j. To compute F*(x), we need to calculate, for each boosting round m,

    {a_m, w_m} = argmin_{a,w} Σ_{i=1}^{N} L(y_i, F_m(x_i))                      (1)

with F_m(x) = F_{m−1}(x) + f_m(x, a_m, w_m). This expression is indicative of a gradient descent step:

    F_m(x) = F_{m−1}(x) + ρ_m (−g_m(x_i))                                       (2)

where ρ_m is the step length and g_m(x_i) = [∂L(y, F(x)) / ∂F(x)]_{F(x_i) = F_{m−1}(x_i)} is the search direction. To solve for a_m and w_m, we make the basis functions f_m(x_i; a, w) correlate most with −g_m(x_i), where the gradients are defined over the training data distribution. In particular, using a Taylor series expansion, we can get closed-form solutions for a_m and w_m (see [7] for details). It can be shown that

    a_m = argmin_a Σ_{i=1}^{N} (−g_m(x_i) − ρ_m f_m(x_i, a, w_m))^2             (3)

and

    ρ_m = argmin_ρ Σ_{i=1}^{N} L(y_i, F_{m−1}(x_i) + ρ f_m(x_i; a_m, w_m))      (4)

which yields

    F_m(x) = F_{m−1}(x) + ρ_m f_m(x, a_m, w_m)                                  (5)

Each boosting round updates the weights w_{m,j} on the leaves and helps create a new tree in the next iteration. The optimal selection of decision tree parameters is based on optimizing f_m(x, a, w) using a logistic loss. For GBTs, each decision tree is resistant to imbalance and outliers [12], and F(x) can approximate arbitrarily complex decision boundaries.

2.4 Popularity-based baseline
To evaluate the contributions of our coupon recommender system, it is necessary to establish a baseline measure that correlates with the current recommender system performance. However, the current coupon recommendation process cannot be evaluated offline, since recommendations are made by manually selecting popular retailers and offers. The resulting list of coupons is then exposed to all the users by either daily email or website front-page visits. All users see the same list of coupons regardless of their shopping trip history or preferences. A direct comparison with such a system would require exposing both recommendation techniques through A/B testing with real customers.

Figure 2: Popularity-based baseline (PBB) coupon selection obtained by considering all the valid coupons prior to the shopping trip date.

As an alternative approach, we propose an offline baseline evaluation based on popular coupons, where the popularity is derived from shopping trip click-through rates. In addition to the popularity criteria, there could be other business reasons for offer selection in the current setting, but for baseline and simulation purposes we have limited the selection to the popularity criteria only. The assumption is that user-defined popularity correlates with the current recommender system and is likely to represent an upper-bound estimation of its performance. At a high level, we define two baselines which select the most popular coupons based only on the past click data (PBB) or only on the future click data (God-View PBB), the latter representing a strong upper bound.

As illustrated in Figure 2, we first divide the users' shopping trips into training and testing partitions with a 90% and 10% ratio, respectively. For each shopping trip in the test set, we performed the following steps:

• We extracted the shopping trip click date;
• From the shopping trip train set, we extracted all the shopping trips prior to the click date with a valid coupon (i.e., coupons unexpired at the click date);
• We sorted the obtained list of coupons by click-through rate and selected the top N popular coupons;
• If the coupon in the test shopping trip is in the top N most popular coupons (with N = 10), then there is a match and the prediction is correct; otherwise the prediction is wrong.

The general idea is that the customized popularity list simulates the expert selection by identifying the best offers for each of the considered days in the test set. We also define a stronger baseline where we have access to future click-through information after the shopping trip under evaluation. By predicting the future, we can build a baseline which is the upper bound of the popularity-based baselines when recommending the same list of offers to the whole user population. We call this baseline God-View PBB (GVPBB).
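For clarity, a minimal sketch of the PBB matching step described above follows; the in-memory structures and the use of raw prior-click counts as a popularity proxy are assumptions made for illustration, not the exact evaluation code.

```python
from collections import Counter

def pbb_hit(test_trip, train_trips, coupon_validity, top_n=10):
    """True if the clicked coupon is among the top-N most popular valid coupons."""
    user, coupon, click_date = test_trip
    # Count prior clicks per coupon (a proxy for click-through rate),
    # keeping only coupons still valid at the test click date.
    counts = Counter(
        c for (_, c, t) in train_trips
        if t < click_date
        and coupon_validity[c][0] <= click_date <= coupon_validity[c][1]
    )
    top_coupons = {c for c, _ in counts.most_common(top_n)}
    return coupon in top_coupons

# Baseline accuracy over the test partition would then be, for example:
# hits = sum(pbb_hit(t, train_trips, coupon_validity) for t in test_trips) / len(test_trips)
```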
2.5 Run-time
Figure 3 shows the simplified architecture of the run-time system. A daily list of thousands of candidate coupons is submitted for selection to the recommender system. For each member subscribing to the reward-based e-commerce company, each member/offer pair is evaluated by the recommender system and scored for affinity. The resulting score list is used to rank the predictions, which are then truncated to the top 10 candidates and used either for personalized email campaigns or personalized front-page offers.

Figure 3: Run-time coupons selection and ranking.

3 DATA
We used data collected from customers directly clicking on the published offers, e-shopping at the merchants' websites and consequently adding offers, or auto-applying offers through browser button extensions. The data includes historical information about customers' click-throughs (i.e., shopping trips) to visit affiliated retailer websites. The data was sampled across a full year of traffic and anonymized for privacy requirements. Table 1 summarizes the distributions of the various data sets, including active members generating shopping trips, shopping trip counts, clicked coupons, and retailers that issued the offers.

Table 1: Shopping trips data distribution across different data sets.

Data set        Members   Shopping trips   Coupons   Retailers
Train           448,055   1,037,030        15,276    1,528
Test            49,789    114,300          7,594     1,192
Train Negative  358,444   11,666,592       9,705     1,453
Test Negative   39,831    1,285,875        8,190     1,413

Coupons are characterized by a textual description (e.g., "25% off Tees Shorts Swim for Women Men Boys and Girls Extra 30% off Sale Styles and Extra 50% off Final Sale Styles") and a date range defining their validity (e.g., from 12/18/2016 to 12/27/2016). Optionally, coupons are classified into three categories: 1) percentage discount; 2) money discount; and/or 3) free shipping offer for specific brands or products. Additionally, they can directly refer to a specific product or retailer. Retailers' information includes a business name (e.g., Stila Cosmetics) and an optional category (e.g., Health & Beauty). The details about the negative data sets in Table 1 are explained in Section 2.1.

The number of shopping trips per customer follows the typical power-law probability distribution [1], where the heavy-tailed data still holds a substantial amount of probability information. Figure 4 shows the probability density function (PDF) p(X) in blue, the complementary cumulative density function (CCDF) p(X ≥ x) in red, and the corresponding power-law distribution fit with parameters x_min = 1 and α = 3.27.

Figure 4: Probability density function (p(X), blue) and complementary cumulative density function (p(X ≥ x), red) with power-law distribution fit for x_min = 1 and α = 3.27.

Based on the shopping trip frequency (see Table 2), we split the data into three segments: 1) the Head, comprising the most active members with more than 8 shopping trips; 2) the Torso, including the members with a click rate between 2 and 7; and 3) the Tail, for the less active members with 1 or 2 clicks.

Table 2: Data counts and stats for head, torso, and tail sets of the shopping trips distribution.

Data set   Members   Shopping trips   %       Min   Max    Avg
Head       25,149    317,623          28.7%   8     1,569  15.79
Torso      132,337   441,533          39.8%   3     7      4.17
Tail       333,686   349,262          31.5%   1     2      1.31

This data partition will be used in Section 4.2 to evaluate the user-based cold-start problem.
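Returning to the power-law fit reported above (x_min = 1, α = 3.27), it can be reproduced with the powerlaw package of [1]; the snippet below is a sketch only, with a synthetic stand-in for the per-member shopping trip counts, and is not the exact script behind Figure 4.

```python
import numpy as np
import powerlaw  # Alstott et al. [1]

# Stand-in for the per-member shopping trip counts (discrete, heavy-tailed);
# in practice this array would come from the shopping-trip logs.
trip_counts = np.random.default_rng(0).zipf(3.27, 100_000)

fit = powerlaw.Fit(trip_counts, xmin=1, discrete=True)
print("alpha =", fit.power_law.alpha)   # ~3.27 reported for the real data
print("xmin  =", fit.power_law.xmin)

# Empirical PDF/CCDF with the fitted power law overlaid, as in Figure 4.
ax = fit.plot_pdf(color="b", label="PDF")
fit.power_law.plot_pdf(color="b", linestyle="--", ax=ax, label="Fit")
fit.plot_ccdf(color="r", ax=ax, label="CCDF")
fit.power_law.plot_ccdf(color="r", linestyle="--", ax=ax, label="Fit")
ax.set_xlabel("Shopping Trips Frequency")
ax.legend()
```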
4 EXPERIMENTS
To build and evaluate the random forest, XGBoost, and baseline models, we partitioned the data by members, using 90% of the population for training and 10% for testing, and derived the associated shopping trip sets. Partitioning by members rather than by shopping trips increases the chances of capturing the model's generalization capabilities, since the shopping trip history of unseen test users will not bias the training data. For the negative samples, as explained in Section 2.1, we select 9 negative shopping trips for each positive sample.

To capture the user/coupon affinity, we experimented with a number of features involving specific user and coupon attributes, as well as shopping history features. The following list summarizes them:

• User-based features
  – Devices used
  – OS systems
  – Email domain
• Coupon-based features
  – Text description
  – Originating retailer
  – Retailer's category
  – Discount info (free shipping, dollar off, percent off)
  – Cashback info
• Shopping trip history-based features
  – Times the user has visited this coupon before
  – Recently visited retailers
  – Recent discount information
  – Recent cash back information
  – Recent coupon keywords (from descriptions)
  – Recent retailer's categories
  – Recent luxury purchases

In the data pre-processing stage, we encoded text as bags of word frequencies, members' shopping behavior histories as bags of frequencies, and categorical features as one-hot vector representations. All the other features are either binary or numeric values. Since both RF and XGBoost models are insensitive to monotonic feature transformations, we avoided numeric standardization and normalization. We carefully handled missing information, replacing missing text with the label unk and missing categorical and numerical features with 0s. After pre-processing, the resulting number of features is 45,329. For the XGBoost model, we ran a partial grid-search optimization to set the most important hyper-parameters, the maximum tree depth and the number of estimators, to 15 and 1,000, respectively. We used the scikit-learn [19] implementation of random forests and the XGBoost [7] implementation of gradient boosted trees.
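As a sketch of how such classifiers could be instantiated with the reported XGBoost hyper-parameters (maximum depth 15, 1,000 estimators), the snippet below uses the scikit-learn random forest and the scikit-learn-compatible XGBoost wrapper; the random forest settings and variable names are illustrative assumptions, not the exact training script.

```python
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier  # scikit-learn-compatible wrapper around [7]

# Random forest classifier (RFC); n_estimators here is an assumption.
rfc = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)

# Gradient boosted trees with the grid-searched hyper-parameters.
xgb = XGBClassifier(max_depth=15, n_estimators=1000, objective="binary:logistic")

# X_train would be the 45,329-dimensional feature matrix and y_train the 0/1
# shopping-trip labels (one positive per nine sampled negatives), e.g.:
# rfc.fit(X_train, y_train)
# xgb.fit(X_train, y_train)
# Coupons are then ranked per member by rfc.predict_proba(X_candidates)[:, 1].
```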
4.1 Results
Table 3 shows the experimental results for the machine learning models and the baselines. We evaluated performance by truncating the ranked prediction list at 10 items. The RFC model could predict click-throughs for 96.80% of the tested coupons (accuracy), while the XGBoost model followed with 95.85% correct click predictions, but with a substantially higher recall of 0.83. This pushed the XGBoost predicted click-through rate to 8.30%. Both the PBB and GVPBB baselines suffer from low recall since the popularity lists, once the shopping trip day is selected, are the same for all the users.

Table 3: Precision (P), recall (R), F-measure (F), NDCG@10, and click-through rate (CTR) for random forest classification (RFC), XGBoost, popularity-based baseline (PBB), and God-View PBB (GVPBB) with 10 maximum coupon recommendations. Baseline methods are marked with (*).

Model      P      R      F      NDCG@10   CTR
PBB*       0.91   0.12   0.21   0.776     1.20%
GVPBB*     0.83   0.34   0.48   0.892     3.40%
XGBoost    0.77   0.83   0.80   0.857     8.30%
RFC        0.93   0.78   0.85   0.969     7.80%

Overall, both RFC and XGBoost outperform the baseline metrics. In addition to precision, recall, and F-measure, we also report the Normalized Discounted Cumulative Gain (NDCG) metric [14] for a list of 10 offers generated by combining 1 positive sample with 9 randomly selected negative samples. NDCG is a ranking quality measure that takes into consideration the position of the clicked item. Generally, positive samples should be ranked higher than the negative ones.

As further validation, we evaluated the quality of the confidence scores estimated by the RFC model. Well-calibrated predicted probabilities are fundamental when the classification model scores are used for ranking [18].

Figure 5: Probability calibration curve for the RFC model. X-axis: mean predicted value; Y-axis: fraction of positive samples.

The confidence calibration plot in Figure 5 shows that the RFC model has a minor bias around predicted values of 0.2 and 0.6, but overall the curve follows the ideal behavior by closely mimicking the diagonal.
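A calibration curve like the one in Figure 5 can be produced with scikit-learn's calibration_curve; the sketch below uses synthetic stand-ins for the held-out labels and RFC scores and is not the exact plotting code behind the figure.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Stand-ins for the held-out labels and RFC probabilities; in practice:
# y_test, proba = y_test, rfc.predict_proba(X_test)[:, 1]
rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, 5000)
proba = np.clip(0.25 + 0.5 * y_test + rng.normal(0, 0.15, 5000), 0, 1)

# Fraction of positives vs. mean predicted value in 10 probability bins.
frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)

plt.plot([0, 1], [0, 1], "k--", label="Perfectly calibrated")
plt.plot(mean_pred, frac_pos, "s-", label="RFC")
plt.xlabel("Mean predicted value")
plt.ylabel("Fraction of positive samples")
plt.legend()
plt.show()
```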
4.2 User-based cold-start evaluation
When new members join a reward-based system or new offers are issued by retailers, recommender systems are affected by the sparseness of the new data, which compromises prediction performance [19]. To mitigate this problem, a different selection strategy is typically used for the tail of the distribution (i.e., hybrid methods) [19]. However, we rely on the user- and coupon-based features to capture information that may generalize well with minimal shopping trip history.

We evaluated our RFC model on the three segments of the shopping trip distribution (see also Section 3). Table 4 shows the results of this test.

Table 4: Precision (P), recall (R), F-measure (F), NDCG@10, and click-through rate (CTR) for the RFC model when testing users in the head, torso, and tail sections of the shopping trip distribution.

Data set   P      R      F      NDCG@10   CTR
Head       0.95   0.77   0.85   0.980     7.70%
Torso      0.94   0.79   0.86   0.985     7.90%
Tail       0.90   0.78   0.84   0.976     7.80%

On average, the RFC model is quite robust across the three distribution segments. When analyzing the main feature predictors, the number of times the user clicked on the coupon before and being a referred member are the major contributors to the predictions, but further investigation is necessary to better understand the feature interactions.

5 RELATED WORK
A large body of work on recommender systems focuses on explicit feedback, where users express their preference with ratings [10, 22]. More recently, there is an increasing number of contributions that rely on the more abundant and noisy implicit feedback data [3, 11, 13]. However, most of this work frames the problem as a collaborative filtering (CF) model. CF is notoriously sensitive to the cold-start problem and not flexible enough to handle extremely volatile items such as online offers and coupons. Additionally, it is hard to incorporate features that capture users' and items' properties.

The hybrid recommender system described in [11] is the closest to our work. To address the cold-start problem, they use implicit feedback to recommend online coupons by integrating both user information and item features. In that work, the traditional collaborative filtering matrix, including users and items, is enriched with additional user and item features and factorized with SVD. Nevertheless, the hybrid system requires retraining every time there is a new coupon or user. In our case, there are thousands of new coupons every day and retraining that often is impractical.

Our work is in line with content-based filtering (CBF) approaches such as [2, 8]. CBF methods are able to abstract items and represent user/item interactions by encoding shopping trip history. There is no need to re-train when adding new offers, although including new retailers requires updating the model to capture the new user/item affinity relations expressed in the new shopping trips. In contrast to SVD and CF-based models, the content-based approach can leverage different classification techniques and can be combined with CF.

6 CONCLUSIONS
In this work, we developed a coupon recommender system based on content-based filtering. We experimented with two popularity-based baselines and compared them to both random forest and gradient boosted tree classification models. To address the one-class classification problem, our framework automatically extracts negative samples from customers' shopping trips. Negative samples simulate the real selection scenario where one coupon, out of the ten coupons shown, is actually selected by members. In this experimental setup, both the random forest and XGBoost classifiers perform substantially better than the two baselines, except for the NDCG@10 metric.

In the future, to improve the ranking results, we will extend the classification models with pair-based and list-based cost functions. We will also validate our models in A/B testing evaluations with real customers.

ACKNOWLEDGMENTS
The authors would like to thank Dr. David Inouye for providing insightful suggestions and comments. The authors would also like to thank the anonymous reviewers for their helpful advice and Yiu-Chang Lin for presenting our work at the workshop.

REFERENCES
[1] Jeff Alstott, Ed Bullmore, and Dietmar Plenz. 2014. powerlaw: A Python Package for Analysis of Heavy-Tailed Distributions. PLOS ONE 9, 1 (01 2014), 1–11.
[2] Ana Belén Barragáns-Martínez, Enrique Costa-Montenegro, Juan C. Burguillo, Marta Rey-López, Fernando A. Mikic-Fonte, and Ana Peleteiro. 2010. A Hybrid Content-based and Item-based Collaborative Filtering Approach to Recommend TV Programs Enhanced with Singular Value Decomposition. Inf. Sci. 180, 22 (Nov. 2010), 4290–4311.
[3] Immanuel Bayer, Xiangnan He, Bhargav Kanagal, and Steffen Rendle. 2017. A Generic Coordinate Descent Framework for Learning from Implicit Feedback. In Proceedings of the 26th International Conference on World Wide Web (WWW '17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1341–1350.
[4] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. 2013. Recommender Systems Survey. Know.-Based Syst. 46 (July 2013), 109–132.
[5] Leo Breiman. 2001. Random Forests. Mach. Learn. 45, 1 (Oct. 2001), 5–32.
[6] Walter Carrer-Neto, María Luisa Hernández-Alcaraz, Rafael Valencia-García, and Francisco García-Sánchez. 2012. Social Knowledge-based Recommender System. Application to the Movies Domain. Expert Syst. Appl. 39, 12 (Sept. 2012), 10990–11000.
[7] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016. 785–794.
[8] Keunho Choi, Donghee Yoo, Gunwoo Kim, and Yongmoo Suh. 2012. A Hybrid Online-product Recommendation System: Combining Implicit Rating-based Collaborative Filtering and Sequential Pattern Analysis. Electron. Commer. Rec. Appl. 11, 4 (July 2012), 309–317.
[9] Jerome H. Friedman. 2000. Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics 29 (2000), 1189–1232.
[10] Carlos A. Gomez-Uribe and Neil Hunt. 2015. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Trans. Manage. Inf. Syst. 6, 4, Article 13 (Dec. 2015), 19 pages.
[11] J. Grivolla, D. Campo, M. Sonsona, J. M. Pulido, and T. Badia. 2014. A Hybrid Recommender Combining User, Item and Interaction Data. In 2014 International Conference on Computational Science and Computational Intelligence, Vol. 1. 297–301.
[12] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference and Prediction (2nd ed.). Springer.
[13] Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast Matrix Factorization for Online Recommendation with Implicit Feedback. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16). ACM, New York, NY, USA, 549–558.
[14] Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated Gain-based Evaluation of IR Techniques. ACM Trans. Inf. Syst. 20, 4 (Oct. 2002), 422–446.
[15] Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17). ACM, New York, NY, USA, 781–789.
[16] Seok Kee Lee, Yoon Ho Cho, and Soung Hie Kim. 2010. Collaborative Filtering with Ordinal Scale-based Implicit Ratings for Mobile Music Recommendations. Inf. Sci. 180, 11 (June 2010), 2142–2155.
[17] Pasquale Lops, Marco de Gemmis, and Giovanni Semeraro. 2011. Content-based Recommender Systems: State of the Art and Trends. Springer US, Boston, MA, 73–105.
[18] Alexandru Niculescu-Mizil and Rich Caruana. 2005. Predicting Good Probabilities with Supervised Learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML '05). ACM, New York, NY, USA, 625–632.
[19] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12 (Nov. 2011), 2825–2830.
[20] Filip Radlinski and Thorsten Joachims. 2005. Query Chains: Learning to Rank from Implicit Feedback. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD '05). ACM Press, New York, NY, USA, 239–248.
[21] Oxana Ye. Rodionova, Paolo Oliveri, and Alexey L. Pomerantsev. 2016. Rigorous and Compliant Approaches to One-class Classification. Chemometrics and Intelligent Laboratory Systems 159 (2016), 89–96.
[22] Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. 2007. Restricted Boltzmann Machines for Collaborative Filtering. In Proceedings of the 24th International Conference on Machine Learning (ICML '07). ACM, New York, NY, USA, 791–798.
[23] Chao Wang, Yiqun Liu, and Shaoping Ma. 2016. Building a Click Model: From Idea to Practice. CAAI Transactions on Intelligence Technology 1, 4 (2016), 313–322.