Recommender Systems for Product Bundling
           Moran Beladev                                        Bracha Shapira                                       Lior Rokach
Ben-Gurion University of the Negev                  Ben-Gurion University of the Negev              Ben-Gurion University of the Negev
      belachde@bgu.ac.il                                  bshapira@bgu.ac.il                                 liorrk@bgu.ac.il


Copyright is held by the author(s). RecSys 2015 Poster Proceedings, September 16-20, 2015, Austria, Vienna.

ABSTRACT
Recommender systems (RSs) enhance e-commerce sales by recommending relevant products to their customers. RSs aim at implementing the firm's web-based marketing strategy to increase revenues. Generating bundles is an example of a marketing strategy that aims to satisfy consumer needs and preferences and, at the same time, to increase customers' buying scope and the firm's income. Thus, finding and recommending an optimal and personal bundle becomes very important. In this paper we introduce a novel model of bundle recommendations that integrates collaborative filtering (CF) techniques, personalized demand functions, and price modeling. This model provides a recommendation list by finding pairs of products that maximize both the probability of their purchase by the user and the revenue received by selling these bundles.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information Filtering.

Keywords
Bundle Recommendation, Recommender Systems, E-Commerce, Collaborative Filtering, SVD.

1. INTRODUCTION
Bundling refers to the practice of selling two or more items together as a package at a price that is below the sum of the independent prices. Optimal bundling would combine items into bundles that best fit the retailer's needs and the user's preferences, and maximize product compatibility within the bundle. Thus, a single price $P_{A+B} < P_A + P_B$ is set for the two products (A, B) if purchased jointly. One challenge is to suggest a price for a bundle that fits both the customer's reservation price, i.e., the maximal price buyers are willing to pay, and the retailer's revenue [1]. Very few studies have combined bundling strategy with recommender systems (RSs). The field of frequent item set mining and association rules deals with finding a basket of items that are frequently bought together [2]. However, these techniques are not personalized and thus not applicable for RSs. The recommendation of bundles was presented as a tailored solution for the tourism domain using case-based reasoning, where case models representing the travel plan bundle were matched against the user profile and preferences [3]. The authors of [4] presented a bundle optimization using a genetic algorithm to maximize the compatibility of the products within a bundle. However, these studies did not measure the recommendation aspect, i.e., whether it is at all feasible and beneficial to predict bundle purchasing. The study presented in [5] introduces a bundle recommendation problem whose solution is a set of items that maximizes some total expected reward. However, the price aspect was not considered in the model. Our paper maximizes the expected revenue by considering the item-to-item cross dependencies, user-item collaborative filtering techniques, and the demand-price function, resulting in a recommendation of the best bundle and price proposal to the user.

2. BUNDLE RECOMMENDATION MODEL
We maximize the following retailer expected revenue function:

(1) $\mathit{ExpectedRevenue} = P_i(A, B, T) \cdot (T - \mathit{cost}_A - \mathit{cost}_B)$

where $P_i(A, B, T)$ is the probability that user i will purchase the bundle composed of products A and B at price T, $\mathit{cost}_A$ is the retailer's cost for product A, and $\mathit{cost}_B$ is the retailer's cost for product B. The proposed bundle and the price T for user i are set to maximize the expected revenue:

(2) $(A, B, T) = \arg\max_{A, B, T} \mathit{ExpectedRevenue}(A, B, T)$

To find $P_i(A, B, T)$, we find the corresponding prices $C_A$ of product A and $C_B$ of product B that sum to the bundle price T:

(3) $P_i(A, B, T) = \max_{C_A, C_B \,:\, C_A + C_B = T} P_i(A \cap B \cap C_A \cap C_B)$

Thus, we have to find the prices $C_A$ and $C_B$ that maximize the probability that user i buys products A and B while paying those prices. According to Bayes' law:

(4) $P_i(\text{like } A \cap \text{willing to pay } C_A \text{ for } A) = P_i(\text{like } A) \cdot P_i(\text{willing to pay } C_A \text{ for } A \mid \text{like } A)$

According to the Jaccard measure:

(5) $J_{A,B} = \frac{P(A \cap B)}{P(A \cup B)}$

Using the inclusion-exclusion principle from combinatorics:

(6) $P(A \cap B) = P(A) + P(B) - P(A \cup B)$

Substituting $P(A \cup B) = P(A \cap B) / J_{A,B}$ from equation (5) into equation (6) gives:

(7) $P(A \cap B) = \frac{P(A) + P(B)}{1 + \frac{1}{J_{A,B}}}$

Using Bayes' law and equation (7):

(8) $P_i((A \cap C_A) \cap (B \cap C_B)) = \frac{P_i(A) \cdot P_i(C_A \mid A) + P_i(B) \cdot P_i(C_B \mid B)}{1 + \frac{1}{J_{A,B}}}$

We assume that the Jaccard measure $J_{A,B}$, which denotes the products' compatibility, is not affected by the price. The probabilities $P_i(A)$ and $P_i(B)$ are found using the CF technique; $P_i(C_A \mid A)$ and $P_i(C_B \mid B)$ are found from the personalized demand graph of Section 2.1.

2.1 Personalized Demand Graph
We assume that each customer has his/her own demand graph for each product based on his/her preferences. Thus, we developed heuristics for estimating the "personalized" demand graph for user i and item j using very sparse data. Figure 1 demonstrates the demand of a generic customer versus an enthusiastic one (i.e., one that would pay high prices) as well as an indifferent one. We assume that the difference between the demand graphs can be reflected by the following:

(9) $P_i(C_A \mid A) = \min(P_{\cdot,A}(C_A) \times \alpha_{i,A},\ 100\%)$
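Equations (1), (2), (3), (8), and (9) can be combined into a small numerical sketch. The following Python code is only an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the demand curves are passed in as plain callables, the argument `alpha` stands for the factor $\alpha_{i,A}$ that appears in equation (9), and the maximization in equation (3) is approximated by a simple grid search over price splits, which the paper does not specify.

```python
from typing import Callable, Iterable

# Hypothetical type: maps a price to P_i(C | like), a value in [0, 1].
Demand = Callable[[float], float]


def personal_demand(generic_prob: float, alpha: float) -> float:
    """Equation (9): generic demand scaled by alpha, capped at 100%."""
    return min(generic_prob * alpha, 1.0)


def bundle_probability(like_a: float, like_b: float,
                       pay_a: float, pay_b: float, jaccard: float) -> float:
    """Equation (8): probability of buying A at C_A together with B at C_B."""
    if jaccard <= 0.0:
        return 0.0  # J = 0: the two products are never bought together
    return (like_a * pay_a + like_b * pay_b) / (1.0 + 1.0 / jaccard)


def best_split_probability(total: float, like_a: float, like_b: float,
                           demand_a: Demand, demand_b: Demand,
                           jaccard: float, step: float = 1.0) -> float:
    """Equation (3): best probability over grid splits with C_A + C_B = T."""
    steps = int(total // step)  # assumption: prices searched on a fixed grid
    best = 0.0
    for k in range(steps + 1):
        c_a = k * step
        c_b = total - c_a
        p = bundle_probability(like_a, like_b,
                               demand_a(c_a), demand_b(c_b), jaccard)
        best = max(best, p)
    return best


def expected_revenue(prob: float, total: float,
                     cost_a: float, cost_b: float) -> float:
    """Equation (1): purchase probability times the retailer's margin."""
    return prob * (total - cost_a - cost_b)


def best_bundle_price(prices: Iterable[float], like_a: float, like_b: float,
                      demand_a: Demand, demand_b: Demand, jaccard: float,
                      cost_a: float, cost_b: float) -> float:
    """Equation (2) for one fixed pair (A, B): argmax over candidate prices T."""
    def revenue(total: float) -> float:
        p = best_split_probability(total, like_a, like_b,
                                   demand_a, demand_b, jaccard)
        return expected_revenue(p, total, cost_a, cost_b)
    return max(prices, key=revenue)
```

Extending `best_bundle_price` to the full argmax of equation (2) would mean repeating the search for every candidate pair (A, B) and keeping the pair and price with the highest expected revenue; the strategy that maximizes only the purchase probability would rank by `best_split_probability` instead.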
In equation (9), $P_{\cdot,A}(C_A)$ is the generic demand graph for item A at price $C_A$, and $\alpha_{i,A}$ is the personalized bias factor for user i and item A. To find the personal bias factor $\alpha_{i,A}$, we scan each customer's previous purchases or his/her highest bid on an item and compare his/her price to the median of the generic graph. For example, if customer i purchased item A at price $C_A^*$, then his/her bias factor is estimated as:

(10) $\alpha_{i,A} = \frac{0.5}{P_{\cdot,A}(C_A^*)}$

For example (Figure 1), assume that a customer purchased the item for $C_A^* = 1300$; according to the generic graph, this price would be considered by only 35% of the interested population. Thus, the personalized bias for this user is calculated as $\alpha_{i,A} = 0.5 / 0.35 \approx 1.43$.

Figure 1. Personalized demand for various customer types

We can create a bias matrix for all purchases of items by users. The bias factor of products that have not been purchased by the customer can be predicted using the SVD method. Given the complete matrix, we can infer the personalized demand graph of each user i and item A by multiplying the generic demand graph calculated for item A by the predicted alpha.

3. EVALUATION
Our model was evaluated on two datasets. Dataset 1 consists of transactions from a shopping website that sells electronics and furniture. Dataset 2 is a supermarket dataset from Kaggle (https://www.kaggle.com/c/acquire-valued-shoppers-challenge). We used offline evaluation and compared our model to SVD and CF as baseline models. We evaluated: (i) the personal demand function, by using a validation set to test the predicted alphas against the actual alphas with the RMSE measure, and by comparing the personal demand graph probability to 0.5 (the median probability) for all purchased products in the test set, also with the RMSE measure; (ii) the product bundling recommendation, by comparing the top 5 bundles to the top 5 items recommended by the CF and SVD algorithms, using precision, recall, the average quantity that was recommended and purchased, and the average price paid for the recommended and purchased products; (iii) the price bundling recommendation, by comparing the recommended price to the actual price the user paid in the test set, measuring the sum of the absolute differences. The recommended price was compared to the mean price of the product. We also compared two strategies: (1) maximizing the bundle buying probability of the user, and (2) maximizing the expected revenue. For both datasets we evaluated our model on the top 1,000 customers and top 300 products. The first dataset resulted in 3,425 transactions and the second in 836,846 transactions. A recommended bundle is considered a hit in the test set if the two products have been purchased by the user within a week.

In dataset 1, for the personal demand graph, we obtained an RMSE of 0.072 for alpha and an RMSE of 0.261 compared to the median. Thus, the personal graphs are compatible with the users' preferences. In dataset 2, for the personal demand graph, we obtained an RMSE of 1.067 for alpha and an RMSE of 0.34 compared to the median. The results for the product evaluation are presented in Tables 1 and 3, and the results for the price evaluation are presented in Tables 2 and 4. Bundle (1) and Bundle (2) represent the two strategies of maximizing the probability and the expected revenue, respectively.

Table 1. Product bundling results for dataset 1
Method        Precision   Recall   Q       Price
CF            0.027       0.012    0.133   12.133
SVD           0.013       0.033    0.067   70.533
Bundle (1)    0.088       0.09     0.8     469.133
Bundle (2)    0.071       0.08     0.6     457.467

Table 2. Price bundling recommendation results for dataset 1
Method        Recommended price   Mean price
Bundle (1)    0.043               788.89
Bundle (2)    29.989              788.89

Table 3. Product bundling results for dataset 2
Method        Precision   Recall   Q       Price
CF            0.052       0.003    0.26    1.852
SVD           0.58        0.024    2.9     29.981
Bundle (1)    0.728       0.018    4.44    38.069
Bundle (2)    0.2         0.004    1.02    12.882

Table 4. Price bundling recommendation results for dataset 2
Method        Recommended price   Mean price
Bundle (1)    170.8               87.1
Bundle (2)    117.44              104.057

4. CONCLUSIONS AND FUTURE WORK
Our results demonstrate that bundles are predictable and may increase users' purchase scope. The first dataset is more difficult to predict, but the bundle model is at least comparable to state-of-the-art algorithms and is even superior in some cases. The personal demand graph tends to be very accurate, as was observed in the price recommendation accuracy. The second dataset contains commodities data; thus, a personal demand graph is more difficult to predict, and the recommended price was not as accurate as in the first dataset. Moreover, for dataset 2 the products are more predictable and the first bundle strategy yields the best results. For both datasets, maximizing the probability of the user's purchase is more effective than maximizing the expected revenue. Future work will aim at improving the personal demand graph of dataset 2, examining more datasets, and conducting live user experiments.

5. REFERENCES
[1] Guiltinan, J. 1987. The price bundling of services: a normative framework. The Journal of Marketing, 74-85.
[2] Agrawal, R., Imielinski, T., and Swami, A.N. 1993. Mining association rules between sets of items in large databases. ACM SIGMOD, volume 22,2 of SIGMOD Record, 207-216.
[3] Ricci, F. 2002. ITR: a case-based travel advisory system. Advances in Case-Based Reasoning. Springer Berlin Heidelberg, 613-627.
[4] Birtolo, C. 2013. Searching optimal product bundles by means of GA-based Engine and Market Basket Analysis. IFSA
[5] Zhu, T. 2014. Bundle recommendation in ecommerce. SIGIR 2014, 657-666.