Carl: A Sports Award Recommender

Martin Pichl, Bernward Pichl
Pichl Medaillen GmbH
Schießstand 10
Inzing, Austria
firstname.lastname@pichl.com

Eva Zangerle
Department of Computer Science, Universität Innsbruck
Austria
eva.zangerle@uibk.ac.at

ABSTRACT
Due to the rise of the web, today's huge open source community and the numerous publications of industry as well as academia in the field of computer science, even small and mid-sized companies can nowadays access state-of-the-art machine learning technologies and leverage them for their businesses. In this paper, we present Carl, a hybrid recommender system utilizing content-based filtering combined with a context-aware sales model trained via XGBoost to recommend sports awards to customers. The computed recommendations are sent via e-mail to regular customers who have already bought sports awards before. Hence, this system aims to increase customer satisfaction by simplifying the decision which sports awards to buy every season. In offline experiments, we observe that XGBoost provides the best recommendation performance compared to other state-of-the-art approaches such as Factorization Machines and Neural Networks. More importantly, in the complementary online evaluation, we observe that the interaction and conversion rates of the e-mails sent via Carl are an order of magnitude higher than those of our corporate newsletter, which relies on a non-personalized most popular approach.

ACM Reference format:
Martin Pichl, Bernward Pichl and Eva Zangerle. 2018. Carl: A Sports Award Recommender. In Proceedings of The 2018 SIGIR Workshop On eCommerce, Ann Arbor, Michigan, USA, July 2018 (eCom'18), 5 pages.
DOI: 10.475/123 4

Copyright © 2018 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, S. Kallumadi, G. Di Fabbrizio, M. Kumar, Y.-C. Lin, A. Trotman, H. Zhao (eds.): Proceedings of the SIGIR 2018 eCom workshop, 12 July, 2018, Ann Arbor, Michigan, USA, published at http://ceur-ws.org

1 INTRODUCTION
The Pichl Medaillen GmbH is a family business founded in 1846 and specialized in producing custom medals and mints. Since the 1980s, Pichl also sells sports awards. Today, this segment is responsible for about 20% of the whole annual turnover. Due to the highly standardized products in this segment and the fact that customers demand those products regularly to have different awards for their events, Pichl decided to implement the company's first recommender system in this segment as part of the digital innovation agenda. The system Carl, named after Karl Pichl, who introduced industrial manufacturing in the company, aims at helping the company to (i) design the workflow of selling sports awards more efficiently and (ii) increase customer satisfaction by suggesting products, as these suggestions ease the choice overload by decreasing the search time for sports awards. Because customers demand new products for each season, this segment is moreover characterized by a regularly changing product assortment. In particular, about one-third of the complete assortment is replaced by new products every year. This is the reason why collaborative filtering approaches or model-based approaches leveraging a user-item matrix such as SVD [19], which are already available in shop applications and known to work well in other domains, fail in the presented use case: for collaborative filtering-based systems, estimating a user or item similarity from user-item interactions is difficult with constantly changing items, as no interactions are available for new items. This cold-start problem is referred to as the new item problem [26]. To avoid this problem, together with the Databases and Information System Group at the University of Innsbruck, the Pichl Medaillen GmbH decided to develop a hybrid recommendation approach leveraging content and contextual information.

The developed approach is a hybrid recommender system combining content-based filtering and predictive modeling. The first component, based on content-based filtering, covers the personalization aspect, whereas the second component is a global (not personalized) classification model that is capable of predicting whether a certain product (with certain features) is likely to be sold in a certain period (i.e., winter or summer seasons) or not. We refer to this as saleability.

Using an offline evaluation, we show that eXtreme Gradient Boosting [10] provides the best performance for the prediction task, compared to Factorization Machines [21] and Multilayer Perceptron Neural Networks [20]. Moreover, we are able to show that such a hybrid system overcomes the limitation of collaborative filtering for domains where the product assortment is changing regularly. Along with that, we show that the recommendations computed by the proposed system are not only highly precise if evaluated offline but also deliver a high conversion rate in the real-life application. Hence, our proposed hybrid recommendation model is capable of recommending new products to customers and thus is applicable for domains with regularly changing product assortments.

The remainder of this paper is structured as follows: After reviewing the background in Section 2, we introduce the used machine learning methodologies in Section 3 and give the reader an overview of the whole recommender system in Section 4, where we also present the underlying recommendation model in more detail. Next, we introduce the dataset and the conducted offline evaluations in Section 5. We present the real-life application in Section 6 and finally conclude this work in Section 7.

2 BACKGROUND
Since the 2000s, recommender system research has focused on collaborative filtering approaches and, in particular, on matrix factorization techniques such as singular value decomposition (SVD), as these approaches have been shown to achieve the best recommendation accuracies [7, 17, 25] and to be useful for implicit feedback [14, 22]. However, as outlined in the introduction, collaborative filtering-based approaches fail in our setting due to the new item problem. To handle the new item problem, content-based approaches or hybrids leveraging content-based information to find items similar to the new item are suitable [26]. In this work, we present a hybrid approach leveraging content-based information. Generally, content-based recommender systems focus on item characteristics to find similar items. In particular, these systems recommend items that are similar to the items a user already interacted with in the past. This is why these are also called content-based filtering approaches: they filter items based on previous user-item interactions. These approaches have their roots in the field of information retrieval [4, 9, 24] and initially focused on recommending items containing text, for instance, news articles, websites or UseNet messages [1]. In this work, we use content-based filtering to derive an initial set of recommendation candidates. We rank the computed candidates using a context-aware classification approach similar to the approaches introduced next.

In the late 2000s, research shifted towards hybrid approaches additionally integrating contextual information on top of the presented content- and collaborative filtering-based approaches: several extensions of matrix factorization techniques have been introduced, e.g., time-aware SVD++ [18]. As context is a broad concept, subsuming any circumstances that influence the perceived usefulness of an item [2], a variety of additional contextual information has been exploited in the field of recommender systems, for instance, the current time [6, 18], the current emotion and mood of a user [5, 8, 13, 23] or the user's location [3, 11, 15, 16]. Due to the success of context-aware approaches, we follow up on this research and incorporate the current month as a proxy for the current season as contextual information into our recommender system, allowing us to estimate a product bias for certain seasons. To incorporate context along with content-based features, we rely on a classification approach (cf. Section 3).

3 METHODOLOGY
To select the best methodology for classifying whether a product will be successfully sold or not, we evaluate three state-of-the-art classification approaches in an offline evaluation: eXtreme Gradient Boosting (XGBoost) [10], Factorization Machines (FM) [21] and Multilayer Perceptron Neural Networks (MLP) [20]. Using these three approaches, we cover a wide range of methodologies. In particular, we cover trees, factorization approaches leveraging latent features and neural networks. To compare the performance of the different classification approaches, we conduct an offline evaluation (Section 5). In this evaluation, we require the classifiers to predict whether a certain product will be successful or not. For this two-class classification task, we consider products as successful if they exceed a certain turnover threshold.

4 SYSTEM OVERVIEW
As already outlined in the introduction, our proposed recommender system is based on two major components (C): C1 is responsible for finding the top-n new products that are similar to the products found in the customers' purchase history. C2 is responsible for predicting whether a certain product is likely to be sold. Hence, it additionally ranks the top-n new products by the probability that these will be sold. Finally, the top-1 recommendation is periodically sent to the customer by a personalized e-mail. In particular, we send the e-mails two weeks prior to a potential customer order. A potential customer order is computed as the last order plus one period. For example, if a customer buys sports awards once a year, he or she will get the e-mail 351 days (365 − 14) after their last order. An overview of the system is given in Figure 1. We describe both components along with their interaction next.

[Figure 1: Workflow for Computing Recommendations. C1 (Personalization): select purchase histories of relevant customers, then compute top-n nearest new products. C2 (Predictive Model): select current turnover data, then compute model for new product success prediction. Recommendations: compute top-1 recommendation, then send e-mail to customers.]

C1 finds the top-n nearest new products based on the products contained in a user's purchase history. For the computation of product similarity, we utilize a generalization of the Gower coefficient [12]. We use this measure because, in contrast to the Pearson correlation, a measure that is widely used in the field of recommender systems, the Gower coefficient allows us to incorporate factor variables into the similarity computation. In Equation 1, we show the computation of the Gower similarity $G_{i,j}$ between two products $i$ and $j$. We denote by $g_{i,j,k}$ the contribution provided by the feature $k$, weighted by $w_{i,j,k}$. The computation of the feature contribution $g_{i,j,k}$ for numeric features such as price or height is shown in Equation 2, where we denote by $r_k$ the range of feature $k$. For the factor features in our dataset (color, design, …), $g_{i,j,k}$ is computed as depicted in Equation 3.

$$G_{i,j} = \frac{\sum_k w_{i,j,k} \, g_{i,j,k}}{\sum_k w_{i,j,k}} \tag{1}$$

$$g_{i,j,k} = 1 - \frac{|x_{i,k} - x_{j,k}|}{r_k} \tag{2}$$

$$g_{i,j,k} = \begin{cases} 1 & \text{if } x_{i,k} = x_{j,k} \\ 0 & \text{if } x_{i,k} \neq x_{j,k} \end{cases} \tag{3}$$

For the final similarity computation of products that is used in the real-life system, we set all weights $w_{i,j,k} = 1$ and hence let each feature contribute equally to the product similarity. We leave a weighting scheme for future work. Using the presented computation of the Gower similarity, we derive the set of recommendation candidates for each user by computing the top-n similar new items. A new item is an item that is (i) newly added to the assortment in the current year and (ii) not found in the user's buying history. We rank this set of recommendation candidates using C2 as described in the remainder of this section.

C2 is responsible for estimating the saleability of a product (in a certain season) and hence computes the probability that a certain product will be sold. As we observe that XGBoost delivers the best performance for this prediction task (cf. Section 5.3), we use XGBoost to compute the saleability of the recommendation candidates computed by C1.

The saleability is computed by applying the pre-trained XGBoost model (trained with the turnover data of the last three years as described in Section 5) to the recommendation candidates. For each candidate, we get a saleability value $s$ scaled between 0 and 1.

Using both the product similarity based on the Gower coefficient $g$ and the saleability $s$, we compute the final ranking of the new items for each user as a weighted sum of both values, as depicted in Equation 4.

$$r_{i,j} = w_1 \, g_{i,j} + w_2 \, s_j \tag{4}$$

In Equation 4, we denote by $r_{i,j}$ the ranking coefficient for an item $i$ (contained in a user's purchase history) and a new item $j$. Furthermore, we denote by $g_{i,j}$ the Gower coefficient between items $i$ and $j$ (written $G_{i,j}$ in Equation 1) and by $s_j$ the predicted saleability of item $j$. For the real-life application, we set $w_1 = w_2 = 0.5$ and hence weight the personalization aspect, represented by the Gower coefficient, and the saleability aspect equally, which corresponds to the average of both values.

5 EXPERIMENTS
As already outlined in the previous section, we evaluate the classification accuracy of the predictive model using k-fold cross-validation. Before describing the experimental setup and discussing the results, we introduce the reader to the used dataset.

5.1 Dataset
For evaluating the different classification methods, we use the turnover data of the previous three years (2015, 2016 and 2017) as training and test data. Please note that we are not able to use the turnover data of the current year (2018), as we cannot estimate the success of the products yet. The dataset contains 538 main products and 1,939 product variations. A main product is available in different sizes, where each size is considered as a product variation. Hence, the variations of a product share the same features besides the price and the height. As stated in Table 1, we characterize each product by 13 features. The features price and height are self-explanatory. Color, accent color 1 and accent color 2 specify the main color along with two accent colors of a product, e.g., silver or gold. Handle is a boolean feature, stating whether a cup has handles. Analogously, cap, emblem, and emblem holder are boolean features defining whether a cup or trophy features an emblem (holder) or a cap. Please note that an emblem can be mounted on an emblem holder or on a cap. Stand indicates the material of a cup's stand, e.g., marble, wood or plastic. Decorative states whether there is a decorative element and its type, for instance, a colored orb.

Feature          Type
---------------  -------
height           numeric
price            numeric
color            factor
accent color 1   factor
accent color 2   factor
handle           boolean
decorative       factor
emblem holder    boolean
emblem           boolean
design           factor
stand            factor
cap              boolean
material         factor
Table 1: Product Features Overview

5.2 Experimental Setup
Utilizing the previously introduced dataset, we conduct a 5-fold cross-validation to determine the most accurate model for the new product success prediction. For this, we randomly split the dataset into 5 folds of equal size, where we use each fold as the test set once and the remaining folds for training. For this evaluation, we use the packages' default parameters but vary the number of latent features for the FM ($k \in \{1, 5, 10, 25, 50\}$) and the number of nodes per layer $n_l$ as well as the number of layers $l$ of the neural network ($n_l \in \{1, 2, 5, 10, 20\}$, $l \in \{1, 2, 3\}$). To measure the classification performance, we rely on the accuracy measure and the Kappa statistic. While the first measure solely considers the number of correctly classified instances, the Kappa statistic compares the observed accuracy with an expected accuracy. The expected accuracy is the agreement that would be reached by chance, given how often each class occurs in the predictions and in the ground truth, analogous to chance agreement in inter-rater settings. Due to this, Kappa takes the possibility of correctly classifying a product as successful by chance into account and hence is the more meaningful measure in our experiments.

5.3 Experimental Results
The results of the conducted offline evaluations are stated in Table 2. For the FM and the MLP classifiers, we only state the best result of our evaluations with different $k$ and different $n_l$ as well as $l$ values, respectively.

Algorithm                        Accuracy  Kappa
-------------------------------  --------  -----
XGBoost                          0.74      0.37
FM (k = 25)                      0.69      0.02
MLP (n_1 = 10, n_2 = 5, n_3 = 5)  0.60      0.29
Table 2: Prediction Accuracy

We observe that although the accuracies of XGBoost and the FM differ by only 6.76%, the Kappa value of the FM is only 0.03. Hence, we cannot see a substantial performance difference to the random baseline or rather to an approach always predicting a product to be not successful. This is because the FM predicts a success for only 0.72% of the products. In contrast, XGBoost classifies 39.62% of the products as a success, a more realistic number.

For the MLP-based classifier, using a grid search, we find that a neural network with three layers containing 10, 5 and 5 nodes performs best, with an accuracy of 0.60. However, according to the Kappa value, XGBoost works substantially better. In particular, its Kappa value is 27.59% higher.

To conclude, we see that according to the Kappa statistic, both XGBoost and MLP classifiers show a fair agreement, in contrast to the FM, which shows only a slight agreement. In addition, we observe that to predict whether a product will be sold or not, XGBoost works best in terms of prediction accuracy and the Kappa statistic. This is why we use XGBoost's computed probability that a product will be sold for our proposed recommender system. We refer to this probability as the saleability of the product. In the next section, we show how the computed saleability is leveraged for sports award recommendations.

6 REAL-LIFE APPLICATION
Carl went live on January 4th, 2018, and until the end of April 2018, more than 2,000 e-mails have been sent. As our business is highly seasonal with a peak at the beginning and the end of a year, we consider the current analysis as late-breaking results and aim to perform a more detailed online evaluation using the sales data of a complete year in future work. Nevertheless, at the current stage, we observe a very high interaction rate with the personalized recommendations in the e-mails sent via Carl compared to the corporate newsletter. The latter follows a simple most popular approach, which suggests the most popular products of the current season and the globally most popular sale articles to our customers. In Table 3, we state the relative number of users who have visited the website via a link in the sent e-mail (i.e., the portion of the recipients actually visiting the website), the corresponding average session duration in minutes, the average number of viewed pages as well as the relative number of conversions per e-mail and per user. For the latter two measures, we divided the number of conversions by the number of e-mails sent and by the number of users, respectively. Please note that for this analysis, we only tracked orders made in the online shop as a conversion, not offline conversions such as orders via telephone or an informal mail.

Source                Website Visits  Avg. Session Duration (min)  Viewed Pages  Conv./E-Mail  Conv./Users
--------------------  --------------  ---------------------------  ------------  ------------  -----------
Carl                  20.61%          6.57                         8.32          4.61%         22.34%
Corporate Newsletter  1.38%           2.37                         4.41          0.03%         2.50%
Table 3: Recommender Key Metrics

We observe (cf. Table 3) that the conversion rate of the personalized e-mail is an order of magnitude higher compared to the standard (unpersonalized) newsletter. The 8.94 times higher conversion rate per user is accompanied by a 2.77 times higher session duration with 1.89 times as many page views per session. Moreover, we observe that the relative number of website visits is already an order of magnitude higher. However, we assume that this is not only rooted in the recommendations but also in the personalized manner in which the e-mail is sent as well as the precise timing. This will be a subject for further research in the next months.

Summing up, we see an excellent conversion rate in the e-mails sent via Carl. Our results show that for selecting products to be promoted in e-mail newsletters, combining the predictor for the saleability of products with a traditional content-based recommender system allows for a substantial improvement in a diverse set of quality measures. Hence, in future work, we will run experiments on fine-tuning the recommender system and aim to implement a latent feature approach for the content-based part. Besides that, we aim to conduct a profound online analysis after a whole year to capture all seasonal effects.

7 CONCLUSION
In this paper, we present Carl, a recommender system that aims to improve customer satisfaction by suggesting sports awards to customers of the Pichl Medaillen GmbH, an Austrian SME. In particular, the presented system aims to increase customer satisfaction by sending the recommendations via e-mail to our regular customers who buy sports awards every season (i.e., yearly) and hence helps them to find suitable sports awards out of a set of more than 200 awards that change regularly. The recommendations are computed using a hybrid approach that leverages content-based filtering combined with a context-aware sales model. The latter is trained via XGBoost and estimates a general saleability coefficient for each product based on product features and contextual information, namely the current season approximated by the current month. In an offline evaluation, we show that XGBoost delivers the best performance compared to Factorization Machines and multilayer perceptron neural networks. In a complementary online study, we show that the conversion rate is substantially higher than the conversion rate of the unpersonalized corporate newsletter, which promotes the most popular articles.

REFERENCES
[1] Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17(6), 734–749 (Jun 2005)
[2] Adomavicius, G., Tuzhilin, A.: Context-aware recommender systems. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, chap. 7, pp. 217–253. Springer-Verlag, New York, NY, USA, 1st edn. (2010)
[3] Ankolekar, A., Sandholm, T.: Foxtrot: a soundtrack for where you are. In: Proceedings of Interacting with Sound Workshop: Exploring Context-Aware, Local and Social Audio Applications (IwS 2011). pp. 26–31. ACM (2011)
[4] Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1999)
[5] Baltrunas, L., Kaminskas, M., Ludwig, B., Moling, O., Ricci, F., Aydin, A., Lüke, K.H., Schwaiger, R.: Incarmusic: Context-aware music recommendations in a car. In: Huemer, C., Setzer, T. (eds.) E-Commerce and Web Technologies, Lecture Notes in Business Information Processing, vol. 85, pp. 89–100. Springer (2011)
[6] Baltrunas, L., Ludwig, B., Ricci, F.: Matrix factorization techniques for context aware recommendation. In: Proceedings of the 5th ACM Conference on Recommender Systems (RecSys 2011). pp. 301–304 (2011)
[7] Bell, R.M., Koren, Y.: Lessons from the netflix prize challenge. ACM SIGKDD Explorations Newsletter - Special issue on visual analytics 9(2), 75–79 (Dec 2007), http://doi.acm.org/10.1145/1345448.1345465
[8] Braunhofer, M., Kaminskas, M., Ricci, F.: Recommending music for places of interest in a mobile travel guide. In: Proceedings of the 5th ACM Conference on Recommender Systems (RecSys 2011). pp. 253–256. ACM, New York, NY, USA (2011), http://doi.acm.org/10.1145/2043932.2043977
[9] Celma, O.: Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play in the Digital Music Space.
Springer Publishing Company, Incorporated, 1st edn. (2010)
[10] Chen, T., Guestrin, C.: XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 785–794 (2016)
[11] Cheng, Z., Shen, J.: Just-for-me: An adaptive personalization system for location-aware social music recommendation. In: Proceedings of the 16th ACM International Conference on Multimedia Retrieval (ICMR 2014) (2014)
[12] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971)
[13] Han, B.j., Rho, S., Jun, S., Hwang, E.: Music emotion classification and context-based music recommendation. Multimedia Tools and Applications 47(3), 433–460 (2010)
[14] Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008). pp. 263–272 (2008)
[15] Kaminskas, M., Ricci, F.: Location-adapted music recommendation using tags. In: User Modeling, Adaption and Personalization, pp. 183–194. Springer Berlin Heidelberg (2011)
[16] Kaminskas, M., Ricci, F., Schedl, M.: Location-aware music recommendation using auto-tagging and hybrid matching. In: Proceedings of the 7th ACM Conference on Recommender Systems (RecSys 2013). pp. 17–24 (2013)
[17] Kim, D., Yum, B.J.: Collaborative filtering based on iterative principal component analysis. Expert Systems with Applications 28(4), 823–830 (May 2005)
[18] Koren, Y.: Collaborative filtering with temporal dynamics. In: Proceedings of the 15th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2009). pp. 447–456. ACM, New York, NY, USA (2009), http://doi.acm.org/10.1145/1557019.1557072
[19] Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer Journal 42(8) (2009)
[20] Popescu, M.C., Balas, V.E., Perescu-Popescu, L., Mastorakis, N.: Multilayer perceptron and neural networks. WSEAS Transactions on Circuits and Systems 8(7), 579–588 (2009)
[21] Rendle, S.: Factorization machines with libFM. ACM Intelligent Systems and Technology 3(3), 57:1–57:22 (May 2012)
[22] Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L.: Bpr: Bayesian personalized ranking from implicit feedback. In: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009). pp. 452–461. AUAI Press, Arlington, Virginia, United States (2009), http://dl.acm.org/citation.cfm?id=1795114.1795167
[23] Rho, S., Han, B.j., Hwang, E.: Svr-based music mood classification and context-based music recommendation. In: Proceedings of the 17th ACM International Conference on Multimedia (MM 2009). pp. 713–716 (2009)
[24] Salton, G.: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1989)
[25] Sarwar, B.M., Karypis, G., Konstan, J.A., Riedl, J.T.: Application of dimensionality reduction in recommender systems: A case study. In: Proceedings of the WebKDD Workshop at the 6th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2000) (2000)
[26] Schein, A.I., Popescul, A., Ungar, L.H., Pennock, D.M.: Methods and metrics for cold-start recommendations. In: Proceedings of the 25th International Conference on Research and Development in Information Retrieval (SIGIR 2002). pp. 253–260 (2002)