Learning Embeddings for Product Size Recommendations

Kallirroi Dogani∗, ASOS.com, London, UK, kallirroi.dogani@asos.com
Matteo Tomassetti∗, ASOS.com, London, UK, matteo.tomassetti@asos.com
Sofie De Cnudde, ASOS.com, London, UK, sofiede.cnudde@asos.com
Saúl Vargas, ASOS.com, London, UK, saul.vargassandoval@asos.com
Ben Chamberlain, ASOS.com, London, UK, ben.chamberlain@asos.com

ABSTRACT
Despite significant recent growth in online fashion retail, choosing product sizes remains a major problem for customers. We tackle the problem of size recommendation in fashion e-commerce with the goal of improving customer experience and reducing the financial and environmental costs of returned items. We propose a novel size recommendation system that learns a latent space for product sizes using only past purchases and brand information. Key to the success of our model is the application of transfer learning from the brand level to the product level. We develop a neural collaborative filtering model that is applicable to every product, without requiring customer or product measurements or explicit customer feedback on the purchased sizes, which are not available for most customers or products. Offline experiments using data from a major retailer show improvements of between 4-40% over the matrix factorisation baseline.

KEYWORDS
Recommender Systems, Representation Learning, Transfer Learning, E-Commerce

ACM Reference Format:
Kallirroi Dogani, Matteo Tomassetti, Sofie De Cnudde, Saúl Vargas, and Ben Chamberlain. 2019. Learning Embeddings for Product Size Recommendations. In Proceedings of the SIGIR 2019 Workshop on eCommerce (SIGIR 2019 eCom), 9 pages.

1 INTRODUCTION
Providing customers with accurate size guidance is one of the main challenges in the online fashion industry. Since customers cannot try garments on before purchasing them, e-commerce platforms often adopt free return policies to encourage customers to purchase items regardless of concerns about size. This effectively turns homes into fitting rooms and encourages customers to order multiple sizes of the same product and return the items that do not fit. According to a recent estimate [2], 15-40% of online purchases are returned, with an even higher average return rate of 30-40% for fashion products. It is desirable to minimise returns, as the process incurs high operational and environmental costs.

The size problem cannot be solved by simply mapping between different sizing schemes, such as mapping an EU shoe size 45 to a UK size 11. There are two reasons for this: (1) sizes are inconsistent; for example, a men's US size 8 shoe is 10 inches long for a Nike trainer [3] while an Adidas trainer measures 10.2 inches [1]; (2) simple sizes mask the complexity of the underlying products. For instance, a t-shirt will be sold as small, medium or large, but the size is at least seven dimensional∗ and there is no standardisation of these dimensions, even for a given brand.

Personalised size recommendations provide a general solution to the size and fit problem. However, the development of a size recommendation system is accompanied by a number of challenges, which we address in our model. Firstly, physical measurements of customers and products are generally not available. Secondly, data indicating that a return was due to incorrect sizing is often missing or unreliable, as it is optionally collected from customers without verification. Thirdly, the additional size variable makes the data sparser than in the equivalent product recommendation problem. Finally, the existence of different sizing schemes (e.g. EU, UK, US) introduces heterogeneous data, which must be made comparable.

We propose the Product Size Embedding (PSE) model, a neural collaborative filtering approach that learns a latent representation for all the possible size variations of products and for customers' sizing preferences using solely purchase data. By doing so, we sidestep the problems of missing physical measurements and missing return reasons. We map all sizes into a common continuous latent space, which overcomes the heterogeneity of sizing schemes and addresses the inconsistency in sizes that would be hard to handle with a discrete combinatorial representation†. To deal with sparsity, we first solve the problem at a brand level by accepting the loose assumption that sizing within the same brand is consistent. Then, we transfer this knowledge to a product level, where sizes of products within the same brand have separate representations.

Our main contributions are:
• A novel size recommendation system that maps sizes into a single latent space without requiring customer or product physical measurements or explicit customer feedback on returned items (e.g. too big/small). Our model leads to an improvement of between 4-40% when compared to the matrix factorisation baseline.
• We show that transferring knowledge learned at a higher level (brands) leads to improved and more general solutions at a lower level (products).
• We introduce a method to filter out multiple personas from our dataset. Our solution is independent of fixed thresholds or empirically tuned hyperparameters.

The rest of the paper is structured as follows: Section 2 presents related work, Section 3 introduces our proposed model and Section 4 describes how we handle accounts used by multiple personas. Finally, in Section 5 we discuss our experiments and the performance of our model.

∗ Both authors contributed equally to this research.
∗ The seven dimensions are neck circumference, arm circumference, arm length, height, chest circumference, waist circumference and shoulder width.
† For example, mapping products to a discrete platonic size scale.

Copyright © 2019 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, S. Kallumadi, U. Porwal, A. Trotman (eds.): Proceedings of the SIGIR 2019 eCom workshop, July 2019, Paris, France, published at http://ceur-ws.org

SIGIR 2019 eCom, July 2019, Paris, France K. Dogani, M. Tomassetti, S. De Cnudde et. al
2 RELATED WORK
The size recommendation problem has been previously studied in [4, 8, 13, 18–20]. Specifically, [18] models the size prediction task as an ordinal regression problem, where the customer and product true sizes are learned by taking their differences and feeding them into a linear model. [19] extends the work of [18] with a Bayesian logit and probit regression model with ordinal categories. The posterior distribution over customer and product true sizes is computed with mean-field variational inference and Pólya-Gamma augmentation. The Bayesian approach allows the use of priors for handling data sparsity and the computation of confidence intervals for dealing with noisy data. Both [18] and [19] generate ordinal categorical variables based on explicit customer feedback on returned items (e.g. too small, too big or no return). [8] proposes a Bayesian model that learns the joint probability of a customer purchasing a given product size and the resulting return status being either too small, too big or no return. The probability distribution over sizes is conditioned on the return status, and the probability over return statuses is modelled as the empirical distribution over the three possible return events along with a Dirichlet prior based on the counts at the brand and category level. [13] learns a latent space for customers and products by applying ordinal regression. A fitness score is computed for each purchase, and a size ordering is enforced based on the customer's feedback on the purchased size (i.e. too small, too big or a good fit). To handle class imbalance, metric learning techniques are applied to transform the data into a space where purchases of the same class are closer together and purchases of different classes are separated by a margin.

There are two additional studies [4, 20] that tackle the size and fit problem. [4] learns latent product features using Word2Vec [12] and feeds them into a Gradient Boosting classifier along with additional product features (e.g. physical measurements, colour). However, additional product features are often difficult to obtain [6]. Finally, [20] extends [4] to the specific case of footwear size recommendations and also proposes a probabilistic graphical approach that exploits brand similarities.

In the literature on the size recommendation problem, multiple approaches have been employed to reduce noise by identifying multiple personas. They range from empirically determined thresholds on the range of purchased sizes to more complex statistical models. [4] filters out users for whom the mean and standard deviation of the purchased sizes exceed a category-level threshold. [18] uses a hierarchical clustering method where clusters are iteratively merged as long as the standard deviation of the cluster does not exceed an empirically determined threshold. Each persona is then treated as a separate customer in the subsequent prediction problem. An improvement to the latter work is made in [19], where a persona distribution is drawn from a Dirichlet distribution. Latent variables related to the specific persona are then appended to each purchase transaction. Finally, [8] follows a Gaussian kernel density estimation approach, which is further refined to a Gaussian mixture model. Two assumptions are made here: (i) the maximum number of personas is fixed at four, and (ii) the case where only one persona is active is deemed more likely. Each identified persona is subsequently retained in the dataset.

A similar problem is tackled in the literature on identifying active household members in online rental services [5]. Contextual variables such as day of week or time of day are used to identify which member is responsible for which actions and which member is active at a given point in time.

3 THE PRODUCT SIZE EMBEDDING MODEL
The Product Size Embedding (PSE) model follows a neural collaborative filtering approach to learn embeddings for each product-size combination. The main advantage of the PSE over related latent variable models (e.g. [13]) is that it does not rely on noisy and sparse customer feedback on returned items (i.e. customers optionally reporting that the item was too big / small). Instead, only implicit signals are used: the products that are purchased and the subset that are returned.

Collaborative filtering [9, 17] uses customer-product interactions and is based on the assumption that customers buying similar products have similar tastes. This principle naturally translates into the size and fit domain as "customers with similar body shapes tend to buy clothes in similar sizes". Matrix factorisation approaches, such as the one proposed by Hu et al. [9], have been proposed to capture the latent taste/preference/style space as reflected by the interactions between customers and products. Matrix factorisation decomposes customer-product interaction matrices into low-rank user and item matrices that represent, respectively, customers and products as vectors in a latent space that captures preferences and styles. Our proposed PSE model similarly represents customers and product sizes in a vector space. However, there are two important differences between our approach and most matrix factorisation approaches. Firstly, we learn a latent space at a product-size level instead of at a product level, i.e. we have a different vector for every possible size of a product. Secondly, we adopt an asymmetric framework [15], so that users are not represented explicitly but as the aggregate of the product vectors with which they have interacted.
Accordingly, we train a separate model for each product category (tops, bottoms or shoes), so that all trained embeddings belong to the same category and the learned latent space represents the same body part. The asymmetric approach avoids learning an embedding layer for customers, which greatly reduces the number of parameters. For example, the symmetric approach for menswear shoes requires ∼780K product-size and ∼3M customer parameters, so the asymmetric model is approximately five times smaller. Another advantage of the asymmetric approach is that the model does not require retraining for new customers, since their representations can be inferred from their purchase history. The architecture of the PSE model is shown in Figure 1.

Learning Embeddings for Product Size Recommendations SIGIR 2019 eCom, July 2019, Paris, France

Figure 1: The architecture of the Product Size Embedding model, which is trained independently for each product category (tops, bottoms or shoes) by maximising the dot product between the user vector V_u and the product-size vector V_ps of a purchased size ps. The softmax is computed for each product over all of its possible sizes (i.e. the purchased size ps and the non-purchased sizes p¬s).

We model size recommendation as a multi-class classification task. Given a user u and a product p, the task is to predict the customer's size in that product, ps∗. This differs from standard multi-class classification, as each product is only available in a small subset of all possible size classes (t-shirts don't come in shoe sizes, etc.).

The input to the model is a set of user purchase histories, H_u. For every customer we create a sequence of previously purchased (and not returned) product sizes {ps_1, ps_2, ..., ps_n}. For a sequence of length n, the n-th product-size is the target and the previous n − 1 product-sizes are used to construct a customer vector. Each product-size in the history indexes into an embedding matrix through a neural network embedding layer to produce a product-size vector V_ps ∈ R^k. User vectors V_u ∈ R^k are constructed by taking the first n − 1 product-sizes in H_u, retrieving the associated product-size vectors and taking their mean

$$V_u = \frac{1}{n-1} \sum_{ps \in H_{u \setminus n}} V_{ps}, \tag{1}$$

where H_{u∖n} is the history minus the target product-size. In practice, to increase the amount of training data, for each H_u we create user and product vectors from all contiguous subsequences of length k, where the first k − 1 elements form a customer vector and the k-th is the target product-size. The similarity τ between customers and product-sizes is given by the dot product between the user and product-size vectors

$$\tau_{u,ps} = V_u^{\top} V_{ps}, \tag{2}$$

and product-size probabilities are computed as the softmax of the similarity scores, normalised over all sizes of the given product

$$f(\tau)_{u,p_i} = P(s = i \mid u, p) = \frac{e^{\tau_{u,p_i}}}{\sum_j e^{\tau_{u,p_j}}}, \tag{3}$$

where the index j runs over all possible sizes of product p. To evaluate this softmax we require the product-size vectors V_ps for all sizes s of p, which are stored in a key-value store keyed on the product id.

The PSE is trained in Keras using the Adam optimiser [10] with parameters α = 0.001, β1 = 0.9, β2 = 0.999 and the categorical cross-entropy loss

$$\mathcal{L} = -\sum_{D} \sum_{j} t_j \log f(\tau)_{u,p_j}, \qquad t_j = \begin{cases} 1 & \text{if } j = s,\\ 0 & \text{otherwise,} \end{cases} \tag{4}$$

where D is the extended set of purchase histories and s is the purchased size.
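The forward pass and loss in Eqs. (1)-(4) can be sketched in plain Python. This is a minimal, untrained illustration under our own assumptions rather than the authors' Keras implementation: the toy 2-dimensional vectors, the `<product id>_<size>` token format, the `sizes_by_product` dictionary standing in for the key-value store, and our reading of "all contiguous subsequences" as every window of length two or more are all hypothetical.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Vector = List[float]


def training_examples(history: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (context, target) pairs from every contiguous window of length >= 2.

    Assumption: this is one reading of "all contiguous subsequences"; the last
    item of each window is the target and the earlier items form the context.
    """
    for start in range(len(history)):
        for end in range(start + 2, len(history) + 1):
            window = list(history[start:end])
            yield window[:-1], window[-1]


def user_vector(context: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Eq. (1): the user is the mean of the product-size vectors in the context."""
    vectors = [emb[token] for token in context]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def size_probabilities(v_u: Vector, sizes: Sequence[str],
                       emb: Dict[str, Vector]) -> Dict[str, float]:
    """Eqs. (2)-(3): dot-product scores, softmax over the sizes of ONE product."""
    scores = [sum(a * b for a, b in zip(v_u, emb[s])) for s in sizes]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return {s: e / total for s, e in zip(sizes, exps)}


def cross_entropy(probs: Dict[str, float], purchased: str) -> float:
    """Eq. (4) for one example: with one-hot t_j only the purchased size counts."""
    return -math.log(probs[purchased])


# Toy, hand-picked embeddings (hypothetical values; nothing is trained here).
emb = {
    "id1_UK8": [1.0, 0.0],
    "id2_UK8": [0.8, 0.2],
    "id3_UK8": [1.0, 0.0],
    "id3_UK9": [0.0, 1.0],
}
sizes_by_product = {"id3": ["id3_UK8", "id3_UK9"]}  # stand-in for the key-value store

v_u = user_vector(["id1_UK8", "id2_UK8"], emb)  # approximately [0.9, 0.1]
probs = size_probabilities(v_u, sizes_by_product["id3"], emb)
prediction = max(probs, key=probs.get)
print(prediction, round(probs[prediction], 3), round(cross_entropy(probs, "id3_UK8"), 3))
```

Restricting the softmax to the candidate product's own sizes is what distinguishes this from a softmax over the full product-size vocabulary; in a real model the vectors in `emb` would be learned by minimising the loss over all training examples.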
3.1 Transfer from Brands to Products
As we model product-size combinations instead of just products, our product-size interaction matrix is roughly ten times sparser than the data used for product recommendations (e.g. the fraction of non-zero entries drops from ∼3×10⁻⁴ to ∼4×10⁻⁵ for menswear shoes). As a result, learning representations for all possible product-size combinations is challenging. Transfer learning is a popular technique for generalising from small datasets to larger ones [14]. We assume that each brand has consistent sizes, and we learn latent representations V_bs for every combination of brand b = {p} (a brand viewed as its set of products p) and size s. Then, we transfer this knowledge to the product level by initialising

$$V_{ps} = V_{bs}, \quad \forall\, ps \in bs, \tag{5}$$

i.e. every product of brand b in size s starts from the vector of the brand-size bs.

As shown in Figure 2, we first train the model at the brand-size level, then initialise the product-size vectors V_ps with the trained brand-size vectors V_bs, and finally train the model at the product-size level to fine-tune the product-size vectors. Applying the pre-trained brand-size vectors at the product level improves generalisation, boosts performance and leads to faster convergence. In Section 5.3, we demonstrate the improvements transfer learning offers over random initialisation of latent vectors.

Figure 2: The size embeddings learned at a brand level are used to initialise the size embeddings at a product level.
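A minimal sketch of the initialisation in Eq. (5), in plain Python. It assumes product-size tokens of the form `<product id>_<size>` (as in `id43498_W34inL32in`) and a hypothetical `brand_of` lookup from product id to brand. The paper does not say how product sizes without a brand-size counterpart are initialised, so the small random fallback below is our assumption.

```python
import random
from typing import Dict, Iterable, List

Vector = List[float]


def init_from_brand_sizes(product_sizes: Iterable[str],
                          brand_of: Dict[str, str],
                          brand_size_vectors: Dict[str, Vector],
                          dim: int,
                          seed: int = 0) -> Dict[str, Vector]:
    """Eq. (5): start every product-size vector V_ps from its brand-size vector V_bs."""
    rng = random.Random(seed)
    init: Dict[str, Vector] = {}
    for token in product_sizes:
        product_id, size = token.split("_", 1)         # e.g. "id43498", "W34inL32in"
        brand_size = f"{brand_of[product_id]}_{size}"  # e.g. "Levis_W34inL32in"
        if brand_size in brand_size_vectors:
            # Copy, so fine-tuning the product vector never mutates the brand vector.
            init[token] = list(brand_size_vectors[brand_size])
        else:
            # Assumption: unseen brand-sizes fall back to a small random vector.
            init[token] = [rng.gauss(0.0, 0.05) for _ in range(dim)]
    return init


# Hypothetical output of the brand-level (PSE-B) stage and a hypothetical brand lookup.
brand_vectors = {"Levis_W34inL32in": [0.3, -0.1]}
brand_of = {"id43498": "Levis", "id50001": "Levis"}
init = init_from_brand_sizes(["id43498_W34inL32in", "id50001_W36inL32in"],
                             brand_of, brand_vectors, dim=2)
print(init)
```

The returned dictionary would seed the product-level embedding layer, which is then fine-tuned by training at the product-size level.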
4 DETECTING MULTIPLE PERSONAS
A major challenge in the design of recommender systems is identifying accounts that are shared across multiple users. Some services, such as Netflix [7], solve this problem by creating explicit user profiles for each persona. In our work, user profiles are not viable, so we detect multiple personas as a preprocessing step.

To detect multiple personas we employ a Gaussian Mixture Model (GMM) [11] that predicts the number of individuals using an account and identifies each persona's purchases. Our proposed method is independent of assumption-based thresholds or empirically tuned hyperparameters. When we detect an account with multiple personas, we remove it from both the training and test sets.

Our GMM approach is based on the assumption that the purchases of every persona are centred around a core size. Customers with at least two purchases and with a size difference‡ larger than one are potential candidates for the multiple persona detection process. The output of the GMM is a mixture of components, each representing a different persona in the purchase history. Each component (or persona) is represented by a Gaussian distribution whose mean µ corresponds to the persona's core size.

‡ We have ordered each sizing scheme from the smallest to the largest size found in our dataset and defined a set of sizing indexes. For example, the sizing index for the sizing scheme CAT ranges from 0 (3XS) to 25 (8XL). When referring to the size difference between two sizes, we mean their difference when mapped to the sizing index.

Since the number of personas λ using an account is unknown, we employ the silhouette score s_λ [16] to find the optimal number of mixture components λ_opt (see Algorithm 1). The silhouette score is a cluster evaluation metric that measures how well each purchased size is clustered with similar purchased sizes. A score s_λ ≈ 1 implies non-overlapping clusters with high density, while a score close to 0 points to overlapping clusters.

Algorithm 1: Algorithm for multiple persona detection
Input: purchase history H_u
Output: λ_opt (number of personas)
  λ ← 2
  s_{λ−1} ← 0
  s_λ ← getSilhouetteScore(GMM(H_u, λ))
  while s_λ > s_{λ−1} and min_{i,j ∈ {1,...,λ}, i≠j} |µ_i − µ_j| > 1 do
    λ ← λ + 1
    s_λ ← getSilhouetteScore(GMM(H_u, λ))
  end while
  λ_opt ← λ − 1

The process of identifying multiple personas consists of running the GMM to detect λ personas within H_u and calculating the silhouette score s_λ associated with that mixture. The parameter λ is iteratively increased as long as (i) s_λ is higher than s_{λ−1}, and (ii) the core sizes of all mixture components differ pairwise by more than 1 size unit. When the iterative process stops, λ_opt is set to the last value of λ that satisfied both conditions (λ − 1 in Algorithm 1), and if λ_opt > 1 the customer is identified as buying for multiple personas.
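Algorithm 1 can be sketched in plain Python as below. This is our illustration, not the authors' code: the paper does not say how the GMM is fitted, so the hand-written one-dimensional EM with quantile initialisation and a variance floor (`min_var`) is an assumption, as is treating UK shoe sizes directly as sizing indexes. The silhouette score follows Rousseeuw's definition with the absolute size difference as distance, and the `lam > len(sizes)` guard is our own addition.

```python
import math
from statistics import pvariance
from typing import List, Sequence, Tuple


def fit_gmm_1d(x: Sequence[float], k: int, iters: int = 200,
               min_var: float = 0.05) -> Tuple[List[float], List[int]]:
    """Assumed GMM: 1-D EM with quantile init; returns component means and hard labels."""
    xs = sorted(x)
    mus = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    variances = [max(pvariance(xs), min_var)] * k
    weights = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(iters):
        resp = []
        for xi in x:  # E-step in log space to avoid underflow
            logs = [math.log(w) - (xi - m) ** 2 / (2 * v) - 0.5 * math.log(2 * math.pi * v)
                    for w, m, v in zip(weights, mus, variances)]
            top = max(logs)
            ex = [math.exp(lg - top) for lg in logs]
            z = sum(ex)
            resp.append([e / z for e in ex])
        for j in range(k):  # M-step
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / len(x)
            mus[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var = sum(r[j] * (xi - mus[j]) ** 2 for r, xi in zip(resp, x)) / nj
            variances[j] = max(var, min_var)  # variance floor (assumption)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return mus, labels


def silhouette(x: Sequence[float], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient; singleton points contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for one cluster; treated here as "no structure"
    total = 0.0
    for i, (xi, li) in enumerate(zip(x, labels)):
        same = [abs(xi - xj) for j, (xj, lj) in enumerate(zip(x, labels))
                if lj == li and j != i]
        if not same:
            continue
        a = sum(same) / len(same)
        b = min(sum(abs(xi - xj) for xj, lj in zip(x, labels) if lj == c)
                / sum(1 for lj in labels if lj == c)
                for c in clusters if c != li)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(x)


def min_mean_gap(mus: Sequence[float]) -> float:
    ordered = sorted(mus)
    return min(hi - lo for lo, hi in zip(ordered, ordered[1:]))


def detect_personas(sizes: Sequence[float]) -> int:
    """Algorithm 1: return lambda_opt, the estimated number of personas."""
    # Candidates: at least two purchases and a size difference larger than one.
    if len(sizes) < 2 or max(sizes) - min(sizes) <= 1:
        return 1
    lam, s_prev = 2, 0.0
    mus, labels = fit_gmm_1d(sizes, lam)
    s_cur = silhouette(sizes, labels)
    while s_cur > s_prev and min_mean_gap(mus) > 1:
        lam += 1
        if lam > len(sizes):
            break  # our guard: no more personas than purchases
        s_prev = s_cur
        mus, labels = fit_gmm_1d(sizes, lam)
        s_cur = silhouette(sizes, labels)
    return lam - 1


print(detect_personas([3, 3, 3.5, 4, 4]), detect_personas([2, 2, 5, 5, 6]))
```

Applied to the first two rows of Table 1 written as UK numbers, the sketch returns one and two personas respectively. Resellers would be removed before this step, as described in the text.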
While dealing with the multiple persona problem, two additional issues arise: i) resellers, and ii) purchases in multiple sizing schemes. Resellers are customers who purchase products with the intention of reselling them, so their purchases are likely to cover a wider range of sizes. A Gaussian mixture model is not suitable for detecting them, as their purchases are not centred around a core size but instead follow a roughly uniform distribution. Therefore, prior to performing multiple persona detection, we eliminate all customers with a uniform purchase history.

To apply the GMM, we first need to convert all sizes into a single sizing scheme. Since most existing conversion tables are incomplete and inaccurate, we approximate size conversions from the data. Specifically, for each product category we build a co-purchase matrix between two sizing schemes and convert sizes according to the highest co-purchase frequency. Note that this conversion is only an approximation for data cleaning purposes and is not used in the final size prediction model.

Table 1 lists examples of purchase histories that are flagged as either multiple personas or resellers.

Table 1: Example of the output of the multiple persona detection process for womenswear shoes.
Purchase history H_u | Detection
UK3, UK3, UK3.5, UK4, UK4 | 1 persona
{UK2, UK2}, {UK5, UK5, UK6} | 2 personas
{UK2, UK3, UK3, UK3, UK4, UK4}, {UK6, UK6}, {UK9} | 3 personas
UK2, UK3, UK4, UK5, UK6, UK6, UK7, UK8, UK9 | reseller

Evaluating the detected multiple personas is similar to evaluating clusters in unsupervised clustering. During the detection process we calculate the silhouette score, and thus have a built-in evaluation metric that guides the clustering. Figure 3 shows that as the size difference of the purchases increases, the probability of detecting a multiple persona account steadily increases, but it then flattens out and decreases for very large size differences, which indicate a higher probability of detecting a reseller.

Figure 3: Percentage of multiple persona accounts (red line), reseller accounts (blue line) and accounts without multiple personas (green line) as a function of the size difference of the purchases for menswear bottoms.
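The data-driven size conversion used in this preprocessing step can be sketched as follows. We assume each purchase history is a list of `(scheme, size)` pairs from one product category, and that every pair of source-scheme and target-scheme sizes bought by the same customer counts as one co-purchase; both the input format and this counting rule are our assumptions, since the paper only states that sizes are converted according to the highest co-purchase frequency.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Size = Tuple[str, str]  # (sizing scheme, size label), e.g. ("UK", "8"); assumed format


def build_conversion(histories: Iterable[List[Size]],
                     source: str, target: str) -> Dict[Size, Size]:
    """Map each source-scheme size to its most frequently co-purchased target-scheme size."""
    co_purchases: Dict[Size, Counter] = defaultdict(Counter)
    for history in histories:
        src = [s for s in history if s[0] == source]
        tgt = [s for s in history if s[0] == target]
        for a in src:
            for b in tgt:
                co_purchases[a][b] += 1  # assumption: every same-customer pair counts once
    # Counter.most_common keeps first-encountered order among equal counts.
    return {a: counts.most_common(1)[0][0] for a, counts in co_purchases.items()}


histories = [  # hypothetical histories from one product category
    [("UK", "8"), ("EU", "36")],
    [("UK", "8"), ("EU", "36"), ("EU", "38")],
    [("UK", "10"), ("EU", "38")],
]
uk_to_eu = build_conversion(histories, source="UK", target="EU")
print(uk_to_eu)
```

As in the text, such a mapping would only be used to put sizes on one scale for persona detection, not inside the size prediction model.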
Products originate from a large and diverse network of (ii) the core size of each mixture component differs by at least 1 international suppliers, with thousands of new items added weekly size unit. When the iterative process is finished, λopt is set to λ and so in general, physical measurements of products are not avail- and if λopt > 1 that customer is identified as buying for multiple able. personas. While dealing with the multiple persona problem, two additional 5.1 Experimental Setup issues arise: i) the problem of resellers, and ii) the issue of purchases Since we solve the size prediction problem separately for each in multiple sizing schemes. Resellers are customers who purchase product category, the purchase history Hu has been computed products with the intention of reselling them, so it is likely that using all previous purchases of customer u from the same product their purchases cover a wider range of sizes. In that case, a Gaussian category (i.e. we do not use past purchases of shoes to predict sizes mixture model is not suitable for detecting them, as their purchases for tops). We exclude any returned products from the purchase are not centred around a core size, but instead have a uniform distri- history as there is no data specifying whether items are returned bution. Therefore, prior to performing multiple persona detection, due to poor fit or for other reasons. we eliminate all customers with a uniform purchase history. Table 4 shows examples of the same purchase history computed To apply the GMM model, we first need to convert all sizes into at different levels. In this case, applying transfer learning from a single sizing scheme. Since most existing conversion tables are the brand level to the product level means that we initialise the incomplete and inaccurate, we have used the data to approximate product size vector id43498_W34inL32in with the brand size vector size conversions. 
5.2 Comparison Methods
We compare the performance of the following personalised methods:
• MCS-SS. This method predicts the user's most common size (MCS) given the sizing scheme (SS) of product p. For instance, if H_u = (id1432_UK8, id1564_UK8, id1055_UK9, id1453_EU36) is the purchase history of user u, this method predicts UK8 for products available in UK sizes and EU36 for products available in EU sizes. If there is a tie, MCS-SS predicts the most recently purchased size.
• ALS. A symmetric matrix factorisation model optimised through alternating least squares [9].
• LR. A multi-class Logistic Regression classifier that takes as input the normalised counts of the purchased sizes and one-hot encoded features for the product type, brand and sizing scheme.
• PSE-B. Version of the PSE model where the size embeddings are learned at a brand level.
• PSE-BPT. Version of the PSE model where the size embeddings are learned at a brand and product type level.
• PSE. The size embeddings are learned at a product level.
• tPSE-BPT. The size embeddings are learned at a brand and product type level, and the embedding layer is initialised with the latent space learned by PSE-B.
• tPSE. This is our proposed PSE model. The size embeddings are learned at a product level, and the embedding layer is initialised with the latent space learned by PSE-B.

We cannot compare our model against other recently published size recommendation algorithms, as they require extra data sources that are not always available (i.e. the return reason). Our model is more generic and could be applied to any fashion dataset.

All PSE experiments have been run with a fixed latent space dimension k = 10. We have explored how our results depend on this parameter and found no statistically significant difference when adopting a higher k (see Fig. 4).

Figure 4: PSE-B accuracy as a function of the latent space dimension, k, for each category. The results are independent of k when k ≥ 10.
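The MCS-SS baseline described in the list above can be sketched as follows. The `(product id, scheme, size)` tuple format is our assumption, and returning `None` when the user has no purchase in the product's sizing scheme is our choice, since the paper does not say what the baseline predicts in that case.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

PurchaseRow = Tuple[str, str, str]  # (product id, sizing scheme, size), oldest first


def mcs_ss(history: Sequence[PurchaseRow], scheme: str) -> Optional[str]:
    """Most common size in the target product's sizing scheme; ties -> most recent."""
    sizes: List[str] = [size for _, s, size in history if s == scheme]
    if not sizes:
        return None  # assumption: no purchase in this scheme, so no prediction
    counts = Counter(sizes)
    best = max(counts.values())
    # Walk backwards in time so that, among tied sizes, the most recent one wins.
    return next(size for size in reversed(sizes) if counts[size] == best)


h_u = [("id1432", "UK", "UK8"), ("id1564", "UK", "UK8"),
       ("id1055", "UK", "UK9"), ("id1453", "EU", "EU36")]
print(mcs_ss(h_u, "UK"), mcs_ss(h_u, "EU"))
```

This reproduces the worked example in the MCS-SS description: UK8 for products sized in UK and EU36 for products sized in EU.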
5.3 Results
The results of our experiments are summarised in Table 5. All variations of the PSE model outperform the baselines, with the exception of PSE trained without transfer, which falls behind the best baseline for womenswear tops, menswear bottoms and menswear shoes. We observe that accuracy increases when the size embeddings are learned at a brand and product type level (PSE-BPT) rather than at the brand level (PSE-B). However, when latent representations are learned at a product-size level (PSE), accuracy drops for some product categories. In the case of menswear shoes, the number of latent vectors we need to train increases from 1.4K (PSE-B) to 77.9K (PSE), so the latent space becomes sparser, which makes the model prone to overfitting (Figure 5). To overcome this issue, we use the latent representations learned by PSE-B to initialise the embedding layer in tPSE-BPT and tPSE. The results show that transfer learning improves generalisation and leads to more accurate predictions.

Table 5: Accuracy (%) of each tested model for all product categories. The improvement in accuracy for the tPSE model is statistically significant (** α = 0.01). WW and MW refer to womenswear and menswear, respectively.
Product Category | MCS-SS | ALS | LR | PSE-B | PSE-BPT | PSE | tPSE-BPT | tPSE
TopsWW | 38.917 | 60.760 | 60.361 | 61.175 | 61.302 | 60.654 | 61.294 | 62.286**
BottomsWW | 30.129 | 56.440 | 57.456 | 58.287 | 58.446 | 58.574 | 58.500 | 60.083**
ShoesWW | 63.098 | 60.672 | 68.354 | 69.263 | 69.276 | 69.518 | 69.289 | 70.498**
TopsMW | 64.009 | 62.496 | 68.689 | 69.796 | 70.135 | 69.542 | 70.134 | 70.962**
BottomsMW | 31.893 | 52.789 | 59.498 | 59.964 | 60.255 | 57.910 | 60.290 | 61.992**
ShoesMW | 64.467 | 49.160 | 68.209 | 68.319 | 68.612 | 65.644 | 68.691 | 69.344**

Figure 5: Training (blue lines) and test (orange lines) accuracy as a function of the number of epochs for PSE (solid lines) and PSE-B (dashed lines) in menswear shoes. The model trained at a product level (PSE) starts overfitting after the third epoch, while the model trained at a brand level (PSE-B) is more stable. Similar trends have been observed for the other product categories.

Table 7 shows examples where the tPSE model successfully predicts sizes that are not included in the purchase history, illustrating the benefits of learning latent size representations.

Table 7: Examples of tPSE successfully predicting a size that has not been purchased before.
Purchase History H_u | True (= Predicted) Size
id3455_UK6.5, id5637_UK6, id4112_UK6.5 | id9652_UK7
id6563_UK6, id1463_UK8, id3004_UK6 | id8102_EU34

To better understand how tPSE performs in different scenarios, we have evaluated the model on purchase histories of different lengths. Figure 6a shows that the accuracy for menswear shoes increases as more items are present in the purchase history. The accuracy of the model for purchase histories with six or more items is more than 75%. However, such histories make up less than 10% of the data (Figure 6b). The same figure shows that more than 50% of the customers have only one item in their purchase history, which is not sufficient to accurately learn the customer's true size. We observe similar trends for all other product categories.

Figure 6: Relation between accuracy and the number of purchases in the purchase history H_u for menswear shoes. Similar trends have been observed for the other product categories. (a) Accuracy of tPSE as a function of the number of items in the purchase history. (b) Distribution of the length of the purchase history H_u; the dataset is dominated by customers with only one purchased item.

To confirm that our model does not deviate significantly from the purchased size, we have also evaluated Hitrate@K, defined as the fraction of times the correct size is within the top K predictions. To retrieve the top K recommended sizes, we rank the predictions by the similarity scores between the user vector V_u and the product-size vectors V_ps. Hitrate@2 ranges between approximately 82% and 92% across product categories (Table 6), which can account for cases where customers are in between two sizes. For instance, both sizes S and M could fit well, but the customer has to pick just one when completing a purchase.

Table 6: Hitrate@K (%) for tPSE.
Product Category | Hitrate@2 | Hitrate@3
TopsWW | 88.711 | 96.939
BottomsWW | 84.909 | 93.835
ShoesWW | 87.529 | 94.668
TopsMW | 92.373 | 98.315
BottomsMW | 82.485 | 90.455
ShoesMW | 86.259 | 93.793
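The Hitrate@K metric can be sketched as below. We assume each evaluation example is a dictionary of similarity scores V_u·V_ps over the sizes of one product plus the true size; the example scores are made up. Ranking by the softmax probabilities of Eq. (3) would give the same order, since the softmax is monotonic in the scores.

```python
from typing import Dict, Iterable, Tuple


def hitrate_at_k(examples: Iterable[Tuple[Dict[str, float], str]], k: int) -> float:
    """Fraction of examples whose true size is among the k highest-scoring sizes.

    Each example is (scores, true_size), where scores maps every size of the
    product to its (assumed precomputed) similarity with the user vector.
    """
    hits = total = 0
    for scores, true_size in examples:
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += true_size in top_k
        total += 1
    return hits / total if total else 0.0


examples = [  # made-up scores for illustration
    ({"S": 0.9, "M": 0.5, "L": 0.1}, "M"),
    ({"S": 0.2, "M": 0.3, "L": 0.8}, "S"),
    ({"UK8": 1.2, "UK9": 0.4}, "UK8"),
]
print([round(hitrate_at_k(examples, k), 3) for k in (1, 2, 3)])
```

Hitrate@1 equals the accuracy reported in Table 5, and Hitrate@K can only increase with K.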
Additionally, based on the similarity scores between the user vector Vu and the we observe that same or similar sizes from different sizing schemes product size vectors Vps . Hitrate@2 ranges between 85-92% for all (e.g. XS and UK6) are mapped into the same neighbourhoods of product categories (Table 6) and can explain cases where customers the latent space. Both figures confirm the assumption that similar may be in between two sizes. For instance, both sizes S and M could purchased sizes correspond to customers with similar body mea- fit well, but the customer has to pick just one when completing a surements. Based on this assumption, we can use customer-product purchase. interactions to learn a latent space for size representations. 5.4 Analysis on the Latent Space 6 CONCLUSION Figures 7 and 8 show instances of the latent representations mapped We introduced the Product Size Embedding (PSE) model, a novel onto a 3D space using the t-SNE technique for dimensionality re- approach to solve the size recommendation problem in fashion duction [21]. Specifically, Figure 7 shows the menswear shoes graph e-commerce. The PSE model requires only customer-product inter- constructed by retrieving the closest vectors to redtape_UK8. The actions and brand information without needing explicit customer SIGIR 2019 eCom, July 2019, Paris, France K. Dogani, M. Tomassetti, S. De Cnudde et. al (a) Accuracy of tPSE as a function of the number of items in (b) Distribution of the length of the purchase history Hu . the purchase history. The dataset is dominated by customers with only one pur- chased item. Figure 6: Relation between accuracy and the number of purchases in the purchase history Hu for menswear shoes. Similar trends have been observed for the other product categories. Figure 7: 3D t-SNE projection of the latent space of Figure 8: 3D t-SNE projection of the latent space of wom- menswear shoes centred around redtape_UK8 . Purple points enswear tops. 
6 CONCLUSION
We introduced the Product Size Embedding (PSE) model, a novel approach to solve the size recommendation problem in fashion e-commerce. The PSE model requires only customer-product interactions and brand information, without needing explicit customer feedback on the returned items (i.e. whether the item was too big or too small). Our offline evaluation on a large-scale e-commerce dataset shows that mapping product sizes into a single latent space leads to more accurate size predictions than a range of different baselines. In addition, we have demonstrated the advantages of transfer learning and how knowledge learned at a brand level boosts the performance of the model at a product level. Finally, we have proposed a technique to identify multiple personas in the purchase history and applied it to reduce the noise in our data.

REFERENCES
[1] 2019. Adidas Size Chart for Men’s Shoes | adidas UK. https://www.adidas.co.uk/help/size_charts. Accessed: 2019-01-20.
[2] 2019. Finding a Fix for Retail’s Trillion-Dollar Problem: Returns. https://www.cnbc.com/2019/01/10/growing-online-sales-means-more-returns-and-trash-for-landfills.html. Accessed: 2019-01-20.
[3] 2019. Nike.com Size Fit Guide - Men’s Shoes. https://www.nike.com/us/en_us/c/size-fit-guide/mens-shoe-sizing-chart. Accessed: 2019-01-20.
[4] G. Mohammed Abdulla and Sumit Borar. 2017. Size Recommendation System for Fashion E-Commerce. In KDD Workshop on Machine Learning Meets Fashion.
[5] Pedro G. Campos, Alejandro Bellogin, Fernando Díez, and Iván Cantador. 2012. Time Feature Selection for Identifying Active Household Members. In Proceedings of the 21st International Conference on Information and Knowledge Management (CIKM ’12). ACM, pp. 2311–2314.
[6] Ângelo Cardoso, Fabio Daolio, and Saúl Vargas. 2018. Product Characterisation towards Personalisation: Learning Attributes from Unstructured Data to Recommend Fashion Products. In Proceedings of the 24th International Conference on Knowledge Discovery & Data Mining (KDD ’18). ACM, pp. 80–89.
[7] Carlos A. Gomez-Uribe and Neil Hunt. 2016. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Transactions on Management Information Systems (TMIS) 6, 4 (2016), pp. 13.
[8] Romain Guigourès, Yuen King Ho, Evgenii Koriagin, Abdul-Saboor Sheikh, Urs Bergmann, and Reza Shirvany. 2018. A Hierarchical Bayesian Model for Size Recommendation in Fashion. In Proceedings of the 12th Conference on Recommender Systems (RecSys ’18). ACM, pp. 392–396.
[9] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In Proceedings of the 8th International Conference on Data Mining (ICDM ’08). IEEE, pp. 263–272.
[10] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014).
[11] Bruce G. Lindsay. 1995. Mixture Models: Theory, Geometry and Applications. Institute of Mathematical Statistics.
[12] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781 (2013).
[13] Rishabh Misra, Mengting Wan, and Julian McAuley. 2018. Decomposing Fit Semantics for Product Size Recommendation in Metric Spaces. In Proceedings of the 12th Conference on Recommender Systems (RecSys ’18). ACM, pp. 422–426.
[14] Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22 (2010), pp. 1345–1359.
[15] Arkadiusz Paterek. 2007. Improving Regularised Singular Value Decomposition for Collaborative Filtering. In Proceedings of KDD Cup and Workshop. ACM, pp. 5–8.
[16] Peter J. Rousseeuw. 1987. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. Journal of Computational and Applied Mathematics 20, 1 (1987), pp. 53–65.
[17] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (WWW ’01). ACM, pp. 285–295.
[18] Vivek Sembium, Rajeev Rastogi, Atul Saroop, and Srujana Merugu. 2017. Recommending Product Sizes to Customers. In Proceedings of the 11th Conference on Recommender Systems (RecSys ’17). ACM, pp. 243–250.
[19] Vivek Sembium, Rajeev Rastogi, Lavanya Tekumalla, and Atul Saroop. 2018. Bayesian Models for Product Size Recommendations. In Proceedings of the 27th World Wide Web Conference (WWW ’18). ACM, pp. 679–687.
[20] Shreya Singh, G. Mohammed Abdulla, Sumit Borar, and Sagar Arora. 2018. Footwear Size Recommendation System. arXiv preprint arXiv:1806.11423 (2018).
[21] L.J.P. van der Maaten and G.E. Hinton. 2008. Visualizing High-Dimensional Data Using t-SNE. (2008).