=Paper=
{{Paper
|id=Vol-2311/paper_19
|storemode=property
|title=Discovering Similar Products in Fashion E-commerce
|pdfUrl=https://ceur-ws.org/Vol-2311/paper_19.pdf
|volume=Vol-2311
|authors=Amber Madvariya,Sumit Borar
|dblpUrl=https://dblp.org/rec/conf/sigir/MadvariyaB17
}}
==Discovering Similar Products in Fashion E-commerce==
Amber Madvariya (amber.madvariya@myntra.com) and Sumit Borar (sumit.borar@myntra.com)

ABSTRACT

In recent years, item-item collaborative filtering algorithms have been studied thoroughly in recommender systems. When applied in the context of fashion e-commerce, these algorithms can be used to generate similar-product recommendations, to personalize search results and to build a framework for creating clusters of similar products. The efficacy of these algorithms in the fashion domain relies on accurately inferring a user's fashion taste and matching it to a product. Our work hinges on the discovery of similar products using two different item-item collaborative filtering algorithms. We identify and address some unique challenges that arise when applying these algorithms in the dynamic fashion e-commerce environment. We study and evaluate their performance through precision/recall measures and live A/B tests, and baseline them against a content-based approach. We also discuss the effects of the transient nature of the industry on a user's fashion taste.

KEYWORDS

Recommender Systems, Item-Item Collaborative Filtering

1 INTRODUCTION

Fashion shopping is a complex intersection of product styles with users' taste. Products are usually described in terms of several attributes such as silhouette, line, hem length, color, fabric, waist length and so forth. These attributes are dynamic in nature and vary as trends in fashion change. However, they are not typical representatives of a user's fashion taste: users rely more on look, feel, popularity and other intrinsic factors to identify their taste.

The advent and growth of fashion e-commerce pose even more challenges towards providing users with relevant products. As the industry grows, there are thousands of products available in a catalogue with a similar set of attributes. Users' needs in fashion are also less specific than for hard goods like electronics, so they tend to browse significantly more products on a fashion portal before clicking on a particular product. At Myntra we observe the average click depth, i.e. the number of products viewed before the first click, to be around 90. Correctly inferring a user's intent in a session therefore becomes critical for providing a better shopping experience. A user's click on a product is a proxy for his/her interest in that product, and we can utilize this information to narrow down his/her intent. Thereafter, serving a user products similar to the clicked one helps them navigate the vast catalogue and find relevant products. We also use similar products to personalize search results, as the majority of queries on our platform are broad in nature (e.g. 't-shirts'), and ranking results based on a user's interest improves the overall search relevance.

Therefore, discovering similar products in fashion e-commerce becomes an interesting data mining and business problem. In this paper, we present our work on prescribing similar products using item-item collaborative filtering algorithms in a fashion context. Similarity between items is usually computed in terms of content-based similarity, item-item collaborative filtering [12], or a hybrid of the two. Content-based similarity measures rely on a curated taxonomy to represent a particular piece of content. A widely cited example of a content-based system is Pandora radio [11], which uses the Music Genome Project to tag its songs and artists from a set of 450 manually curated attributes. Unlike music, the dynamicity of trends in fashion makes its taxonomy ephemeral, and new products continuously require new attributes to be added to the taxonomy. Maintaining and curating this taxonomy becomes an unscalable task over time. Another limitation of this approach is that attributes are unable to capture user taste completely.

Unlike content-based similarity measures, item-item collaborative filtering algorithms use user signals instead of a taxonomy to compute item-item similarity. In the domain of e-commerce, users' feedback is not explicit [7] and needs to be inferred from user signals. In traditional e-commerce platforms [9], user purchases are used to compute item similarity. The challenge with using only purchase data in fashion e-commerce is that it is sparse and erratic, due to two primary factors: the short life cycle of products and the breadth of the available catalogue. Similarity measures relying solely on user purchase data thus tend to perform poorly in this setting, so we consider purchase data along with signals like clicking on a product and adding a product to a cart to compute item similarity. The challenges with using these implicit feedback signals are the following:

• Quantifying the chosen signals by assigning relative weights to each.
• Normalising the effect of popular products, i.e. products which have signals from a large number of users.

In this work, we look at two representations to model item-item collaborative filtering. We compare these two approaches with a content-based approach, where we use the annotated taxonomy of a product to obtain a feature representation for it. In the following sections, we describe these three approaches in detail and compare their performance via A/B tests and precision-recall tests. Finally, we try to infer whether the transient nature of the industry affects user taste or not.

Copyright © 2017 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, S. Kallumadi, M. de Rijke, L. Si, A. Trotman, Y. Xu (eds.): Proceedings of the SIGIR 2017 eCom workshop, August 2017, Tokyo, Japan, published at http://ceur-ws.org
2 METHODOLOGY

For the remaining sections of this paper, we will use the following terminology. Let P be the set of all products and S the set of all user sessions. A session contains all activity by a user within a 30-minute window from the time he/she logs into the portal, annotated by time-stamp. We record user signals such as product list views, product clicks, additions to cart, orders placed and so on. For all three approaches described below, we split our data at the article-type level (e.g. Men-Tshirts, Men-Shirts, Women-Dresses), since products similar to a product p_i should be from the same article type. Splitting the data at the article-type level has the added advantage of reducing the set of products from which we find similar products for p_i.

2.1 Product Attribute Vectors

Each product in our system is annotated with attributes generated from a manually curated taxonomy. These attributes are broadly classified into two types: general attributes like brand, color and price band, which are applicable to all article types, and article-type specific attributes (ATSA) like collar type, sleeve length and neck type for T-shirts. Each attribute is represented as a key-value pair; e.g. the collar type of a t-shirt is the key, and the different collar types like round-neck, v-neck and polo-neck are the values. Each product is represented as a vector in real space, where each dimension represents an attribute. We populate each dimension with a binary value: 1 if the attribute represented by the dimension is part of the set of attributes annotated on the product, 0 otherwise. Along with product attributes, we extract relevant bi-grams from the product descriptions using log-likelihood ratio scores [6]. We normalize these product vectors using the L2 norm. To generate N similar products for a product p_i, we find the N closest neighbors of p_i, keeping cosine distance as our distance metric. We analyse this approach here to benchmark the performance of the item-item collaborative filtering algorithms. The major challenge with this approach is that the set of attributes representing a product is not comprehensive, which can result in products not being annotated with some key information.

2.2 Item-Item Weighted Graph

In this approach, we use an undirected weighted graph representation to model product relationships in the system. Nodes represent products, edges represent associativity between products, and edge weights represent the degree of associativity between products. This approach was first introduced by YouTube [4]; however, our formulation of the edge weight differs from the one shown in that work.

To generate this graph, we use a well-known technique called association rule mining [1], or co-visitation counts. We consider a user session s_i, where s_i ⊂ S, and generate the set of products {p_1, p_2, ..., p_n} clicked in that session. Within this set, we compute all pair combinations of products (p_i, p_j) that were co-browsed together. We count all occurrences where p_i and p_j were co-browsed together across S, and denote the total count of co-occurrences of (p_i, p_j) by c_ij. We assign the edge weight w_ij using the following formulation of normalised pointwise mutual information (NPMI) [2]:

    w_ij = log( p(i,j) / (p(i) p(j)) ) / ( -log p(i,j) )

where p(i,j) is the probability of occurrence of the pair (p_i, p_j), and p(i) and p(j) are the probabilities of occurrence of p_i and p_j respectively. We estimate these probabilities as:

    p(i,j) = c_ij / C,    p(i) = c_i / C,    p(j) = c_j / C

where c_ij is the count of occurrences of the pair (i,j), c_i = Σ_{k ∈ P} c_ik, and C = Σ_{i,j ∈ P} c_ij.

Then, to find similar products for a product p_i, we locate p_i in the graph, consider its adjacent nodes, and among them pick the top N neighbors, sorted in descending order of edge weight. Here N is a hyper-parameter which denotes the number of similar products we want to find for p_i.
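The edge-weight computation described above can be sketched in Python. This is an illustrative reconstruction, not the paper's production pipeline; it assumes sessions are given as lists of clicked product ids and builds the counts c_ij, c_i and C before converting them to NPMI weights.

```python
import math
from collections import Counter
from itertools import combinations

def npmi_edges(sessions):
    """Build co-browsing edge weights w_ij from clicked-product sessions.

    `sessions` is an iterable of lists of product ids. Pair counts c_ij,
    marginals c_i and the grand total C follow the formulation above.
    """
    pair_counts = Counter()
    for products in sessions:
        # unordered co-browsed pairs within one session
        for a, b in combinations(sorted(set(products)), 2):
            pair_counts[(a, b)] += 1

    item_counts = Counter()
    total = sum(pair_counts.values())      # C = sum over all pairs of c_ij
    for (a, b), c in pair_counts.items():
        item_counts[a] += c                # c_i = sum_k c_ik
        item_counts[b] += c

    weights = {}
    for (a, b), c in pair_counts.items():
        p_ij = c / total
        p_i = item_counts[a] / total
        p_j = item_counts[b] / total
        # NPMI: PMI normalised by -log p(i,j), bounded in [-1, 1]
        weights[(a, b)] = math.log(p_ij / (p_i * p_j)) / (-math.log(p_ij))
    return weights
```

At the scale described in Section 3, these counts would be produced by distributed jobs rather than a single in-memory pass; the logic per pair, however, is the same.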
The advantage of using this formulation to generate edge weights is that it normalises the effect of popular products. If we consider a pair of products (p_i, p_j) where p_j is browsed more across all sessions than p_i, then the value of c_j will be high, resulting in a high value of p(j). The high value of p(j) drives the PMI of (p_i, p_j) down, even though the co-occurrence count of (p_i, p_j) might be higher than that of other pairs containing p_i.

Figure 1: Example of similar products found using product attributes. The numbers at the bottom of a given image represent the similarity between the given product and the input product.

One challenge with using this approach is the noise prevalent in the input data. Some pairs might have too low a co-browsing count or too low a PMI score to form a meaningful edge. To tackle this challenge, we place a threshold on both the co-browsing counts and the edge weights generated from the PMI scores. After experimenting, we found 5 to be a suitable threshold for the co-browsing count and 0.15 to be a suitable threshold for the PMI score. These thresholds change depending upon the number of products and user sessions.
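The pruning step above amounts to a simple filter over the candidate edges. A minimal sketch, using the thresholds of 5 co-browses and 0.15 NPMI quoted in the text (the helper name and dictionary representation are assumptions, not the paper's code):

```python
MIN_COUNT = 5    # minimum co-browsing count quoted in the text
MIN_NPMI = 0.15  # minimum NPMI edge weight quoted in the text

def prune_edges(pair_counts, weights, min_count=MIN_COUNT, min_npmi=MIN_NPMI):
    """Keep an edge only if both its raw count and NPMI weight clear the bars.

    `pair_counts` and `weights` map (p_i, p_j) tuples to c_ij and w_ij.
    """
    return {
        pair: w
        for pair, w in weights.items()
        if pair_counts[pair] >= min_count and w >= min_npmi
    }
```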
Another source of noise is the session in consideration itself. This entire approach hinges on the assumption that products browsed in a session are similar, i.e. there is some context to the products browsed by a user in a session. If the number of products browsed in a session is low, say 2 or 3, or if different article types were browsed randomly in a session, without any intent, then these sessions add noise to our system. To tackle this problem, we use the concept of coherent sessions. We define a session as coherent if, in that session, the user has browsed at least three products of the same article type, and not more than two article types.

As we have considered sessions to construct the graph, edges are formed only between products which were present in the system at the same time. Therefore, this graph captures the ephemeral nature of fashion trends, because products present at two different points in time would never form an edge. So, when finding similar products for p_i from this graph, the output products represent a trend from the time when p_i was present.

2.3 Product-User Vectors

In this approach, we represent each product as a vector of user signals in real space, with each dimension representing a user. We consider three user signals, namely product clicks, additions of products to cart and checkouts, for populating the values in each dimension. We start by aggregating these user signals across S for a product-user combination (p_i, u_i), generating a chronologically sorted list of all the identified signals that user u_i has generated for product p_i, where p_i ∈ P. If we denote a product click by C, an addition to cart by T and a checkout by O, one example of such a representation could be:

    (p_i, u_i) = {C, C, C, T, O}

After generating this list, we quantify our user signals based on past data. We estimate the relationship between the number of product clicks, add-to-carts and checkouts using a linear classification model with each session as a data point: we use the product click and add-to-cart signals as input and the checkout event as the target value. The weight for the checkout event is taken to be 1, since it is the target value in the classification model. So, for the products which a user has bought, we populate a value of 1 in that user's dimension for those products. For products which the user has not bought but has clicked or added to cart, we populate the vector with the weights obtained from the model. For users who have not generated any of the above signals for a product, we do not populate any value in their dimensions. This results in the vector for each product being sparse, since the subset of users who have generated signals for a product is very small compared to all users.

After computing the vector for a product p_i, we take its L2 norm and find its N similar products as its N nearest neighbors, using the following formulation of cosine similarity as the distance metric:

    cos(i,j) = (p_i · p_j) / (|p_i| |p_j|)

where p_i and |p_i| represent the vector of p_i and its magnitude, and "·" represents the dot product between the two vectors.

In order to get exact cosine similarities, we utilise a compressed sparse row (CSR) [3] matrix representation of the vectors, instead of approximate nearest-neighbor algorithms [10]. Each row of this matrix represents a product and each column represents a user. We multiply this matrix by its transpose, so that the (i,j)-th entry of the resultant matrix gives us the cosine similarity between p_i and p_j. Another advantage of using this representation is that it reduces computation time because of inherent sparse-matrix optimisations.

We observe that the vectors of globally popular products are dense compared to those of less popular products. If the vectors are not normalised, this results in popular products being close neighbors to many products. Taking the L2 norm reduces the magnitude along each dimension of a popular product's vector and normalizes for global popularity.

In this approach, since we have aggregated user signals across S, the system is agnostic of fashion trends. Unlike the item-graph approach, we find multiple instances of products being linked across time. Hence, when finding similar products for a product p_i using this approach, the output products do not necessarily reflect trends from the same time-frame as when p_i was present in the system.
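The normalise-then-transpose-multiply step can be sketched with scipy's CSR routines, which the paper cites for this purpose. This is a toy-sized illustration: the matrix entries and the click/add-to-cart weights of 0.2 and 0.5 are placeholders, not the fitted model weights, and every product row is assumed non-empty.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy product-user matrix: rows are products, columns are users.
# 1.0 marks a purchase; 0.2 (click) and 0.5 (add-to-cart) are
# illustrative placeholder weights, not the paper's fitted values.
rows = np.array([0, 0, 1, 1, 2])
cols = np.array([0, 1, 0, 2, 2])
vals = np.array([1.0, 0.2, 0.5, 0.2, 1.0])
M = csr_matrix((vals, (rows, cols)), shape=(3, 3))

# L2-normalise each product row, then one sparse multiply by the
# transpose yields all pairwise cosine similarities at once.
norms = np.sqrt(M.multiply(M).sum(axis=1))   # per-row L2 norms, shape (3, 1)
M_norm = M.multiply(1.0 / norms)             # broadcasts over rows
sims = (M_norm @ M_norm.T).toarray()         # sims[i, j] = cos(p_i, p_j)
```

Because the matrix is sparse, the product of the normalised matrix with its transpose only touches user dimensions where both products have signals, which is what makes the exact computation tractable.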
3 ANALYSIS AND RESULTS

3.1 Data

We split our sessions data into two non-overlapping sets: a training set, which is used to generate the similar products, and a test set, which is used to evaluate the approaches. On average, each session contains 5 products. The training set consists of ∼200M sessions with ∼12M unique users. Using this training set, we find similar products for ∼1.3M products. The test set is generated from ∼10M sessions with ∼500K unique users. To generate the test set of products for p_i, we take each session in the test data where p_i was clicked, and consider the product which was clicked immediately after p_i in that session. Aggregating across all the sessions in the test data, we create a set of products which were clicked immediately after p_i. Using the session test data, we were able to generate test sets for ∼330K products, with the average size of a set being 164.

Figure 2: Example of similar products found using item-item graph. The first image is of the input product and the rest of the images are its top 5 adjacent nodes. The numbers at the bottom of a given image are the PMI scores between the given product and the input product.

We use a series of MapReduce computations [5] to generate the item graph and to aggregate user signals for each product-user combination. Further, to create the CSR matrix for the product-user vectors, we use the CSR sparse-matrix routines in scipy [8].

3.2 Analysis

3.2.1 Browsing behaviour by gender. To analyse browsing behaviour by gender, we look at density statistics from both the item-item graph and the product-user vectors. For the item-item graph, we define the density of a graph as

    d = e / (n(n-1)/2)

where e is the number of edges and n is the number of nodes. Density here represents the ratio of the number of edges formed in the graph to all possible edges in the graph. Below we tabulate density statistics of the graphs for some prominent article types.

    Article Type          Nodes   Edges     Graph Density
    Women-Tops            62727   7325578   0.0037
    Men-Tshirts           74763   9092631   0.0033
    Women-Jeans            7255    303871   0.0115
    Men-Jeans             19958   1519793   0.0076
    Women-Casual Shoes     6932    287098   0.0120
    Men-Casual Shoes      26402   2279818   0.0065
    Women-Sports Shoes     2124     31327   0.0139
    Men-Sports Shoes       8918    341966   0.0086

As can be observed from the table, for two comparable article types like Men-Jeans and Women-Jeans, the women article types have higher density than the men article types, despite having a lower number of nodes. From this higher density it can be inferred that women browse more products than men before making a selection.

For product-user vectors, the density of a sparse vector is defined as the ratio of the number of non-zero values in the vector to its length. In the table below, we tabulate density statistics for the vectors of different article types.

    Article Type          Vectors   Users     Average Vector Density
    Women-Tops            68553     5038512   0.00042
    Men-Tshirts           85123     7155934   0.00027
    Women-Jeans            8070     2497915   0.0013
    Men-Jeans             22099     4648243   0.00072
    Women-Casual Shoes     5942     2508693   0.0011
    Men-Casual Shoes      28398     6984123   0.00057
    Women-Sports Shoes     2459     1768939   0.0019
    Men-Sports Shoes       9527     4738348   0.0011

Similar to the item-item graph, we notice that the women article types are denser than the men article types. The higher density of the women product vectors corroborates our earlier inference that women products are browsed more than men products.
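The tabulated graph densities follow directly from the stated formula; for instance, the Women-Jeans row of the graph table can be reproduced as:

```python
def graph_density(edges, nodes):
    # d = e / (n * (n - 1) / 2): fraction of all possible edges present
    return edges / (nodes * (nodes - 1) / 2)

# Women-Jeans row from the graph density table: 7255 nodes, 303871 edges
d = graph_density(303871, 7255)   # rounds to 0.0115, matching the table
```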
3.2.2 Difference in Style Catalogued Date. Each product in our system is tagged with a style catalogued date (SCD), which represents the date when the product was introduced to the system. We calculate the average difference between the SCD of a product and its similar products, and further average it over an article type. Fig 8 shows the average difference in SCD between input products and their similar products, averaged over article types, for the two approaches. It can be observed from the plot that the average difference in SCD for product-user vectors is 2-3 times that of the item-item graph. Hence it can be inferred that similar products found using the item-item graph are from the same time-frame, while similar products found using product-user vectors link across time. This validates our claim that similar products found using the item-item graph are cohesive with fashion trends, while those found using product-user vectors are agnostic of them.

3.3 Results

3.3.1 Precision Scores. Denoting the test set of p_i by T_i and its set of similar products by R_i, we use the following formulation to calculate precision:

    precision = |R_i ∩ T_i| / |R_i|

The size of R_i is a hyper-parameter, since we can set the number of similar products that we want to find for p_i; we denote it by N.

Figure 3: Example of similar products found using product user vectors. The first image is of the input product and the rest of the images are its top 5 nearest neighbors. The numbers at the bottom of an image are the cosine similarities between the given product and the input product.

Fig 4 shows the average precision of the three approaches for different values of N. Fig 6 shows precision by article type for the three approaches, calculated with N = 25.

3.3.2 Recall Scores. Using the same notation as above, we use the following formulation to calculate recall:

    recall = |R_i ∩ T_i| / |T_i|

Fig 5 shows the average recall of the three approaches for different values of N. Fig 7 shows recall by article type for the three approaches, with the same value of N as for Fig 6, i.e. 25.

Fig 4: Precision versus N for the three approaches
Fig 5: Recall versus N for the three approaches
Fig 6: Precision for different article types
Fig 7: Recall for different article types

It is evident from Figs 4-7 that the two collaborative filtering approaches significantly outperform the content-based approach. Among the two item-item collaborative filtering approaches, we calculate from the numbers of Figs 4 & 5 that product-user vectors show an average improvement of 7% in precision and 5.1% in recall over the item-item graph. We also observe that product-user vectors have better precision-recall numbers than the item-item graph across the majority of article types.

3.3.3 A/B Tests. For the A/B test, we render the similar products on the product details page of the input product. We test the above three approaches by assigning a randomly selected 10% of traffic to each treatment (∼100K users per treatment). We recorded the number of product views and the number of clicks for the three approaches over a period of three weeks. Figure 9 shows the CTR (click-through rate, defined as the ratio of clicks to views) recorded during the test period for the three approaches, averaged over each day of the week. From the test, we observe that product-user vectors show an average CTR improvement of 5% over the item-item graph, with a p-value [13] of 0.019. We also notice that the item-item graph and product-user vectors show improvements of 50% and 58.4% over product attribute vectors, with p-values of 1.2×10⁻⁵ and 6.6×10⁻⁶ respectively.
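The offline precision and recall formulations above reduce to set intersections. A minimal sketch with illustrative toy data (the product ids are hypothetical):

```python
def precision_recall(recommended, test_set):
    """precision = |R ∩ T| / |R|, recall = |R ∩ T| / |T|."""
    r, t = set(recommended), set(test_set)
    hits = len(r & t)
    return hits / len(r), hits / len(t)

# Toy example: N = |R| = 4 recommendations, 2 of which were actually
# clicked right after the input product in the test sessions.
p, rec = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
```

In the evaluation above, these per-product scores are averaged over the ∼330K products for which test sets could be generated.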
We observe show an average improvement of 7% and 5.1% over item-item graph a significant improvement in the performance of item-item col- for precision and recall, respectively. Also we observe that product laborative filtering approaches as compared to the content based user vectors have better precision recall numbers across majority approach. Furthermore, among the two item-item collaborative of article types as compared to item-item graph. filtering approaches product user vectors perform noticeably better 3.3.3 A/B Tests. For the A/B test, we render the similar prod- than item-item graph. We also compare the effects of fashion trends ucts on the product details page of the input product. We test the with respect to the two approaches and based on results of the tests, above three approaches by assigning randomly selected 10% of traf- infer that a user’s preference to products is independent of rapidly fic to each treatment ( 100k users for each treatment). We recorded changing fashion trends. the number of product views and the number of clicks for the three approaches over a period of three weeks. Figure 9 shows the CTR 5 ACKNOWLEDGEMENTS (click through rate defined as ratio of clicks to views) recorded dur- We thank Deepak Warrier, Ankul Batra, Sagar Arora, Ullas Nam- ing the test period for the three approaches averaged over each day biar, Ashay Tamhane and Kunal Sachdeva for their contributions Fig 8: Average difference in SCD by article type Fig 9: Average CTR over a period of 3 weeks in reviewing this work and their inputs to algorithm design and [6] Ted Dunning. 1993. Accurate methods for the statistics of surprise and coinci- evaluation. dence. Computational linguistics 19, 1 (1993), 61–74. [7] Tv Genius. 2011. An Integrated Approach to TV & VOD Recommendations. (2011). REFERENCES [8] Eric Jones, Travis Oliphant, Pearu Peterson, et al. 2001–. SciPy: Open source [1] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. 
Mining association scientific tools for Python. (2001–). http://www.scipy.org/ [Online; accessed rules between sets of items in large databases. In Acm sigmod record, Vol. 22. 2016-10-14]. ACM, 207–216. [9] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon. com recommenda- [2] Gerlof Bouma. 2009. Normalized (pointwise) mutual information in collocation tions: Item-to-item collaborative filtering. IEEE Internet computing 7, 1 (2003), extraction. Proceedings of GSCL (2009), 31–40. 76–80. [3] Aydin Buluç, Jeremy T Fineman, Matteo Frigo, John R Gilbert, and Charles E [10] Marius Muja and David G Lowe. 2009. Fast Approximate Nearest Neighbors with Leiserson. 2009. Parallel sparse matrix-vector and matrix-transpose-vector multi- Automatic Algorithm Configuration. VISAPP (1) 2, 331-340 (2009), 2. plication using compressed sparse blocks. In Proceedings of the twenty-first annual [11] Pandora Radio. [n. d.]. Music Genome Project. ([n. d.]). https://www.pandora. symposium on Parallelism in algorithms and architectures. ACM, 233–244. com/about/mgp [4] James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, [12] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, et al. 2010. collaborative filtering recommendation algorithms. In Proceedings of the 10th The YouTube video recommendation system. In Proceedings of the fourth ACM international conference on World Wide Web. ACM, 285–295. conference on Recommender systems. ACM, 293–296. [13] Bernard L Welch. 1947. The generalization ofstudent’s’ problem when several [5] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing different population variances are involved. Biometrika 34, 1/2 (1947), 28–35. on large clusters. Commun. ACM 51, 1 (2008), 107–113.