=Paper=
{{Paper
|id=Vol-2311/paper_19
|storemode=property
|title=Discovering Similar Products in Fashion E-commerce
|pdfUrl=https://ceur-ws.org/Vol-2311/paper_19.pdf
|volume=Vol-2311
|authors=Amber Madvariya,Sumit Borar
|dblpUrl=https://dblp.org/rec/conf/sigir/MadvariyaB17
}}
==Discovering Similar Products in Fashion E-commerce==
<pdf width="1500px">https://ceur-ws.org/Vol-2311/paper_19.pdf</pdf>
<pre>
            Discovering Similar Products in Fashion E-commerce
                                     Amber Madvariya                                   Sumit Borar
                              amber.madvariya@myntra.com                         sumit.borar@myntra.com

ABSTRACT                                                                 Therefore, discovering similar products in fashion e-commerce be-
In recent years, item-item collaborative filtering algorithms have       comes an interesting data mining and business problem. In this
been studied thoroughly in recommender systems. When applied in          paper, we present our work on prescribing similar products using
the context of fashion e-commerce these algorithms can be used to        item-item collaborative filtering algorithms in a fashion context.
generate similar recommendations, personalize search results and             Similarity between items is usually computed in terms of content
to build a framework for creating clusters of similar products. The      based similarity, item-item based collaborative filtering [12] or the
efficacy of these algorithms when applied in fashion domain, rely        hybrid of two. Content based similarity measures rely on a curated
on accurately inferring a user’s fashion taste and matching them to      taxonomy to represent a particular content. A widely cited example
a product. Our work hinges around discovery of similar products          of content based system is that of Pandora radio[11], which uses
using two different item-item collaborative filtering algorithms. We     the Music Genome Project to tag their songs and artists from a set
identify and address some unique challenges while applying these         of 450 manually curated attributes. Unlike music, the dynamicity of
algorithms in the dynamic fashion e-commerce environment. We             trends in fashion, makes its taxonomy ephemeral and new products
study and evaluate their performances through precision/recall           continuously require new attributes to be added to the taxonomy.
measures, live A/B tests and baseline them against a content based       Maintaining and curating this taxonomy system becomes an un-
approach. We also discuss effects of the transient nature of the         scalable task over time. Another limitation with this approach is
industry on a user’s fashion taste.                                      that attributes are unable to capture user taste completely.
                                                                             Unlike content based similarity measures, item-item collabora-
                                                                         tive filtering algorithms use user signals instead of a taxonomy to
KEYWORDS                                                                 compute item-item similarity. In the domain of e-commerce users’
Recommender Systems, Item-Item Collaborative Filtering                   feedback is not explicit [7] and needs to be inferred from user sig-
                                                                         nals. In traditional e-commerce platforms [9], user purchases are
1 INTRODUCTION                                                           used to compute item similarity. The challenge in using only user
                                                                         purchase data to compute item similarity in fashion e-commerce
Fashion shopping is a complex intersection of product styles with
users’ taste where products are usually described in terms of            is that it is sparse and erratic. This is due to two primary factors,
several attributes such as silhouette, line, hem length, color,          the short life cycle of products and the breadth of the catalogue
fabric, waist length and so forth. These attributes are dynamic          available. Thus similarity measures relying solely on user purchase
in nature and vary as trends in fashion change. However, these           data tend to perform poorly in this setting. So we consider pur-
attributes are not typical representatives of a user’s fashion taste.    chase data along with signals like clicking on a product and adding
Users rely more on look, feel, popularity and other intrinsic            a product to a cart to compute item similarity.
factors to identify their taste.
   The advent and growth of fashion e-commerce, pose even more              The challenges with using these implicit feedback signals are
challenges towards providing users with relevant products. As the        following:
industry grows, there are thousands of products available in a
catalogue with similar set of attributes. Also users’ needs in fashion       • Quantifying the chosen signals by assigning relative weights
are not specific as compared to hard goods like electronics,                   to each.
therefore they tend to browse significantly more products on a               • Normalising effect of popular products i.e. products which
fashion portal before clicking on a particular product. In general,            have signals from a large number of users.
at Myntra we observe the average click depth i.e. number of
products viewed before the first click, to be around 90. Correctly          In this work, we look at two representations to model item-item
inferring a user’s intent in a session becomes critical for providing    collaborative filtering. We compare these two approaches with a
a better shopping experience. A user’s click on a prod-uct is a          content based approach, where we use the annotated taxonomy of
proxy for his/her interest in that product. We can utilize this          a product to get a feature representation for it.
information to narrow down his/her intent. Thereafter, serving a            In the following sections of this work, we describe these three
user with similar products to the clicked one, helps them navigate       approaches in detail and compare their performance via A/B tests
through the vast catalogue and find relevant products. We also use       and precision-recall test. Finally, we try to conclude by inferring
similar products to personalize search results as majority of queries    whether the transient nature of the industry affects user taste or
on our platform are broad in nature like ’t-shirts’ and ranking re-      not.
sults based on user’s interest improves the overall search relevance.
                                                                         2   METHODOLOGY
Copyright © 2017 by the paper’s authors. Copying permitted for private   For the remaining sections of this paper, we will use the following
and academic purposes.                                                   terminology. Let P be the set of all products and S be the set of
In: J. Degenhardt, S. Kallumadi, M. de Rijke, L. Si, A. Trotman, Y.
Xu (eds.): Proceedings of the SIGIR 2017 eCom workshop, August 2017,
Tokyo, Japan, published at http://ceur-ws.org
all user sessions. A session contains all activity by a user within             In order to to generate this graph, we use a well-known technique
a 30 minute window from the time he/she logs into the portal                 known as association rule mining[1] or co-visitation counts. We
annotated by time-stamp. We record user signals like product list            consider a user session si , where si ⊂ S, and generate a set of
views, product clicks, addition to carts, orders placed and so on.           products {p1 , p2 , . . . pn } clicked in that session. Within this set, we
   For all the three approaches mentioned below, we split our data           compute all pair combinations of products (pi ,p j ) which were co-
at the article type (e.g Men-Tshirts, Men-Shirts, Women-Dresses)             browsed together. We count all the occurrences where pi and p j
level, since products similar to a product pi should be from the             were co-browsed together, across S and denote the total count of
same article type. Splitting the data at the article type level has the      co-occurrences of (pi ,p j ) as c i j . We assign edge weights as w i j and
added advantage that it reduces the set of products from which we            calculate it using the following formulation of normalised point-
find similar products for pi .                                               wise mutual information (NPMI) [2]:

                                                                                                                −1             p(i,j)
2.1    Product Attribute Vectors                                                                   wi j =               ∗ log
                                                                                                             log p(i,j)       p(i)p(j)
Each product in our system is annotated with attributes generated
                                                                                where p(i,j) is the probability of occurrence of the pair (pi ,p j ) and
from a manually curated taxonomy. These attributes are broadly
                                                                             p(i) and p(j) is the probability of occurrence of pi and p j respectively.
classified into two types, general attributes like brand, color, price
                                                                             We can use the following formulations for the values of p(i,j), p(i),
bands etc which are applicable for all article-types and article-type
                                                                             p(j):
specific attributes (ATSA) like collar type, sleeve length, neck type
for T-shirts. Each attribute is represented as a key-value pair e.g.                                 ci j          ci              cj
collar type of a t-shirt is the key and different collar types like round-                  p(i,j) =        p(i) =         p(j) =
                                                                                                      C            C               C
neck, v-neck, polo-neck are values. Each product is represented
                                                                                where, c i j = count of occurrences of pair (i,j),
as a vector in the real space, where each dimension represents an
attribute. We use binary values to populate a particular dimension                                 X                               X
i.e. if the attribute represented by the dimension is a part of the                         ci =          c ik      and      C=             ci j
set of attributes which were annotated to the product, then we                                     k ⊂P                           i, j ⊂P
populate the dimension as 1, else we populate it as 0. Along with                Then, to find similar products for a product pi , we locate pi in
product attributes, we extract relevant bi-grams using log likelihood        the graph, we consider it’s adjacent nodes and among them pick
ratio scores[6] from the product descriptions. We utilize a L2 norm          the top N neighbors, sorted in descending order by edge weight.
to normalize these product vectors.In order to generate N similar            Here N is a hyper-parameter which denotes the number of similar
products for a product pi , we discover the N closest neighbors of           products we want to find for pi .
pi , keeping cosine distance as our distance metric.                             The advantage of using this formulation to generate edge weights,
    We analyse this approach here to benchmark the performance of            is that it normalises the effect of popular products. If we consider
item-item collaborative filtering algorithms. The major challenge            a pair of products (pi ,p j ) where p j is browsed more across all ses-
with this approach is that the set of attributes representing a prod-        sions as compared to pi , then the value of c j will be high, resulting
uct is not comprehensive and could results in products not being             in a high value of p(j). The high value of p(j) results in the PMI
annotated by some key information.                                           of (pi ,p j ) turning out to be low, even though the co-occurrence
                                                                             counts of (pi ,p j ) might have been higher as compared to other pairs
                                                                             containing pi .
2.2    Item-Item Weighted Graph                                                  One challenge with using this approach is the noise prevalent in
In this approach, we use an undirected weighted graph represen-              the input data. Some pairs might have a low co-browsing count or
tation to model product relationships in the system. In this repre-          low PMI score to form a meaningful edge. To tackle this challenge,
sentation nodes represent products, edges represent associativity            we put a threshold on both the co-browsing counts and the edge
between products and edge weights represent degree of associa-               weights generated using PMI scores. After experimenting, we found
tivity between products. This approach was first introduced by               5 to be a suitable threshold for co-browsing count and 0.15 to
YouTube [4], however our formulation to calculate edge weight is             be a suitable threshold for PMI score. These thresholds change
different than the one showed in that work.                                  depending upon the number of products and user sessions.


       Input Image                 (0.081)                  (0.078)               (0.071)                        (0.059)                     (0.056)

Figure 1: Example of similar products found using product attributes. The numbers at the bottom of a given image represent
                              the similarity between the given product and the input product.
   Another source of noise is the session in consideration itself.          being sparse, since the subset of users who have generated signals
This entire approach hinges around the assumption that products             for a product is very small as compared to all users.
browsed in a session are similar i.e. there is some context to products        After computing the vector for a product pi , we take it’s L2 norm
browsed by a user in a session. If the number of products browsed in        and find it’s N similar products by finding it’s N nearest neighbors
a session are low, say 2 or 3, or if different article types were browsed   using the following formulation of cosine similarity as the distance
randomly in a session, without any intent, then these sessions will         metric.
add noise to our system. To tackle this problem, we use the concept
of coherent sessions. We define a session as coherent, if in that                                                    p⃗i · p⃗j
                                                                                                    cos (i, j) =
session, the user has browsed at least three products of the same                                                  |pi | ∗ |p j |
article type, and not more than two article types.                              where p⃗i and |pi | represent the vector of pi and it’s magnitude,
   As we have considered sessions to construct the graph, it results        "·" represents the dot product between the two vectors.
in edges being formed only between products which were present                  In order to get accurate cosine similarities, we utilise a com-
in the system at the same time. Therefore, this graph captures the          pressed sparse row (CSR) [3] matrix representation of the vectors,
ephemeral nature of fashion trends, because products present across         instead of approximate nearest neighbor algorithms[10]. Each row
two different points in time would never form an edge. So, while            of this matrix represents a product and each column represents a
finding similar products from this graph for pi , the output products       user. We multiply this matrix by its transpose, such that the (i,j)th
represent a trend from the time when pi was present.                        entry of the resultant matrix gives us the cosine similarity between
                                                                            pi and p j . Another advantage of using this representation is that
2.3    Product-User Vectors                                                 it reduces computation time because of inherent sparse matrix
In this approach, we represent each product as a vector of user sig-        optimisations.
nals in the real space, with each dimension representing a user. We             We observe that products which are globally popular are dense
consider three user signals i.e. product clicks, addition of products       as compared to products which are less popular. This results in
to cart and checkout for populating values in each dimension. We            popular products being close neighbors to many products, if their
start by aggregating these user signals across S, for a product-user        vectors are not normalised. Taking a L2 norm reduces the magnitude
combination (pi ,ui ), and generate a chronologically sorted list of        along each dimension of popular product and normalizes for global
all the identified signals that user ui has generated for product pi ,      popularity.
where pi ∈ P. If we denote a product click by C, addition to cart as            In this approach, since we have aggregated user signals across
T and checkout as O, one example of such representation could be            S, it results in the system being agnostic of fashion trends. Unlike
                                                                            the item graph approach, we find multiple instances of linkage of
                       (pi , ui ) = {C, C, C,T , O }                        products across time. Hence, while finding similar products for a
                                                                            product pi using this approach, we find that the output products do
   After generating this list, we quantify our user signals, based on       not necessarily reflect trends from the same time-frame as when pi
past data. We estimate the relationship between number of product           was present in the system.
clicks, add to carts and checkout using a linear classification model
with each session as a data point. In the linear classification model,      3 ANALYSIS AND RESULTS
we use product clicks and add to cart signals as input and the
checkout event as the target value.                                         3.1 Data
   The weight for checkout event is considered as 1, since it is the        We split our sessions data into two non overlapping sets, a training
target value in the classification model. So, for the products which        set, which is used to generate the similar products, and a test set,
a user has bought, we populate a value of 1 in that user’s dimension        which is used to evaluate the approaches. On an average, each
for those products. For products, which the user hasn’t bought but          session contains 5 products. The training set consists of ∼200M
has clicked or added to cart, we populate it’s vector with weights          sessions with ∼12M unique users. Using this training set, we find
that we obtained from the model. For users, who have not generated          similar products for ∼1.3M products. The test set is generated from
any of the above signals for a product, we don’t populate any value         ∼10M sessions with ∼500K unique users. To generate the test set
in their dimensions. This results in the vector for each product            of products for pi , we take each session in the test data, where


      Input Image                  (0.712)                  (0.702)              (0.588)                 (0.473)                    (0.405)

 Figure 2: Example of similar products found using item-item graph. The first image is of the input product and the rest of
 the images are it’s top 5 adjacent nodes. The numbers at the bottom of a given image are the PMI scores between the given
                                                product and the input product
pi was clicked, and we consider the product which was clicked              article types.
immediately after pi in this session. Aggregating across all the
sessions in the test data, we create a set of products which were           Article Type                 Number      Number          Average
clicked immediately after pi . Using the session test data, we were                                      of Vec-     of Users        Vector
able to generate test sets for ∼330K products, with the average size                                     tors                        Density
of the set being 164.                                                       Women-Tops                   68553       5038512         0.00042
   We use a series of MapReduce computations [5] to generate the            Men-Tshirts                  85123       7155934         0.00027
item graph and to aggregate user signals for each product user              Women-Jeans                  8070        2497915         0.0013
combination. Further, to create the CSR matrix for product user             Men-Jeans                    22099       4648243         0.00072
vectors, we use the CSR sparse matrix routine in scipy [8].                 Women-Casual Shoes           5942        2508693         0.0011
                                                                            Men-Casual Shoes             28398       6984123         0.00057
3.2    Analysis                                                             Women-Sports Shoes           2459        1768939         0.0019
   3.2.1 Browsing behaviour by Gender. To analyse browsing                  Men-Sports Shoes             9527        4738348         0.0011
behaviour by gender, we look at density statistics from both the
                                                                              Similar to item-item graph, we notice that women article types
item-item graph and product-user vectors.
                                                                           are more dense as compared to men products. The higher density
   For item-item graph, we define, the density of a graph as
                                                                           of women product vectors corroborates our earlier inference that
                                     e                                     women products are browsed more as compared to men products.
                          d=
                               n ∗ (n − 1)/2
                                                                               3.2.2 Difference in Style Catalogued Date. Each product in
where e is the number of edges and n is the number of nodes. Den-          our system is tagged with a style catalogued date (SCD), which
sity here represents the ratio of number of edges formed in a graph        represents the date when the product was introduced to the system.
to all possible edges in the graph. Below we tabulate density statis-      We calculate the average difference between SCD of a product and
tics of graphs for some prominent article types                            it’s similar products and further average it over an article type.
                                                                               Fig 8 represents average difference in SCD, between input prod-
 Article Type                  Number       Number         Density
                                                                           ucts and their similar products averaged over article types for the
                               of           of edges       of
                                                                           two approaches. It can be observed from the plot that average dif-
                               nodes                       Graph
                                                                           ference in SCD for product-user vectors is 2-3 times more than that
 Women-Tops                    62727        7325578        0.0037
                                                                           of item-item graph. Hence, it can be inferred from the plot that
 Men-Tshirts                   74763        9092631        0.0033
                                                                           similar products found using item-item graph are from the same
 Women-Jeans                   7255         303871         0.0115          time-frame, while similar products found using product user vec-
 Men-Jeans                     19958        1519793        0.0076          tors link across time. This validates our claim that similar products
 Women-Casual Shoes            6932         287098         0.0120          found using item-item graph are cohesive of fashion trends, while
 Men-Casual Shoes              26402        2279818        0.0065          those found using product user vectors are agnostic of them.
 Women-Sports Shoes            2124         31327          0.0139
 Men-Sports Shoes              8918         341966         0.0086          3.3     Results
   As can be observed from the graph, for two comparable arti-                3.3.1 Precision Scores. Denoting the test set of pi by Ti and
cle types like Men-Jeans and Women-Jeans, the women article                the set of similar products for it as Ri , we use the following formu-
types have higher density as compared to men article types, despite        lation to calculate precision:
women article types having lower number of nodes. It can be in-
ferred from the higher density of women article types that women                                              |Ri ∩ Ti |
                                                                                                  precision =
browse more products than men before making a selection.                                                         |Ri |
   For product user vectors, density of a sparse vector is defined as         The size of Ri is a hyper-parameter, since we can set the number
the ratio of number of non-zero values in a vector to it’s length. In      of similar products that we want to find for pi , and we denote it as
the table below, we tabulate density statistics for vectors of different   N.


      Input Image                 (0.151)                  (0.112)               (0.108)                (0.107)                 (0.100)

 Figure 3: Example of similar products found using product user vectors. The first image is of the input product and the rest
of the images are it’s top 5 nearest neighbors. The numbers at the bottom of an image are the cosine similarities between the
                                              given product and the input product
           Fig 4: Precision versus N for the three approaches                   Fig 5: Recall versus N for the three approaches


                Fig 6: Precision for different article types                        Fig 7: Recall for different article types


   Fig 4 shows the average precision of the three approaches for         of the week. From the test, we observe that product user vectors
different values of N. Fig 6 shows precision by article type for the     show an average CTR improvement of 5% over item-item graph
three approaches calculated with a value of N as 25.                     with a p-value[13] of 0.019. Also, we notice that item-item graph and
                                                                         product user vectors show an improvement of 50% and 58.4% over
  3.3.2 Recall Scores. Going with the same notations as above,           product attribute vectors with a p-value of 1.2*10−5 and 6.6*10−6
we use the following formulation to calculate recall:                    respectively.
                                   |Ri ∩ Ti |
                         recall =                                        4   CONCLUSION
                                      |Ti |
                                                                         In this paper, we have identified and addressed some of the chal-
   Fig 5 shows the average recall of the three approaches for dif-
                                                                         lenges faced by item-item collaborative filtering approaches which
ferent values of N. Fig 7 shows recall by article type for the three
                                                                         are unique to fashion e-commerce domain. We formulate a new
approaches, with the same value of N as for Fig 6 i.e. 25.
                                                                         method to combine and quantify various user signals, which can
   It is evident from Fig 4-7 that the two collaborative filtering
                                                                         be used as input to these approaches, in e-commerce domain.
approaches significantly out perform the content based approach.
                                                                             We propose a new method to evaluate similar products in an
Among the two item-item collaborative filtering approaches we
                                                                         offline setting. We compare and evaluate three approaches to discov-
calculate from the numbers of Fig 4 & 5 that product user vectors
                                                                         ering similar products in both offline and online tests. We observe
show an average improvement of 7% and 5.1% over item-item graph
                                                                         a significant improvement in the performance of item-item col-
for precision and recall, respectively. Also we observe that product
                                                                         laborative filtering approaches as compared to the content based
user vectors have better precision recall numbers across majority
                                                                         approach. Furthermore, among the two item-item collaborative
of article types as compared to item-item graph.
                                                                         filtering approaches product user vectors perform noticeably better
    3.3.3 A/B Tests. For the A/B test, we render the similar prod-       than item-item graph. We also compare the effects of fashion trends
ucts on the product details page of the input product. We test the       with respect to the two approaches and based on results of the tests,
above three approaches by assigning randomly selected 10% of traf-       infer that a user’s preference to products is independent of rapidly
fic to each treatment ( 100k users for each treatment). We recorded      changing fashion trends.
the number of product views and the number of clicks for the three
approaches over a period of three weeks. Figure 9 shows the CTR          5   ACKNOWLEDGEMENTS
(click through rate defined as ratio of clicks to views) recorded dur-   We thank Deepak Warrier, Ankul Batra, Sagar Arora, Ullas Nam-
ing the test period for the three approaches averaged over each day      biar, Ashay Tamhane and Kunal Sachdeva for their contributions
              Fig 8: Average difference in SCD by article type                                      Fig 9: Average CTR over a period of 3 weeks


in reviewing this work and their inputs to algorithm design and                            [6] Ted Dunning. 1993. Accurate methods for the statistics of surprise and coinci-
evaluation.                                                                                    dence. Computational linguistics 19, 1 (1993), 61–74.
                                                                                           [7] Tv Genius. 2011. An Integrated Approach to TV & VOD Recommendations.
                                                                                               (2011).
REFERENCES                                                                                 [8] Eric Jones, Travis Oliphant, Pearu Peterson, et al. 2001–. SciPy: Open source
[1] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association                scientific tools for Python. (2001–). http://www.scipy.org/ [Online; accessed
    rules between sets of items in large databases. In Acm sigmod record, Vol. 22.             2016-10-14].
    ACM, 207–216.                                                                          [9] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon. com recommenda-
[2] Gerlof Bouma. 2009. Normalized (pointwise) mutual information in collocation               tions: Item-to-item collaborative filtering. IEEE Internet computing 7, 1 (2003),
    extraction. Proceedings of GSCL (2009), 31–40.                                             76–80.
[3] Aydin Buluç, Jeremy T Fineman, Matteo Frigo, John R Gilbert, and Charles E            [10] Marius Muja and David G Lowe. 2009. Fast Approximate Nearest Neighbors with
    Leiserson. 2009. Parallel sparse matrix-vector and matrix-transpose-vector multi-          Automatic Algorithm Configuration. VISAPP (1) 2, 331-340 (2009), 2.
    plication using compressed sparse blocks. In Proceedings of the twenty-first annual   [11] Pandora Radio. [n. d.]. Music Genome Project. ([n. d.]). https://www.pandora.
    symposium on Parallelism in algorithms and architectures. ACM, 233–244.                    com/about/mgp
[4] James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet,        [12] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based
    Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, et al. 2010.              collaborative filtering recommendation algorithms. In Proceedings of the 10th
    The YouTube video recommendation system. In Proceedings of the fourth ACM                  international conference on World Wide Web. ACM, 285–295.
    conference on Recommender systems. ACM, 293–296.                                      [13] Bernard L Welch. 1947. The generalization ofstudent’s’ problem when several
[5] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing              different population variances are involved. Biometrika 34, 1/2 (1947), 28–35.
    on large clusters. Commun. ACM 51, 1 (2008), 107–113.

</pre>