=Paper= {{Paper |id=Vol-2319/paper26 |storemode=property |title=Leveraging Catalog to Resolve Conflicting Query Attributes in e-Commerce Sites |pdfUrl=https://ceur-ws.org/Vol-2319/paper26.pdf |volume=Vol-2319 |authors=Suhas Ranganath |dblpUrl=https://dblp.org/rec/conf/sigir/Ranganath18 }} ==Leveraging Catalog to Resolve Conflicting Query Attributes in e-Commerce Sites== https://ceur-ws.org/Vol-2319/paper26.pdf
Leveraging Catalog to Resolve Conflicting Query Attributes in E-commerce Sites
Suhas Ranganath
Walmart Labs
suhas.ranganath@walmartlabs.com

Copyright © 2018 by the paper’s authors. Copying permitted for private and academic purposes.
In: J. Degenhardt, G. Di Fabbrizio, S. Kallumadi, M. Kumar, Y.-C. Lin, A. Trotman, H. Zhao (eds.): Proceedings of the SIGIR 2018 eCom workshop, 12 July, 2018, Ann Arbor, Michigan, USA, published at http://ceur-ws.org

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SIGIR 2018 eCom, July 2018, Ann Arbor, Michigan, USA
© 2018 Copyright held by the owner/author(s).
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM.
https://doi.org/10.1145/nnnnnnn.nnnnnnn

ABSTRACT
Millions of people use online e-commerce platforms to search for and buy products. Identifying attributes in a query is a critical component in connecting users to relevant items. However, in many cases, queries have multiple candidate attributes, and some of them conflict with each other. For example, the query “maroon 5 dvds” has two candidate attributes, the color “maroon” or the band “maroon 5”, of which only one can be present. In this paper, we address the problem of resolving conflicting attributes in e-commerce queries. A challenge in this problem is that knowledge bases like Wikipedia that are used to understand web queries are not focused on the e-commerce domain. E-commerce search engines, however, have access to the catalog, which contains detailed information about the items and their attributes. We propose a framework that leverages catalog information to resolve conflicting attributes in e-commerce queries. Our experiments on real-world queries on e-commerce platforms demonstrate that resolving conflicting attributes by leveraging catalog information significantly improves attribute identification and also yields more relevant search results.

1 INTRODUCTION
E-commerce sites are used by millions of people to buy products quickly and seamlessly. Users express their buying needs through search queries, and an accurate understanding of the query is necessary to return relevant items [1]. A crucial part of query understanding is identifying the attributes inherent in the query [4]. For example, identifying that the query “maroon 5 dvds” has product type “dvd” helps in returning relevant items. Research on identifying query attributes in web search explores the use of semantic information [3], user engagement [5], and external knowledge bases [6]. There has been relatively little work on identifying query attributes in the e-commerce domain.

In many cases, query understanding systems have conflicting candidate attributes for a given query. In the example query “maroon 5 dvds”, the candidate attributes are the product type “dvds”, the color “maroon”, and the band “maroon 5”. It is not straightforward for query understanding systems to infer whether the query refers to the band “maroon 5” or the color “maroon”. Designing algorithms to resolve conflicting query attributes can help e-commerce search systems return more relevant items and better satisfy the buying needs of users.

This task faces several challenges. First, queries are short and contain insufficient information for systems to identify attributes. Second, knowledge bases like Wikipedia, which are used to supplement query text in web search [2], are not focused on the e-commerce domain and can provide insufficient and noisy information. Third, e-commerce sites have millions of users, so search algorithms face significant scalability issues.

E-commerce search systems have access to a catalog that contains the attributes of the items sold by the system. Leveraging the e-commerce catalog as a knowledge base to supplement the textual information can help resolve conflicting query attributes. In the example query “maroon 5 dvds”, we see from the catalog that a significant number of items have the product type “dvds” and a band attribute, whereas very few of them have the color attribute. This indicates that the catalog can provide valuable information that can be used to resolve conflicting query attributes. Therefore, in this paper, we propose a framework to model catalog information to better identify attributes in e-commerce queries.

Specifically, we address the following questions: How can we model catalog information to resolve conflicts in query attributes? How can we evaluate the impact of the framework on e-commerce search systems? The primary contributions of this work are:

  • Proposing the problem of resolving conflicts in query attributes for e-commerce queries;
  • Proposing a framework to model catalog information to identify query attributes in e-commerce queries; and
  • Presenting evaluations of the utility of catalog information in identifying query attributes on real-world data.

The rest of the paper is organized as follows. In Section 2, we describe the proposed framework. In Section 3, we present evaluations of the framework for identifying query attributes and of its impact on ranking relevant items. We conclude in Section 4 with possible future directions.

2 THE PROPOSED FRAMEWORK
In this section, we present our proposed framework to identify attributes in e-commerce queries. We first describe the notations used and then define the problem statement. We then use the notations to describe various aspects of the framework.

We now present the notations used in the paper. Let q be the query and A = {a_1, a_2, ..., a_n} be the set of candidate attributes in the query. Let {A − a_k} be the set of attributes of size N_a without the attribute a_k. The problem can be formally stated as follows: “Given the query q, attribute a_k, and the attribute set {A − a_k}, determine whether attribute a_k is present in q.”
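The problem statement can be made concrete with a small sketch. It assumes a hypothetical in-memory catalog (a list of attribute-to-value dictionaries with made-up items) and already-extracted candidate values for each attribute in A; the names `CATALOG`, `leave_one_out`, `presence_ratio`, and `catalog_evidence` are illustrative and do not come from the paper. For each candidate a_k, the code forms {A − a_k} and, following the catalog intuition from the introduction, measures for every remaining attribute n = x the share of catalog items with n = x that also carry a value for a_k. This share is the ratio m_n / n_x that the framework uses for p(m | n = x) in Eq. 1.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy catalog (illustrative data, not from the paper):
# each item maps an attribute name to its value.
CATALOG: List[Dict[str, str]] = [
    {"product_type": "dvds", "band": "maroon 5"},
    {"product_type": "dvds", "band": "maroon 5", "color": "black"},
    {"product_type": "dvds", "band": "coldplay"},
    {"product_type": "dvds", "band": "adele"},
    {"product_type": "t-shirts", "color": "maroon"},
    {"product_type": "t-shirts", "color": "navy"},
]


def leave_one_out(candidates: Dict[str, str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (a_k, {A - a_k}) for every candidate attribute a_k in A."""
    for a_k in candidates:
        rest = {n: x for n, x in candidates.items() if n != a_k}
        yield a_k, rest


def presence_ratio(catalog: List[Dict[str, str]], m: str, n: str, x: str) -> float:
    """Return m_n / n_x: share of items with n == x that also have a value for m."""
    with_n_x = [item for item in catalog if item.get(n) == x]  # the n_x items
    if not with_n_x:
        return 0.0  # assumption: no catalog evidence for n = x gives a ratio of 0
    m_n = sum(1 for item in with_n_x if m in item)
    return m_n / len(with_n_x)


def catalog_evidence(
    catalog: List[Dict[str, str]], candidates: Dict[str, str]
) -> Dict[str, Dict[str, float]]:
    """For each candidate a_k, map every other attribute n to the ratio for n = x."""
    return {
        a_k: {n: presence_ratio(catalog, a_k, n, x) for n, x in rest.items()}
        for a_k, rest in leave_one_out(candidates)
    }


# Candidate attributes for the query "maroon 5 dvds"; color and band conflict.
query_candidates = {"product_type": "dvds", "color": "maroon", "band": "maroon 5"}
evidence = catalog_evidence(CATALOG, query_candidates)
print(evidence["color"]["product_type"])  # 0.25: few dvds carry a color
print(evidence["band"]["product_type"])   # 1.0: every dvd carries a band
```

On this toy catalog, the ratio favors the band over the color for items of product type “dvds”, mirroring the “maroon 5 dvds” example. The framework does not threshold such ratios directly; it combines them with the value-level scores into a feature set for a classifier.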


We next present our framework to identify attribute values for a given query. We explore two sets of metrics that model catalog information to assist in query attribute identification. We describe both sets along with their mathematical formulations: the first is related to the presence of an attribute in the query, and the second to the presence of an attribute value in the query.

The first metric set computes p(m|q), the probability of an attribute m being present in the query q. This is formulated as

$$p(m \mid q) \propto p(m \mid n = x)\, p(n = x \mid q), \qquad p(m \mid n = x) = \frac{m_n}{n_x} \tag{1}$$

where p(m|n = x) is the probability of attribute m being present when n = x, and n_x is the number of items in the catalog that have value x for attribute n. Among such items, m_n is the number of items having values for attribute m. For a given attribute m, we compute Eq. 1 for all attributes n ∈ {A − m}, resulting in a total of N_a − 1 feature values. According to Eq. 1, if the query has attribute value n = x, it is more likely to have attribute m if more items in the catalog with n = x also contain attribute m. In the query “maroon 5 dvds”, very few items that have the product type “dvds” have values for color, so Eq. 1 gives a low value for p(color|q), the probability of the color attribute being present in the query.

The second metric computes p(m = l|q), the probability of attribute m having value l for the query q. This is formulated as

$$p(m = l \mid q) \propto p(m = l \mid n = x)\, p(n = x \mid q), \qquad p(m = l \mid n = x) = \frac{\log m_{nl}}{\log n_x} \tag{2}$$

where n_x is the number of items in the catalog having value x for attribute n, and, among these items, m_nl is the number of items having value l for attribute m. The score is higher if more of the items having value x for attribute n also have value l for attribute m. We repeat this for all attributes n ∈ {A − m} for a given attribute m, resulting in an additional N_a − 1 feature values. The value set for a given attribute follows a power-law distribution in which a few values are prominent, so we employ log smoothing to make the values linearly distributed.

We integrate the scores derived from the catalog metrics into a feature set. Because the metrics depend only on counts of catalog items, they are scalable and hence suited to handling the large-scale traffic common on an e-commerce site. We use an off-the-shelf classifier on the feature set to determine whether or not the given query has the attribute value.

3 EVALUATION
We next evaluate our framework on a traffic-weighted random sample of 20,000 queries from Walmart.com. As a baseline, we use Dict Lookup, which identifies attributes for a query by matching overlapping phrases in the query with terms in the attribute dictionary. This baseline does not address the possible conflicts that can arise between candidate attributes. For the evaluation, we take color as the attribute to be predicted for a query, and product type and brand as the attributes whose values are known for the query.

We design two evaluation tasks. The first task assesses the effectiveness of the framework in identifying attributes of a given query. We employ manual labeling by expert annotators for the ground truth and use Precision, Recall, and F1 as the evaluation metrics. The second task assesses the impact of the framework on ranking relevant items for the given query. We use the orders of the query-item pairs as the ground truth and nDCG@20 as the evaluation metric. The evaluation results are shown in Table 1.

               Precision   Recall    F1      nDCG@20
Dict Lookup    1.0         1.0       1.0     1.0
Framework      1.06        1.11      1.08    1.05
Gain           +6.48%      +11.37%   +8.3%   +5.36%
Table 1: Performance of the framework on attribute identification and ranking (scores normalized to the Dict Lookup baseline)

From the table, we can see that the framework outperforms the baseline in identifying attributes of a given e-commerce query on all three metrics: Precision (+6.48%), Recall (+11.37%), and F1 (+8.3%). The improvement in identifying query attributes is also reflected in better ranking results, with a 5.36% lift in nDCG@20. The improvement in both tasks demonstrates that the catalog can be effectively leveraged as a knowledge base to better identify attributes for a given query, and that the metrics effectively capture the relevant catalog information.

4 CONCLUSIONS AND FUTURE WORK
In this paper, we address the problem of identifying attributes for queries on e-commerce sites. General-purpose knowledge bases used in identifying attributes for web queries are not focused on e-commerce needs. We design a framework that leverages the catalog as a knowledge base to resolve conflicts in query attributes. We evaluate the framework on a set of queries from Walmart.com and demonstrate that it significantly improves results in attribute identification and in ranking relevant items for e-commerce queries. Future research directions include leveraging query-catalog interactions and query sequences to design more involved metrics for attribute identification. The utility of the catalog in other query understanding tasks, such as query reformulation and type-ahead, is also an interesting avenue for researchers to explore.

REFERENCES
[1] Ricardo Baeza-Yates. 2017. Semantic Query Understanding. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’17). ACM, New York, NY, USA, 1357–1357. https://doi.org/10.1145/3077136.3096472
[2] Jian Hu, Gang Wang, Fred Lochovsky, Jian-tao Sun, and Zheng Chen. 2009. Understanding user’s query intent with Wikipedia. In Proceedings of the 18th International Conference on World Wide Web. ACM, 471–480.
[3] Hang Li and Zhengdong Lu. 2016. Deep learning for information retrieval. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1203–1206.
[4] Yanen Li, Bo-June Paul Hsu, and ChengXiang Zhai. 2013. Unsupervised identification of synonymous query intent templates for attribute intents. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 2029–2038.
[5] Xiang Ren, Yujing Wang, Xiao Yu, Jun Yan, Zheng Chen, and Jiawei Han. 2014. Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. ACM, 23–32.
[6] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. 2015. Query Understanding through Knowledge-Based Conceptualization.