=Paper=
{{Paper
|id=Vol-2319/paper26
|storemode=property
|title=Leveraging Catalog to Resolve Conflicting Query Attributes in e-Commerce Sites
|pdfUrl=https://ceur-ws.org/Vol-2319/paper26.pdf
|volume=Vol-2319
|authors=Suhas Ranganath
|dblpUrl=https://dblp.org/rec/conf/sigir/Ranganath18
}}
==Leveraging Catalog to Resolve Conflicting Query Attributes in e-Commerce Sites==
Leveraging Catalog to Resolve Conflicting Query Attributes in E-commerce Sites
Suhas Ranganath
Walmart Labs
suhas.ranganath@walmartlabs.com
Copyright © 2018 by the paper’s authors. Copying permitted for private and academic purposes. In: J. Degenhardt, G. Di Fabbrizio, S. Kallumadi, M. Kumar, Y.-C. Lin, A. Trotman, H. Zhao (eds.): Proceedings of the SIGIR 2018 eCom workshop, 12 July, 2018, Ann Arbor, Michigan, USA, published at http://ceur-ws.org
SIGIR 2018 eCom, July 2018, Ann Arbor, Michigan, USA

ABSTRACT

Millions of people use online e-commerce platforms to search for and buy products. Identifying attributes in a query is a critical component in connecting users to relevant items. However, in many cases, a query has multiple candidate attributes, and some of them conflict with each other. For example, the query “maroon 5 dvds” has two candidate attributes, the color “maroon” and the band “maroon 5”, of which only one can be present. In this paper, we address the problem of resolving conflicting attributes in e-commerce queries. A challenge in this problem is that knowledge bases like Wikipedia that are used to understand web queries are not focused on the e-commerce domain. E-commerce search engines, however, have access to the catalog, which contains detailed information about the items and their attributes. We propose a framework that leverages catalog information to resolve conflicting attributes in e-commerce queries. Our experiments on real-world queries on e-commerce platforms demonstrate that resolving conflicting attributes by leveraging catalog information significantly improves attribute identification and also yields more relevant search results.

1 INTRODUCTION

E-commerce sites are used by millions of people to buy products in a fast and seamless manner. Users express their buying needs through search queries, and an accurate understanding of the query is necessary to return relevant items [1]. A crucial part of query understanding is to identify attributes inherent in the query [4]. For example, identifying that the query “maroon 5 dvds” has product type “dvd” helps in returning relevant items. Research on identifying query attributes in web search explores the use of semantic information [3], user engagement [5] and external knowledge bases [6]. There has been relatively less work on identifying query attributes in the e-commerce domain.

In many cases, query understanding systems have conflicting candidate attributes for a given query. In the example query “maroon 5 dvds”, the candidate attributes are the product type “dvds”, the color “maroon” and the band “maroon 5”. It is not straightforward for query understanding systems to infer whether the query is referring to the band “maroon 5” or the color “maroon”. Designing algorithms to resolve conflicting query attributes can help e-commerce search systems to return more relevant items and better satisfy the buying needs of users.

This task faces several challenges. First, queries are short and contain insufficient information for systems to identify attributes. Second, knowledge bases like Wikipedia used to supplement query text in web search [2] are not focused on the e-commerce domain, and can lead to insufficient and noisy information. Third, e-commerce sites have millions of users, so search algorithms face significant scalability issues.

E-commerce search systems have access to a catalog which contains the attributes of items sold by the system. Leveraging the e-commerce catalog as a knowledge base to supplement the textual information can help to resolve conflicting query attributes. In the example query “maroon 5 dvds”, we see from the catalog that a significant number of items with the product type “dvds” have a band attribute, whereas very few of them have a color attribute. This indicates that the catalog can provide valuable information that can be used to resolve conflicting query attributes. Therefore, in this paper, we propose a framework to model catalog information to better identify attributes in e-commerce queries.

Specifically, we address the following questions: How can catalog information be modeled to resolve conflicts in query attributes? How can the impact of the framework on e-commerce search systems be evaluated? The primary contributions of the work are
• Proposing the problem of resolving conflicts in query attributes for e-commerce queries;
• Proposing a framework to model catalog information to identify query attributes in e-commerce queries; and
• Presenting evaluations of the utility of catalog information in identifying query attributes on real-world data.

The rest of the paper is organized as follows. In Section 2, we describe the proposed framework. In Section 3, we present evaluations of the framework for identifying the query attribute and its impact on ranking relevant items. We conclude in Section 4 along with possible future directions.

2 THE PROPOSED FRAMEWORK

In this section, we present our proposed framework to identify attributes in e-commerce queries. We first describe the notations used and then define the problem statement. We then use the notations to describe various aspects of the framework.

We now present the notations used in the paper. Let <math>q</math> be the query and <math>A = \{a_1, a_2, \ldots, a_n\}</math> be the set of candidate attributes in the query. Let <math>\{A - a_k\}</math> be the set of attributes of size <math>N_a</math> without the attribute <math>a_k</math>. The problem can be formally stated as follows: “Given the query <math>q</math>, attribute <math>a_k</math>, and the attribute set <math>\{A - a_k\}</math>, determine whether attribute <math>a_k</math> is present in <math>q</math>.”
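The catalog intuition behind this problem statement, namely that items with product type “dvds” often carry a band attribute but rarely a color attribute, can be sketched in a few lines of Python. This is only an illustrative sketch and not the author's implementation: the toy catalog, its dictionary layout, and the function name cooccurrence_ratio are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical toy catalog (invented for illustration): each item maps an
# attribute name to its value; a missing key means the item has no value
# for that attribute.
CATALOG: List[Dict[str, str]] = [
    {"product_type": "dvds", "band": "maroon 5"},
    {"product_type": "dvds", "band": "coldplay"},
    {"product_type": "dvds", "band": "maroon 5"},
    {"product_type": "dvds", "color": "black"},
    {"product_type": "t-shirts", "color": "maroon"},
    {"product_type": "t-shirts", "color": "blue"},
]


def cooccurrence_ratio(
    catalog: List[Dict[str, str]],
    known_attr: str,
    known_value: str,
    target_attr: str,
) -> Optional[float]:
    """Fraction of items with known_attr == known_value that have any value
    for target_attr. Returns None when no catalog item matches."""
    matching = [item for item in catalog if item.get(known_attr) == known_value]
    if not matching:
        return None
    with_target = sum(1 for item in matching if target_attr in item)
    return with_target / len(matching)


# For the query "maroon 5 dvds", the product type "dvds" is the known attribute.
band_score = cooccurrence_ratio(CATALOG, "product_type", "dvds", "band")
color_score = cooccurrence_ratio(CATALOG, "product_type", "dvds", "color")
print(band_score, color_score)
```

On this toy data the band attribute scores higher than the color attribute for “dvds”, which is the kind of signal that would favor the band “maroon 5” over the color “maroon”. A production catalog would presumably precompute such counts offline rather than scan items per query.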
We next present our framework to identify attribute values for a given query. We explore two sets of metrics that model catalog information to assist in query attribute identification, and describe them along with their mathematical formulations. The first set relates to the presence of an attribute in the query, and the second relates to the presence of an attribute value in the query.

The first metric set computes <math>p(m \mid q)</math>, the probability of an attribute <math>m</math> being present in the query <math>q</math>. This is formulated as

<math>p(m \mid q) \propto p(m \mid n = x) \, p(n = x \mid q)</math>

<math>p(m \mid n = x) = \frac{m_n}{n_x}, \qquad (1)</math>

where <math>p(m \mid n = x)</math> is the probability of attribute <math>m</math> being present when <math>n = x</math>, and <math>n_x</math> is the number of items in the catalog which have value <math>x</math> for attribute <math>n</math>. Among such items, <math>m_n</math> is the number of items having values for attribute <math>m</math>. For a given attribute <math>m</math>, we compute Eq 1 for all attributes <math>n \in \{A - m\}</math>, resulting in a total of <math>N_a - 1</math> feature values. According to Eq 1, if the query has attribute value <math>n = x</math>, it is more likely to have attribute <math>m</math> if more items in the catalog with <math>n = x</math> also contain attribute <math>m</math>. In the query “maroon 5 dvds”, very few items which have the product type “dvds” have values for color, so Eq 1 gives a lower value for <math>p(\text{color} \mid q)</math>, the probability of the color attribute being present in the query.

The second metric computes <math>p(m = l \mid q)</math>, the probability of attribute <math>m</math> having a value <math>l</math> for the query <math>q</math>. This is formulated as

<math>p(m = l \mid q) \propto p(m = l \mid n = x) \, p(n = x \mid q)</math>

<math>p(m = l \mid n = x) = \frac{\log(m_{nl})}{\log(n_x)}, \qquad (2)</math>

where the number of items in the catalog having the value <math>x</math> for a given attribute <math>n</math> is <math>n_x</math>. Among these items, let the number of items having value <math>l</math> for attribute <math>m</math> be denoted by <math>m_{nl}</math>. The score is higher if more items having the value <math>x</math> for attribute <math>n</math> also have value <math>l</math> for attribute <math>m</math>. We repeat this for all possible attributes <math>n \in \{A - m\}</math> for a given attribute <math>m</math>, resulting in an additional <math>N_a - 1</math> feature values. The value set for a given attribute follows a power law distribution where few values are prominent, so we employ log smoothing to make the values linearly distributed.

We integrate the scores derived from the catalog metrics into a feature set. Our metrics are scalable and hence suited for handling large-scale traffic common in an e-commerce site. We use an out-of-the-box classifier on the feature set to determine whether or not the given query has the attribute value.

3 EVALUATION

We next evaluate our framework on a traffic-weighted random sample of 20,000 queries on Walmart.com. As a point of comparison, we use the baseline Dict Lookup, which identifies attributes for a query by matching overlapping phrases in the query with terms in the attribute dictionary. This baseline does not address the possible conflicts that can arise between candidate attributes. For the evaluation, we take color as the attribute that has to be predicted for a query and the product type and brand as the attributes whose values are known for the query.

We design two evaluation tasks. The first task assesses the effectiveness of the framework in identifying attributes of a given query. We employ manual labeling by expert annotators for the ground truth and use Precision, Recall, and F1 as the evaluation metrics. The second task assesses the impact of the framework on ranking relevant items for the given query. We use the orders of the query-item pairs for the ground truth and nDCG@20 as the evaluation metric. The evaluation results are shown in Table 1.

{| class="wikitable"
|+ Table 1: Performance of the framework on attribute identification and ranking
! Method !! Precision !! Recall !! F1 !! nDCG@20
|-
| Dict Lookup || 1.0 || 1.0 || 1.0 || 1.0
|-
| Framework || 1.06 || 1.11 || 1.08 || 1.05
|-
| Gain || +6.48% || +11.37% || +8.3% || +5.36%
|}

From the table, we can see that the framework is significantly better at identifying attributes of a given e-commerce query than the baseline across the Precision, Recall and F1 metrics. The improvement in identifying query attributes is also reflected in better ranking results, as shown by the lift in nDCG@20. The improvement in both tasks demonstrates that the catalog can be effectively leveraged as a knowledge base to better identify attributes for a given query, and that the metrics effectively capture the relevant catalog information.

4 CONCLUSIONS AND FUTURE WORK

In this paper, we address the problem of identifying attributes for queries on e-commerce sites. General purpose knowledge bases used in identifying attributes for web queries are not focused on e-commerce needs. We design a framework that leverages the catalog as a knowledge base to resolve conflicts in query attributes. We evaluate the framework on a set of queries from Walmart.com and demonstrate that it significantly improves results in attribute identification and ranking relevant items for e-commerce queries. Future research directions can include leveraging query-catalog interactions and query sequences to design more involved metrics for attribute identification. The utility of the catalog in other query understanding tasks such as query reformulation and type-ahead is also an interesting avenue for researchers to explore.

REFERENCES

[1] Ricardo Baeza-Yates. 2017. Semantic Query Understanding. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’17). ACM, New York, NY, USA, 1357–1357. https://doi.org/10.1145/3077136.3096472

[2] Jian Hu, Gang Wang, Fred Lochovsky, Jian-tao Sun, and Zheng Chen. 2009. Understanding user’s query intent with wikipedia. In Proceedings of the 18th international conference on World wide web. ACM, 471–480.

[3] Hang Li and Zhengdong Lu. 2016. Deep learning for information retrieval. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 1203–1206.

[4] Yanen Li, Bo-June Paul Hsu, and ChengXiang Zhai. 2013. Unsupervised identification of synonymous query intent templates for attribute intents. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management. ACM, 2029–2038.

[5] Xiang Ren, Yujing Wang, Xiao Yu, Jun Yan, Zheng Chen, and Jiawei Han. 2014. Heterogeneous graph-based intent learning with queries, web pages and wikipedia concepts. In Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 23–32.

[6] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. 2015. Query Understanding through Knowledge-Based Conceptualization.