<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>E-commerce Query Classification Using Product Taxonomy Mapping: A Transfer Learning Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Surya Kallumadi</string-name>
          <email>surya@ksu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The Home Depot</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>1245</fpage>
      <lpage>1248</lpage>
      <abstract>
        <p>In web search, query classification (QC) is used to map a query to a user's search intent. In the e-commerce domain, users' product search queries can be broadly categorized into product-specific queries and category-specific queries [9]. In both cases, accurate classification of queries helps identify the right product categories from which relevant products can be retrieved. Thus, mapping a query to a pre-defined product taxonomy is an important step in the e-commerce query understanding pipeline. A typical e-commerce website has thousands of categories, and curating a labeled data set for query classification is expensive, time-consuming, and labor intensive. In addition, product search queries are short, and the vocabulary changes over time as the catalogue evolves. Reducing the effort of generating query-category labels would save time and resources. In this work we show how an existing product-taxonomy mapping can improve query classification, and reduce the need for labeled data, using transfer learning. Our results demonstrate that such an approach can match, and often exceed, the performance of direct training with a smaller computational budget. We further explore how performance varies with the amount of available training data, and show that transfer learning is most useful when the target data set size is small. In addition, we make available a large data set of 535,506 unique labeled e-commerce queries, mapped over 58 categories. The results and transfer learning approaches presented in this work can act as strong baselines for this collection and task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Clustering and classification;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>In the e-commerce domain query understanding can have a
significant impact on user satisfaction. An incorrectly interpreted query
can lead to search abandonment by the user, resulting in lower
conversion rates. E-commerce queries are usually short and lack
linguistic structure, and they can be ambiguous as a result. For
example, the query ‘battery lawn tractor’ can be interpreted
as ‘battery for lawn tractor’ or ‘battery operated lawn
tractor’.</p>
      <p>In product search, the objective of query classification is to map
a user query to a pre-defined product category. QC can improve the
relevance of search results while preserving the recall. A typical
e-commerce site such as Amazon.com can have millions of products,
and thousands of product categories of various granularities.
Curating a query-category labeled data set with good coverage over
all the categories is expensive, labor intensive, and can take a long
time. Approaches that reduce the effort needed to categorize
search queries can therefore make QC significantly more practical.
In this work, we propose a transfer learning approach for QC
that uses product titles. As the products in the domain are mapped
to a well-defined product taxonomy, this mapping can be
exploited to improve QC and reduce the need for labeled data.</p>
      <p>
        Transfer learning has proven to be an effective technique to
improve the performance of various tasks in computer vision and
natural language processing (NLP) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The goal of transfer learning
is to utilize knowledge present within a source domain to improve
a task within a target domain. Neural network and deep learning
based transfer learning approaches have been shown to be quite
useful to improve the performance of a wide range of target tasks in
NLP [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. To demonstrate transfer learning for QC in the e-commerce
domain, we use Amazon.com titles as the source data set [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and
queries obtained by crawling Amazon.com’s auto-complete service
as the target data set.
      </p>
      <p>Academic research on the e-commerce query classification task
has been limited by the lack of availability of labeled data.
Through this work, and the query-category data set made available,
we hope to facilitate progress in this research area. In addition
to the introduction of a new data set, our contributions are as
follows: 1) We present a methodology for domain-specific transfer
learning, in which the source model is tuned as a classifier on a
similar problem. 2) We demonstrate that such an approach can be
leveraged to speed up training and improve results when compared
to direct training. 3) We explore the impact of target data size on
both direct and transferred models, showing that the advantage of
transfer learning over direct training grows as the target training
data shrinks.</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        In the query classification challenge organized as the ACM KDD
Cup 2005 competition, the task was to categorize 800,000 web queries
into 67 predefined categories [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The data set for this challenge
contained 111 queries with category mappings, and the queries in
the test data set could be tagged with up to 5 categories. The submissions
were evaluated on an 800-query subset of the complete data set.
This competition highlighted the challenge of assigning labels to
queries.
      </p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption><p>Examples from the source (titles) and target (queries) data sets.</p></caption>
        <table>
          <thead><tr><th>Product Titles</th><th>Category</th></tr></thead>
          <tbody>
            <tr><td>Compaq 256MB 168-Pin 100Mhz DIMM SDRAM for Compaq Proliant</td><td>Electronics</td></tr>
            <tr><td>EK Ekcessories 10708C-BLUE-AM Blue Jeep Visor Clip</td><td>Automotive</td></tr>
            <tr><td>NHL Chicago Blackhawks Franchise Fitted Hat, Black, Extra Large</td><td>Sports &amp; Outdoors</td></tr>
            <tr><td>Sesame Street Robe with Embroidered Washcloth</td><td>Health &amp; Personal Care</td></tr>
            <tr><td>Emerica Men’s The Westgate Skate Shoe</td><td>Clothing, Shoes &amp; Jewelry</td></tr>
          </tbody>
        </table>
        <table>
          <thead><tr><th>Queries</th><th>Category</th></tr></thead>
          <tbody>
            <tr><td>13mm wrench</td><td>tools</td></tr>
            <tr><td>hip action zukes peanut butter</td><td>pets</td></tr>
            <tr><td>nerf guns under 30 dollars</td><td>toys-and-games</td></tr>
            <tr><td>bernaise sauce mix</td><td>grocery</td></tr>
            <tr><td>door lever lock child proof</td><td>baby-products</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        Lin et al. propose using implicit feedback from user clicks as a
signal to collect training data for QC in the e-commerce domain [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We
consider this work to be complementary to the transfer learning
approach we propose in this paper. Leveraging user click-stream
data and the product hierarchy together can improve
the overall system performance. Click-stream data is useful when a
sufficient amount of user behavior has been observed for a category,
but it fails for new categories and items. The transfer learning
approach exploiting product titles does not suffer from item and
category cold start.
      </p>
      <p>
        Sondhi et al. identify a taxonomy of e-commerce query intents,
based on search logs and user behavior data [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This work identifies
five categories of e-commerce queries based on user search behavior:
1) Shallow Exploration Queries, 2) Targeted Purchase Queries, 3)
Major-Item Shopping Queries, 4) Minor-Item Shopping Queries,
and 5) Hard-Choice Shopping Queries. This paper highlights the
complexity of user intent in the e-commerce domain, and proposes
techniques for leveraging these insights.
      </p>
    </sec>
    <sec id="sec-3b">
      <title>DATA COLLECTION AND DATA SET</title>
      <p>
        Domain adaptation and transfer learning usually require two data
sets: a source data set and a target data set. For supervised tasks
such as QC, transfer learning helps in scenarios where we have
very little training data in the target data set and plenty of data in
the source data set. The source and target data sets should also have
similar characteristics. In this work, because product titles and queries
share a similar vocabulary, we chose product titles as the source
data set. McAuley et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] provide a crawl of Amazon.com’s product
pages including 142.8 million reviews, 9.43 million products, and
6.83 million titles (http://jmcauley.ucsd.edu/data/amazon/). We use
the titles available in this data set as the source data for transfer learning.
      </p>
      <p>As no product-query data sets are publicly available for QC, we
leveraged Amazon.com’s auto-completion service
(http://completion.amazon.com/api/2017/suggestions) to generate
e-commerce queries. In addition to providing suggestions for partial
queries, auto-complete also provides high-level candidate categories
for the suggested queries. These query-category results serve as our
target data set for the QC task. The seeds for the auto-complete crawl
were common terms and phrases found in the data set of McAuley et al.
In addition, we used random alpha-numeric character combinations
as seeds for the query crawl. A total of 535,506 query-category
labels were obtained by this exercise. To ascertain the accuracy of
this data, we manually evaluated 1,000 randomly sampled queries
from this data set. The query-category labels suggested by
auto-complete had an accuracy of 98.6%. The auto-complete crawl was
performed over a duration of one week in December 2018. The queries
in the resulting data set were mapped to 58 high-level categories.
[Figure 1: distribution of query lengths across the training,
validation, and test splits.]</p>
    </sec>
    <sec id="sec-4">
      <title>Data Splits</title>
      <p>Both the source and target data sets are split into training, validation,
and test sets, stratified by category. This resulted in 5,811,656
training examples for the source data, with 500,000 validation examples
and 500,000 test examples. The target data had 435,506 training
examples, with 50,000 examples reserved for each of the validation
and test sets. The target training data was also progressively sub-sampled
to create smaller training sets of 50%, 20%, and 10% of the original
data, each a subset of the previous sample. In Figure 1 we can
see that the length of queries is similarly distributed across the 3
splits. Both the validation set and the training set show a Pearson’s
correlation of &gt; 0.99 with the test set. Due to the use of stratified
sampling, the category distributions over the three sets are similar.</p>
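      <p>The stratified split and progressive sub-sampling described above can be sketched in a few lines. The following is an illustrative stand-alone sketch (the function names and toy data are ours, not the authors' code); the key property is that each smaller training set is a prefix, and therefore a subset, of the larger one.</p>
      <preformat>
```python
import random
from collections import defaultdict

def stratified_split(examples, frac_train, frac_valid, seed=0):
    """Split (text, category) pairs so each category keeps the same
    proportions across the train/validation/test sets."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for ex in examples:
        by_cat[ex[1]].append(ex)
    train, valid, test = [], [], []
    for cat, items in by_cat.items():
        rng.shuffle(items)
        n_train = int(len(items) * frac_train)
        n_valid = int(len(items) * frac_valid)
        train.extend(items[:n_train])
        valid.extend(items[n_train:n_train + n_valid])
        test.extend(items[n_train + n_valid:])
    return train, valid, test

def nested_subsamples(train, fractions=(0.5, 0.2, 0.1), seed=0):
    """Progressively sub-sample the training set so that each smaller
    set is a subset of the previous one (prefixes of one shuffle)."""
    rng = random.Random(seed)
    order = train[:]
    rng.shuffle(order)
    return {f: order[:int(len(order) * f)] for f in fractions}
```
      </preformat>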
    </sec>
    <sec id="sec-5">
      <title>SYSTEM ARCHITECTURE DESCRIPTION</title>
      <p>
        Recent work in NLP has shown the wide utility of Long
Short-Term Memory (LSTM) architectures for transfer learning tasks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Howard and Ruder used a pre-trained LSTM architecture to achieve
state-of-the-art results on several text classification tasks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The
Balanced Pooling View (BPV) architecture, which builds on these
approaches, has been shown to be effective for product taxonomy
classification tasks [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        The model architecture, which can be seen in Figure 2, is centered
around a character-level LSTM, which is fed via an embedding layer.
The time series output from the Recurrent Neural Network (RNN) is
then summarized in 4 ways: by taking the last value as in a typical
RNN architecture, and then with mean-pooling, max-pooling, and
min-pooling. Those 4 summaries are concatenated and fed through
a linear layer with output size equal to the number of categories.
When transferring, only the output layer needs to be replaced, in
order to accommodate the new category space. The embedding size,
RNN width and depth, and dropout settings are all set as in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
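      <p>The four-way summarization at the heart of this architecture can be illustrated directly. Below is a minimal numpy sketch of the pooling and output steps (not the authors' implementation); the sequence length and hidden width are arbitrary toy values, while 58 matches the number of target categories.</p>
      <preformat>
```python
import numpy as np

def balanced_pooling(hidden_states):
    """Summarize per-character LSTM outputs four ways and concatenate.

    hidden_states: (seq_len, hidden_dim) array of RNN outputs.
    Returns a vector of length 4 * hidden_dim: the last state (as in
    a typical RNN), then mean-, max-, and min-pooling over time.
    """
    last = hidden_states[-1]
    mean = hidden_states.mean(axis=0)
    mx = hidden_states.max(axis=0)
    mn = hidden_states.min(axis=0)
    return np.concatenate([last, mean, mx, mn])

# A single linear layer then maps the summary to category logits.
rng = np.random.default_rng(0)
h = rng.normal(size=(12, 8))          # 12 characters, hidden width 8
summary = balanced_pooling(h)         # length 4 * 8 = 32
weights = rng.normal(size=(58, 32))   # one row per target category
logits = weights @ summary            # scores over the taxonomy
```
      </preformat>
      <p>Because only the final linear layer depends on the category count, swapping the output layer is all that is needed when transferring to a new category space.</p>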
      <p>On the target problem, we explore two different training styles:
1) target-only direct training and 2) transfer learning from a source
model. Direct training only uses the target data, without reference
to either the source model or the source data. Transfer learning
uses the source model to initialize network weights, replacing the
output layer to accommodate the new category set, and then
otherwise proceeding as before. Adam optimization was found to be
consistently better than stochastic gradient descent (SGD) and is
used for all target models. Cross-entropy loss is used throughout.</p>
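      <p>Viewed as parameter surgery, transfer initialization copies every layer except the output head, which is re-created to fit the new category set. A hypothetical sketch with illustrative parameter names (not the authors' implementation):</p>
      <preformat>
```python
import numpy as np

def init_from_source(source_params, n_target_categories, seed=0):
    """Initialize a target model from a trained source model.

    source_params: dict of parameter name to numpy array. Every layer
    is copied except the output head, which is re-created so its size
    matches the new category space. The "output." prefix is an
    illustrative naming convention, not the paper's actual code.
    """
    rng = np.random.default_rng(seed)
    target = {name: arr.copy() for name, arr in source_params.items()
              if not name.startswith("output.")}
    in_dim = source_params["output.weight"].shape[1]
    target["output.weight"] = rng.normal(scale=0.01,
                                         size=(n_target_categories, in_dim))
    target["output.bias"] = np.zeros(n_target_categories)
    return target
```
      </preformat>
      <p>Training then proceeds exactly as in the direct case, but starting from the transferred weights rather than a random initialization.</p>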
      <p>Final hyper-parameters were tuned using a grid search around
those initial values, varying the learning rate schedule and peak
learning rate, as well as the number of training epochs for direct
training. Transfer learning was fixed at 5 epochs throughout, since
any increase in the number of epochs led to overfitting and an
increasing validation loss. This process was performed separately
for direct training and transfer learning, as well as for each of the 4
data scales.</p>
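      <p>Among the schedules searched, one candidate is a linearly decreasing "burndown" schedule. One plausible reading of such a schedule is sketched below; the exact shape and arguments are our assumptions, though the 0.003 peak matches the best learning rate reported.</p>
      <preformat>
```python
def burndown_lr(step, total_steps, peak_lr=0.003):
    """Linearly decreasing "burndown" learning-rate schedule: start at
    the peak rate and decay linearly to zero over training.

    This is an illustrative reading, not the authors' exact schedule.
    """
    remaining = 1.0 - step / float(total_steps)
    return peak_lr * max(remaining, 0.0)
```
      </preformat>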
      <p>Hyper-parameters with consistently strong validation results
were then chosen for each of the two training styles. A learning rate
of 0.003 was best for all variants. A linearly decreasing "burndown"
schedule was better than 1cycle or a flat learning rate for transfer.
Direct training was most effective with 10 epochs when trained on
subsets of the target data, but better still with 20 epochs on the
full target data. Once settled, these parameters were used in
4 independent training runs for each training style and data scale.
Each model was used to make predictions over the test set, and the
results are based on these predictions.</p>
      <p>We report cross-entropy loss, accuracy, precision, recall, and F1
scores for our models. As the queries are not uniformly distributed
across the categories, we use weighted precision, recall, and F1 to
measure the performance of the approaches on the test data. If P_i,
R_i, and F1_i are the precision, recall, and F1 scores for each category
c_i, n_i is the number of test examples in category c_i, and N is the
total number of test examples, then the weighted metrics over the
K categories can be calculated as:</p>
      <p>P_w = Σ_{i=1}^{K} (n_i / N) P_i   (1)</p>
    </sec>
    <sec id="sec-6">
      <title>RESULTS</title>
      <p>R_w = Σ_{i=1}^{K} (n_i / N) R_i   (2)</p>
      <p>F1_w = Σ_{i=1}^{K} (n_i / N) F1_i   (3)</p>
      <p>
Figure 3 shows the results for test loss as the amount of target data
varies, for each of the two training approaches. The advantages
of transfer learning are most apparent at low data scales, where it
produces significantly better results. The two approaches eventually
converge in performance as target data becomes fully available.
Figure 4 shows the equivalent results for accuracy. In this case the
performance difference is not as large, and direct training closes
the gap at 50% of the target data. This corresponds to a regime in
which the training loss continues to drop rapidly while validation
loss levels off, which might indicate overfitting.</p>
      <p>Table 3 shows the overall weighted precision, recall, and F1
scores for each training variant across the different target data
scales. Recall is equal to the accuracy metric reported in Figure 4.
Table 4 shows the per-category results in the case when the target
training data set is small (10%), for categories with at least 100 test
examples. Transfer learning is able to improve F1 for nearly all
categories, sometimes significantly, both for categories that were
difficult and for those that were easy for the directly trained
model. Transfer learning was particularly helpful for rare categories.
The top 6 F1 improvements (bolded) were achieved on the 6 categories
with the fewest examples in the 10% subset of target training data.
This highlights the benefit of a transfer learning approach for
cold-start categories and items.</p>
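      <p>The weighted metrics of Equations (1)-(3) can be computed directly from per-category counts; a minimal self-contained sketch (toy labels, not the evaluation code used in the paper):</p>
      <preformat>
```python
def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall, and F1, matching
    Eqs. (1)-(3): each per-category score is weighted by n_i / N."""
    cats = set(y_true)
    n_total = len(y_true)
    p_w = r_w = f_w = 0.0
    for c in cats:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        pred_c = sum(1 for p in y_pred if p == c)
        true_c = sum(1 for t in y_true if t == c)
        prec = tp / pred_c if pred_c else 0.0
        rec = tp / true_c if true_c else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = true_c / n_total   # n_i / N
        p_w += weight * prec
        r_w += weight * rec
        f_w += weight * f1
    return p_w, r_w, f_w
```
      </preformat>
      <p>Note that the weighted recall algebraically reduces to overall accuracy, consistent with the equivalence noted above for Figure 4.</p>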
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION</title>
      <p>Our results show that product-title data is an effective pre-training
source for query-taxonomy classification. When little target training
data is available, transfer learning improves the quality of the final
target models. Although the results converge for larger target data
sets, we observe that pre-trained transfer learning models converge
in fewer epochs than models trained only on the target data set.</p>
      <p>This convergence is noteworthy and worth exploring in more
detail. The implication is that, at a certain data scale, the source
model does not contain any information that is more useful than
that in the target data. One possible reason for this is that the
model architecture can only encode so much information, and it
may be the case that the full target data can saturate it. If so, then
increasing the size of the pre-trained source model might lead to
further improvements.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Hal</given-names>
            <surname>Daumé</surname>
          </string-name>
          , III,
          <string-name>
            <given-names>Abhishek</given-names>
            <surname>Kumar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Avishek</given-names>
            <surname>Saha</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Frustratingly Easy Semi-supervised Domain Adaptation</article-title>
          .
          <source>Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing (DANLP 2010)</source>
          . Association for Computational Linguistics,
          Stroudsburg, PA, USA,
          <fpage>53</fpage>
          -
          <lpage>59</lpage>
          . http://dl.acm.org/citation.cfm?id=1870526.1870534
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Howard</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Ruder</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Universal Language Model Fine-tuning for Text Classification</article-title>
          . (
          <year>2018</year>
          ). arXiv:1801.06146
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Ying</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zijian</given-names>
            <surname>Zheng</surname>
          </string-name>
          , and
          <string-name>
            <surname>Honghua (Kathy) Dai</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>KDD CUP-2005 Report: Facing a Great Challenge</article-title>
          .
          <source>SIGKDD Explor. Newsl. 7</source>
          ,
          <issue>2</issue>
          (Dec.
          <year>2005</year>
          ),
          <fpage>91</fpage>
          -
          <lpage>99</lpage>
          . https://doi.org/10.1145/1117454.1117466
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Fabbrizio</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>E-commerce Product Query Classification Using Implicit User's Feedback from Clicks</article-title>
          .
          <source>In 2018 IEEE International Conference on Big Data (Big Data)</source>
          .
          <fpage>1955</fpage>
          -
          <lpage>1959</lpage>
          . https://doi.org/10.1109/BigData.2018.8622008
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R</given-names>
            <surname>Pandey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Inferring Networks of Substitutable and Complementary Products</article-title>
          .
          <source>In KDD 2015</source>
          .
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Merity</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nitish Shirish</given-names>
            <surname>Keskar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Richard</given-names>
            <surname>Socher</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Regularizing and Optimizing LSTM Language Models</article-title>
          . (
          <year>2017</year>
          ).
          <source>arXiv:1708.02182</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Lili</given-names>
            <surname>Mou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhao</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rui</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ge</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yan</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lu</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zhi</given-names>
            <surname>Jin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>How Transferable are Neural Networks in NLP Applications?</article-title>
          .
          <source>In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          ,
          <fpage>479</fpage>
          -
          <lpage>489</lpage>
          . https://doi.org/10.18653/v1/D16-1046
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M</given-names>
            <surname>Skinner</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Product Categorization with LSTMs and Balanced Pooling Views</article-title>
          .
          <source>In Proceedings of the 2018 SIGIR Workshop On eCommerce.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Parikshit</given-names>
            <surname>Sondhi</surname>
          </string-name>
          , Mohit Sharma, Pranam Kolari, and
          <string-name>
            <given-names>Chengxiang</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A taxonomy of queries for e-commerce search</article-title>
          .
          <source>In The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval (SIGIR '18)</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>