Central Intention Identification for Natural Language Search Query in E-Commerce

Xusheng Luo*, Alibaba Group, lxs140564@alibaba-inc.com
Yu Gong*, Alibaba Group, gongyu.gy@alibaba-inc.com
Xi Chen, Alibaba Group, gongda.cx@taobao.com
ABSTRACT
This paper is a preliminary work that studies the problem of finding the central intention of natural language queries with multiple intention terms in e-commerce search. We believe it is a new and interesting topic, since natural-language-based e-commerce search is still very young. We propose a neural network model with a bi-LSTM and an attention mechanism, aiming to find the semantic relatedness between natural language context words and the central intention term. Initial experimental results show that our model outperforms the baseline method, indicating a positive and important gain from a deep network model compared to a rule-based approach.

KEYWORDS
Query Intent & Understanding, Natural Language Query

ACM Reference Format:
Xusheng Luo, Yu Gong, and Xi Chen. 2018. Central Intention Identification for Natural Language Search Query in E-Commerce. In Proceedings of ACM SIGIR Workshop on eCommerce (SIGIR 2018 eCom). ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

* Equal contribution.
Copyright © 2018 by the paper’s authors. Copying permitted for private and academic purposes. In: J. Degenhardt, G. Di Fabbrizio, S. Kallumadi, M. Kumar, Y.-C. Lin, A. Trotman, H. Zhao (eds.): Proceedings of the SIGIR 2018 eCom workshop, 12 July, 2018, Ann Arbor, Michigan, USA, published at http://ceur-ws.org

1     INTRODUCTION
As AI technologies develop rapidly, the services provided by e-commerce companies are becoming more and more intelligent. One inevitable trend, different from earlier online shopping experiences, is that customers will be able to use natural language instead of keywords when searching for the products they want to buy. For example, customers can ask the online shopping search engine “I would like to buy a red fashionable short dress under 200 dollars.” instead of typing keywords like “short dress, red, fashion, cheaper than 200”. Compared to keywords, natural language is a more comfortable way for people to shop online, since it is the way we communicate with each other in daily life.

The very first step for a search engine to understand a user query is to identify the query intention. In the case of the previous query, that means knowing that it is a dress the customer wants to buy. Here “short dress” is an intention term (a term can be a word or a phrase), which indicates the e-commerce category of a product. The recognition of intention terms is usually performed by a module called Query Tagging, which is similar to Named Entity Recognition [7, 8]. Sometimes there is more than one intention term within a single natural language user query, such as “I want to buy a pair of stockings for my new short dress.” (Figure 1), where “stockings” and “short dress” are both intention terms. This makes it more difficult for machines to identify the true intention of the query (stockings rather than short dress). Cases like this are not rare in natural language queries: we found that around 20% of voice queries contain more than one intention term after query tagging (voice queries are more likely to be in natural language form, since people tend to use natural language when they speak). This motivates us to identify the central intention of a user query among all intention terms, so that our machine can better understand search queries.

    I want to buy a pair of stockings for my new short dress.
    Query Tagging =>          intention term #1, intention term #2
    Intent Identification =>  central intention
Figure 1: Example query with multiple intention terms

Multiple intention terms in one query are also common in today's keyword-based e-commerce search. However, those queries tend to be short and follow fixed patterns, such as “laptop backpack”, where “laptop” and “backpack” are both intention terms and the true intention is clearly the backpack. In general, we analyze the query log and the corresponding click log to find out which products users click and view after typing the query in the search box, and then construct a multi-terms → central-term map offline. The next time we see a query with multiple short intention terms, we can easily find the actual intention by looking up the map. However, this method is of limited help when dealing with natural language queries, which are much longer and more complicated. As natural language interaction grows, there will be more and more new intention combinations.

We believe a deep model can work more effectively, and hence we dig a little deeper into this topic and make the following contributions:
      • We propose a new and interesting topic that arises when e-commerce search meets natural language queries with multiple intention terms, and we attempt to identify the central intention so that the search engine can better understand queries.
      • We present a neural network with a bi-LSTM and an attention mechanism to effectively capture the rich semantic relatedness between context words and the intention term in a user query; based on that, we identify the central intention.
      • We construct a dataset for experiments and find an alternative way to train our model, although no direct ground truth is available;
      • The proposed neural network model outperforms a baseline method based on dependency parsing. Future work is ongoing on data collection, model upgrades, etc.

2     APPROACH
The central intention identification task is defined as follows. The input query is a sequence of terms $q = (x_1, x_2, \ldots, x_n)$ with at least two intention terms. A term $x_i$ can be a word or a phrase. Our task is to output exactly one intention term $x_i$ as the central intention, while the other intention terms modify the central intention. Defined in this way, the task assumes that each search query contains only one actual goal product. We do not consider queries where a user asks for two or more items at the same time.

Now we describe our neural network model and the baseline method for query intention identification. Figure 2 gives a general view of the proposed neural network model. Given the context words of a query, $q_c = (x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$, which are the terms left after taking the intention term away, together with the intention term $q_i = x_i$, our model outputs a score $\mathrm{score}(q_c, q_i)$ measuring the compatibility between them.

[Figure 2: Overview of proposed model. Diagram labels: Query-context $(x_1, x_2, \ldots, x_{i-1}, [X], x_{i+1}, \ldots, x_n)$, Query-intention $x_i$, Embedding, Bi-LSTM, Attention, a, FC, score.]

2.1     Term Embedding
Typically, a term contains up to three words, so we simply represent it as the average embedding of the words it contains. We train word embeddings and term embeddings on a large text corpus. The embeddings are fed to the model as input and are updated during training.

2.2     Bi-LSTM with Attention
Recurrent neural networks (RNNs) are a powerful family of neural networks designed for sequential data and have shown great promise in many NLP tasks. An RNN takes a sequence of vectors $(x_1, x_2, \ldots, x_n)$ and returns another sequence $(h_1, h_2, \ldots, h_n)$ that represents the hidden state information about the sequence at each time step of the input. In theory, RNNs can learn long dependencies, but in practice they seem to be biased towards the most recent inputs of the sequence. LSTMs [3] were proposed to address this and have shown great capability in capturing long-range dependencies.

To encode the query context, we first look up an embedding matrix $E_x \in \mathbb{R}^{d \times v}$ to get the term embeddings $q = (x_1, x_2, \ldots, x_{i-1}, [X], x_{i+1}, \ldots, x_n)$. Here, $[X]$ is a wildcard embedding that indicates the position of the intention term in the query, $d$ denotes the dimension of the embeddings, and $v$ denotes the vocabulary size of natural language words. The embeddings are then fed into a bidirectional LSTM network. With a unidirectional LSTM, the output for the current word is based only on the words before it, so the information from the words after it is lost. To avoid this, we use a bi-LSTM, which consists of a forward network that reads the query from left to right and a backward network that reads it in the reverse order. We therefore get two hidden state sequences, $(\overrightarrow{h_1}, \overrightarrow{h_2}, \ldots, \overrightarrow{h_n})$ from the forward network and $(\overleftarrow{h_1}, \overleftarrow{h_2}, \ldots, \overleftarrow{h_n})$ from the backward network. We concatenate the forward hidden state of each word with the corresponding backward hidden state, resulting in a representation $H_i = [\overrightarrow{h_i}; \overleftarrow{h_i}] \in \mathbb{R}^{k \times 1}$. Thus, we obtain a representation of each word in the query context.

Attention mechanisms [1, 4] have become an integral part of sequence modeling and transduction models in various NLP tasks, allowing a better understanding of sequential data. Our assumption is that different intention terms should attend differently to the same query. The extent of attention can be calculated from the relatedness between each word representation $H_i$ and an intention embedding $q_i$, where $q_i = W_i^{T} x_i$ and $W_i \in \mathbb{R}^{k \times 1}$. We propose the following formulas to calculate the attention weights:

$$ a_i = \frac{\exp(w_i)}{\sum_{j=1}^{n} \exp(w_j)} \tag{1} $$
$$ w_i = W_a^{T} \tanh([H_i; q_i]) + b \tag{2} $$

Here, $a_i$ denotes the attention weight of the $i$th term in the query context with respect to a given intention term, whose hidden representation is $q_i$, and $n$ is the length of the query. $W_a \in \mathbb{R}^{2k \times 1}$ is an intermediate matrix and $b$ is an offset value. These two parameters are randomly initialized and updated during training. Subsequently, the attention weights $a$ (Figure 2) are used to calculate a weighted sum of the query terms, resulting in a semantic representation $q_c$ of the query context that is specific to the intention term:

$$ q_c = \sum_{i=1}^{n} a_i H_i \tag{3} $$

The final output score, which measures the compatibility of the query context $q_c$ and the intention term $q_i$, is calculated as follows:

$$ \mathrm{score}(q_c, q_i) = q_c \cdot q_i \tag{4} $$

In this way, we use the intention term $q_i$ as the attention query to guide the model in weighting each context term differently, aiming to better judge the compatibility between the current intention term and the whole user query. When we consider an intention term, we re-read the query to find out which parts of it deserve more focus. We believe that this attention mechanism helps the system better understand the query with the help of the intention term, and leads to a performance improvement.

2.3    Training and Prediction
Since there is currently no ground truth, and it is extremely costly to annotate the central intention for user queries with multiple intention terms, we choose natural language queries with only one intention term as our training data. We believe this is a reasonable simplification, since our goal is to learn the semantic relationship between natural language context words and a target intention term. This relatedness can be learned from single-intention queries and then applied to multi-intention queries. We use a dynamic programming max-matching algorithm to match terms in the query against an existing dictionary containing all the intention terms, such as “连衣裙 (Dress)” and “丝袜 (Stocking)”. We keep only queries with exactly one matched intention term. After this “query tagging” step, we can identify the intention term and regard the <query context, intention term> pair of each query as a positive sample. We then randomly choose some unrelated intention terms as negative samples. We use hinge loss to train the model:

$$ \mathrm{loss} = \sum_{q_i' \in N} \max\left(0,\; 1 - \mathrm{score}(q_c, q_i) + \mathrm{score}(q_c, q_i')\right) \tag{5} $$

where $q_c$ is the query context, $q_i$ is the positive query intention, and $q_i'$ is a corrupted query intention term from the negative samples $N$. The function score represents the model output.

We evaluate our model on a human-labeled dataset. Each query in our test set contains more than one intention term. To test a query against one of its intention terms, we take away that intention term and feed the rest of the query, i.e. the query context, into the model. The intention term with the highest output score is considered the central intention.

2.4     Baseline
We use a rule-based method as our baseline. We perform dependency parsing on the input user query. A dependency parser analyzes the grammatical structure of a sentence, establishing relationships between “head” words and the words that modify those heads. Among all the intention terms, we choose the one at the highest position in the parse tree as the central intention. We use an internal e-commerce query parser for this purpose, as shown in Figure 3. In the example query “I want to buy a pair of stockings for my new short dress (我 想要 一双 搭配 连衣裙 的 长筒 丝袜)”, “丝袜 (Stocking)” is at a higher position than “连衣裙 (Dress)” in the parse tree. Thus we choose “丝袜 (Stocking)” as the central intention of this query.

[Figure 3: Dependency parsing example of query with multiple intention terms. Arc labels shown: ROOT, SUB, VMOD, NMOD, DEP, OBJ. The two intention terms are marked “intention #1” and “intention #2”, and the central intention is indicated.]

3 EXPERIMENTS
3.1 Dataset
We train our model on 10,000 single-intention Chinese voice search queries and test on two datasets. We filter out queries whose length is shorter than 10 words. The first test set is a single-intention query set, which we construct by corrupting the intention term of 10,000 single-intention queries with randomly chosen intention terms. The second is a multi-intention query set. It contains 300 multi-intention search queries, consisting of 150 2-intention queries, 100 3-intention queries and 50 4-intention queries. The size of this dataset is limited because it requires a lot of human labeling effort. We use an e-commerce query tagging tool to preprocess all the training and testing queries.

3.2     Implementation Details
We pre-train word and term embeddings on a large Chinese e-commerce corpus. This corpus comes from a module of the Chinese e-commerce giant Taobao (https://www.taobao.com/) named “有好货” (https://h5.m.taobao.com/lanlan/index.html), whose content is written by online merchants. We use the word2vec [5] CBOW model with context window size 5, negative sampling size 5, iteration steps 5 and hierarchical softmax 1. The size of the pre-trained word embeddings is set to 200. For Out-Of-Vocabulary (OOV) words, embeddings are initialized as zero. All embeddings are updated during training. We use an e-commerce Chinese word segmentation tool for word segmentation.
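The embedding lookup described in Sections 2.1 and 3.2 can be illustrated with a short sketch. It is not the authors' code: the helper names `lookup` and `term_embedding` and the toy vector table are hypothetical, and random vectors stand in for the pre-trained word2vec embeddings. Only the 200-dimensional size, the zero initialization for OOV words, and the averaging of a term's word vectors come from the text.

```python
import numpy as np

EMBED_DIM = 200  # embedding size reported in Section 3.2


def lookup(word, table, dim=EMBED_DIM):
    """Return the pre-trained vector for `word`, or zeros if it is out of vocabulary."""
    vec = table.get(word)
    return np.zeros(dim) if vec is None else np.asarray(vec, dtype=float)


def term_embedding(words, table, dim=EMBED_DIM):
    """Average the word vectors of a term (Section 2.1)."""
    if not words:
        raise ValueError("a term must contain at least one word")
    return np.mean([lookup(w, table, dim) for w in words], axis=0)


# Hypothetical toy table standing in for the pre-trained word2vec vectors.
rng = np.random.default_rng(0)
toy_table = {
    "short": rng.normal(size=EMBED_DIM),
    "dress": rng.normal(size=EMBED_DIM),
}

short_dress = term_embedding(["short", "dress"], toy_table)  # two-word term
unknown = lookup("stockings", toy_table)                     # OOV word, all zeros
```

In a trainable model these vectors would only be the initial values of the embedding layer, since the paper states that all embeddings are updated during training.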
For the recurrent neural network component of our system, we use a two-layer LSTM network with unit size 512. All natural language queries are padded to a maximum sentence length of 30. We use the Adam optimizer, and the learning rate is initialized to 0.01.

For the baseline method, we use an internal e-commerce query parser to do dependency parsing. This parser is similar to the well-known Stanford Dependency Parser [2] but is optimized specifically for e-commerce scenarios.

3.3    End-to-end Result
We now report the experimental results. First, we show the accuracy on the single-intention query set. The goal of this experiment is to evaluate the training quality explicitly: the model has to identify the correct intention terms among the corrupted ones. As shown in Table 2, it achieves 0.813 in accuracy. Considering that user queries always contain a lot of noise, this number shows the ability of our model to learn semantic relations between the natural language query context and the query intention. In addition, the result suggests that the attention mechanism helps in this task (0.813 with attention versus 0.803 without).

        Table 2: Accuracies on Single-Intention Queries

                    Approach               Acc
                    Model (- attention)    0.803
                    Model (+ attention)    0.813

In the experiment on the multi-intention query set, we assigned three human annotators to judge whether the model output is correct, i.e. whether the intention term with the highest score is the central query intention. Based on majority voting, we calculate the accuracy in Table 3. Our model with the attention mechanism outperforms the baseline method and the model without the attention mechanism by up to 13%. The baseline method based on dependency parsing performs poorly on short sentences, since search queries in e-commerce tend to be short and less grammatical. On the other hand, the deep neural network model shows potential to learn rich semantic relatedness between context words and intention terms regardless of sentence size.

        Table 3: Accuracies on Multi-Intention Queries

          Approach         2-intents   3-intents   4-intents
          Baseline           0.60        0.54        0.32
          Model (- att)      0.67        0.66        0.40
          Model (+ att)      0.68        0.67        0.46

3.4    Case Study & Error Analysis
In Table 1, we show some real cases of intention identification for search queries. In each case, the underlined terms are the intention terms recognized by query tagging and the red-colored term is the central intention identified by the model. Take the first query, “我 想要 穿着 显瘦 only 牌子 的 连衣裙 最好 是 能 搭配 耳坠 的。”, as an example. The baseline method using e-commerce dependency parsing regards “耳坠 (Earring)” as the root and thus discards other terms, including “连衣裙 (Dress)”, which is actually the true central intention. Our model outputs the correct intention after seeing enough semantic information in the training data: it considers “穿着 (wearing)”, “显瘦 (slimming)” and “only” more likely to describe “连衣裙 (Dress)” than “耳坠 (Earring)”.

                  Table 1: Real cases of central intention identification

    #1   我 想要 穿着 显瘦 only 牌子 的 连衣裙 最好 是 能 搭配 耳坠 的。
         I want to buy an ONLY-brand thin-looking dress which is suitable for earrings.
    #2   汽车 上面 用 的 那个 小的 吸尘器 有没有 的？
         Do you have small vacuum cleaner for cars?
    #3   黄色 T恤衫 前面 就是 有 2个 耳坠 那种。
         Yellow T-shirt with a pair of earrings in the front.
    #4   打 篮球 踢 足球 都 可以 穿 的 nike 鞋，没有 鞋带。
         Nike shoes without shoelace, for both basketball and soccer.

Since this work is at a preliminary stage, we found several problems in our experiments. First, the quality of the queries is not as high as we expected. Currently, the main way a customer interacts with an online e-commerce search engine is still based on keywords. Thus, at the current stage, it is hard to obtain enough high-quality natural language query logs. That is why we chose voice queries as the source of natural language queries. However, the precision of speech recognition becomes a problem, especially when people say something very domain-specific.

Second, the habit of using keywords for online shopping cannot be easily changed. Among voice queries, there are still quite a few that are combinations of several similar keywords which actually refer to the same product. However, the goal of our model is to learn the semantic relatedness between query words and intention terms. This idea does not hold if the terms of a query are not in natural order or the query is not even a natural language sentence.

Finally, we also find some cases where simple rules or patterns may work better than deep models. For example, the central intention of “连衣裙上面的绿色纽扣 (Green buttons of dress)” is “纽扣 (button)”, but it becomes “连衣裙 (dress)” if we change only one word to get “连衣裙上面有绿色纽扣 (Dress with green buttons)”. Although these cases are rare and extreme, they are indeed a challenge for our model. Some syntactic and rule-based features may need to be fed to the model to help it deal with this problem.
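As a concrete reference for the scoring procedure evaluated in this section, the following NumPy sketch walks through Equations (1) to (5) and the prediction rule of Section 2.3. It is an illustration under stated assumptions, not the authors' implementation: random matrices stand in for the bi-LSTM outputs and the projected intention embeddings, the dimensions are arbitrary, and the function names `attention_score`, `hinge_loss` and `central_intention` are hypothetical.

```python
import numpy as np


def attention_score(H, q, W_a, b):
    """Score one intention term against its query context (Equations 1-4).

    H   : (n, k) context word representations (bi-LSTM output, assumed given)
    q   : (k,)   hidden representation of the candidate intention term
    W_a : (2k,)  attention weight vector
    b   : float  attention offset
    """
    n = H.shape[0]
    concat = np.concatenate([H, np.tile(q, (n, 1))], axis=1)  # rows are [H_i; q]
    w = np.tanh(concat) @ W_a + b        # Eq. (2): raw relatedness per context word
    w = w - w.max()                      # shift for numerical stability only
    a = np.exp(w) / np.exp(w).sum()      # Eq. (1): softmax attention weights
    q_c = a @ H                          # Eq. (3): attention-weighted context vector
    return float(q_c @ q), a             # Eq. (4): dot-product score


def hinge_loss(pos_score, neg_scores, margin=1.0):
    """Eq. (5): summed margin violations over the corrupted intention terms."""
    neg = np.asarray(neg_scores, dtype=float)
    return float(np.maximum(0.0, margin - pos_score + neg).sum())


def central_intention(scores):
    """Prediction rule: the candidate term with the highest score wins."""
    return max(scores, key=scores.get)


# Toy usage with random stand-ins for the learned representations.
rng = np.random.default_rng(0)
n, k = 6, 4
W_a, b = rng.normal(size=2 * k), 0.1
scores = {}
for term in ("stockings", "short dress"):
    H = rng.normal(size=(n, k))  # context differs per candidate, since the candidate is removed
    q = rng.normal(size=k)
    scores[term], weights = attention_score(H, q, W_a, b)
best = central_intention(scores)
```

Each candidate intention term is scored against its own context, because the candidate is removed from the query before encoding. At training time, `hinge_loss` would be applied to the score of the true intention term and the scores of the randomly chosen corrupted terms.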


4     FUTURE WORK
In this paper, we explore the setting where e-commerce search queries are in natural language form and multiple intention terms appear together in the same query. We propose a deep neural network to identify the true intention and make encouraging progress compared to a rule-based method. In the future, we will try to construct a larger and cleaner dataset for both training and testing and make it public. This work is a preliminary attempt, and it needs further improvement, such as adding syntactic and rule-based features to the model.

REFERENCES
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[2] Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 740–750.
[3] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
[4] Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015).
[5] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[6] Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
[7] David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Lingvisticae Investigationes 30, 1 (2007), 3–26.
[8] Erik F Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4. Association for Computational Linguistics, 142–147.