=Paper=
{{Paper
|id=Vol-2446/paper4
|storemode=property
|title=Context-aware Deep Model for Entity Recommendation in Search Engine at Alibaba
|pdfUrl=https://ceur-ws.org/Vol-2446/paper4.pdf
|volume=Vol-2446
|authors=Qianghuai Jia,Ningyu Zhang,Nengwei Hua
|dblpUrl=https://dblp.org/rec/conf/eyre/Jia0H19
}}
==Context-aware Deep Model for Entity Recommendation in Search Engine at Alibaba==
Context-Aware Deep Model for Entity Recommendation System in Search Engine at Alibaba

Qianghuai Jia (qianghuai.jqh@alibaba-inc.com), Ningyu Zhang (ningyu.zny@alibaba-inc.com), Nengwei Hua (nengwei.huanw@alibaba-inc.com)
Alibaba Group, Hangzhou
ABSTRACT
Entity recommendation, providing search users with an improved experience by assisting them in finding related entities for a given query, has become an indispensable feature of today's search engines. Existing studies typically consider only queries with explicit entities; they usually fail to handle complex queries without entities, such as "what food is good for cold weather", because their models cannot infer the underlying meaning of the input text. In this work, we argue that contexts convey valuable evidence that can facilitate the semantic modeling of queries, and we take them into consideration for entity recommendation. To better model the semantics of queries and entities, we learn the representations of queries and entities jointly with attentive deep neural networks. We evaluate our approach using large-scale, real-world search logs from a widely used commercial Chinese search engine. Our system has been deployed in the ShenMa Search Engine (m.sm.cn), and you can experience it in the UC Browser of Alibaba. Results from an online A/B test suggest that the impression efficiency of click-through rate increased by 5.1% and page views increased by 5.5%.

CCS CONCEPTS
• Information systems → Query suggestion.

Figure 1: Example of entity recommendation results for the query "what food is good for cold weather." (The panel shows the query, the concept of the recommended entities, "Food," and the recommended entities: 1. Grain nutrition powder, 2. Honey walnut kernel, 3. Almond milk.)
KEYWORDS
Entity Recommendation, Deep Neural Networks, Query Understanding, Knowledge Graph, Cognitive Concept Graph

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Over the past few years, major commercial search engines have enriched and improved the user experience by proactively presenting related entities for a query along with the regular web search results. Figure 1 shows an example of the Alibaba ShenMa search engine's entity recommendation results presented on the panel of its mobile search result page.

Existing studies [2, 7] in entity recommendation typically consider queries containing explicit entities, while ignoring queries without entities. A common drawback of these approaches is that they cannot handle complex queries well, because they have no informative evidence beyond the entity itself for retrieving related entities with the same surface form. Therefore, existing entity recommendation systems tend to recommend entities with regard to the explicitly asked meaning, ignoring queries with implicit user needs. By analyzing hundreds of millions of unique queries from search logs with named entity recognition technology, we found that more than 50% of queries contain no explicit entities. In our opinion, those queries without explicit entities are valuable for entity recommendation.

Such queries convey insights into a user's current information need, which enables us to provide the user with more relevant entity recommendations and improve the user experience. For example, the search intent behind the query "what food is good for cold weather" could be a kind of food suitable to eat in cold weather. However, most recommended entities are derived from entities that already exist in the query; for example, given the query "cake", a system recommends entities such as "cupcakes" and "chocolate", but there is no explicit entity called "good food for cold weather" at all. Users are therefore likely to value a search engine that can recommend entities for arbitrary queries.

However, recommending entities for such complex queries is extremely challenging. First, many existing recommendation algorithms that are proven to work well on small problems fail to operate at a large scale; highly specialized distributed learning algorithms and efficient serving systems are essential for handling a search engine's massive queries and candidate entities. Second, user queries are extremely complex and diverse, and it is quite challenging to understand the user's true intention. Furthermore, historical user behavior on the search engine is inherently difficult to predict due to sparsity and a variety of unobservable external factors. We rarely obtain the ground truth of user satisfaction and instead model noisy implicit feedback signals.

In this paper, we study the problem of context-aware entity recommendation and investigate how to utilize queries without explicit entities to improve entity recommendation quality. Our approach is based on neural networks, which map both queries and candidate entities into a vector space via large-scale distributed training.

We evaluate our approach using large-scale, real-world search logs of a widely used commercial Chinese search engine. Our system has been deployed in the ShenMa Search Engine, and you can experience this feature in the UC Browser of Alibaba. Results from an online A/B test involving a large number of real users suggest that the impression efficiency of click-through rate (CTR) increased by 5.1% and page views (PV) increased by 5.5%.
The main contributions of our paper are summarized as follows:
• To the best of our knowledge, ours is the first approach to recommend entities for arbitrary queries in a large-scale Chinese search engine.
• Our approach is flexible and capable of recommending entities for billions of queries.
• We conduct extensive experiments on large-scale, real-world search logs, which show the effectiveness of our approach in both offline evaluation and an online A/B test.
2 RELATED WORK
The previous work closest to ours is the task of entity recommendation, which can be divided into two categories. The first is query assistance for knowledge graphs [16, 17]: GQBE [9] and Exemplar Queries [13] studied how to retrieve entities from a knowledge base by specifying example entities. For example, the input entity pair {Jerry Yang, Yahoo!} would help retrieve answer pairs such as {Sergey Brin, Google}. Both projected the example entities onto the RDF knowledge graph to discover result entities as well as the relationships around them. They used an edge-weighted graph as the underlying model and subgraph isomorphism as the basic matching scheme, which in general is costly.

The second is recommending related entities for search assistance. [2] proposed a recommendation engine called Spark that links a user's query word to an entity within a knowledge base and recommends a ranked list of related entities. To guide user exploration of the recommended entities, they also proposed a series of features to characterize the relatedness between the query entity and the related entities. [11] proposed similar entity search considering diversity. [8] proposed to enhance the understandability of entity recommendations by captioning the results. [5] proposed a number of memory-based methods that exploit user behaviors in search logs to recommend related entities for a user's full search session. [7] proposed a model in a multi-task learning setting where the query representation is shared across entity recommendation and context-aware ranking. However, none of those approaches take into account queries without entities.

Our objective is to infer entities from diverse and complex queries for search assistance, an issue on which there is little published research. In industry, there are three simple approaches to handling such complex queries. The first is to tag the query and then recommend relevant entities based on those tags; however, the tagging space is so huge that it is difficult to cover all domains. The second is to use a query recommendation algorithm to convert and disambiguate the queries into entities, which ignores the errors propagated from query recommendation. The last is to recall entities from the clicked documents; however, not all queries have clicked documents. To the best of our knowledge, ours is the first end-to-end method that makes it possible to recommend entities for arbitrary queries in a large-scale Chinese search engine.

Figure 2: System overview of entity recommendation in the ShenMa search engine at Alibaba; the red part is the focus of this paper.

3 SYSTEM OVERVIEW
The overall structure of our entity recommendation system is illustrated in Figure 2. The system is composed of three modules: query processing, candidate generation, and ranking. The query processing module first preprocesses the queries, extracts entities (for complex queries, no entities can be extracted), and then conceptualizes the queries. The candidate generation module takes the output of the query processing module as input and retrieves a subset (hundreds) of entities from the knowledge graph. For a simple query with entities, we utilize heterogeneous graph embedding [6] to retrieve related entities. For complex queries with few or no entities, we propose a deep collaborative matching model to retrieve related entities. These candidates are intended to be generally relevant to the query with high recall; the candidate generation module only provides broad relevance via multi-criteria matching, and the similarity between entities is expressed in terms of coarse features. Presenting a few "best" recommendations in a list requires a fine-level representation to distinguish the relative importance of candidates with high precision. The ranking module accomplishes this task by type filtering, learning to rank, and click-through rate estimation. We also utilize an online learning algorithm, including Thompson sampling, to balance exploitation and exploration in entity ranking. In the final product presentation, we utilize the concepts of entities to cluster different entities with the same concept into the same group, yielding a better visual display and a better user experience. In this paper, we mainly focus on candidate generation, the first stage of entity recommendation, and present our approach (the red part in Figure 2), which can handle complex queries.
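To make the three-stage flow concrete, here is a minimal Python sketch of the pipeline. Every function below is an invented toy stand-in, not the production ShenMa interface; it only illustrates how a query is routed to graph-embedding recall or deep matching depending on whether entities can be extracted.

```python
# Toy stand-ins for the production modules; all names and rules here
# are illustrative assumptions, not the real ShenMa interfaces.
def extract_entities(query):
    # NER stub: complex queries such as "what food is good for cold
    # weather" yield no entities
    return [w for w in query.split() if w.istitle()]

def graph_embedding_recall(entities):
    # simple query: recall via heterogeneous graph embedding [6]
    return [f"entity_related_to_{e}" for e in entities]

def deep_match_recall(query):
    # complex query: deep collaborative match (the focus of this paper)
    return ["Grain nutrition powder", "Honey walnut kernel", "Almond milk"]

def rank(candidates):
    # stand-in for type filtering, learning to rank, CTR estimation,
    # and Thompson-sampling exploration
    return sorted(candidates)

def recommend(query, top_n=10):
    entities = extract_entities(query)
    candidates = (graph_embedding_recall(entities) if entities
                  else deep_match_recall(query))
    return rank(candidates)[:top_n]

print(recommend("what food is good for cold weather"))
```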
4 PRELIMINARIES
In this section, we describe the large knowledge graph that we use to retrieve candidate entities and the cognitive concept graph that we use to conceptualize queries and entities.

4.1 Knowledge Graph
The Shenma knowledge graph (kg.sm.cn) is a semantic network that contains tens of millions of entities, thousands of types, and billions of triples. It covers a wide range of fields, such as people, education, film, TV, music, sports, technology, books, apps, food, plants, and animals, and it is rich enough to cover a large proportion of entities about worldly facts. Entities in the knowledge graph are connected by a variety of relationships.
4.2 Cognitive Concept Graph
Based on the Shenma knowledge graph, we also construct a cognitive concept graph, which contains millions of instances and concepts. Different from the Shenma knowledge graph, the cognitive concept graph is a probabilistic graph that mainly focuses on the Is-A relationship. For example, "robin" is-a bird, and "penguin" is-a bird. The cognitive concept graph is helpful for entity conceptualization and query understanding.

5 DEEP COLLABORATIVE MATCH
In this section, we first introduce the basics of the deep collaborative match and then elaborate on how we design the deep model architecture.

5.1 Recommendation as Classification
Traditionally, major search engines recommend related entities based on their similarities to the main entity that the user searched. [7] explained in detail the procedure of entity recommendation in a search engine, including entity linking, related entity discovery, and so on. Unlike traditional methods, we regard recommendation as large-scale multi-class classification, where the prediction problem becomes how to accurately classify a specific entity $e_i$ among millions of entities from a knowledge graph $V$ based on a user's input query $Q$:

$$P(e_i \mid Q) = \frac{\exp(u_i \cdot q)}{\sum_{j \in V} \exp(u_j \cdot q)}$$

where $q \in \mathbb{R}^N$ is a high-dimensional embedding of the user's input query, $u_j \in \mathbb{R}^N$ represents each entity embedding, and $V$ is the set of entities from the knowledge graph. In this setting, we map the sparse entity or query into a dense vector in $\mathbb{R}^N$. Our deep neural model tries to learn the query embedding via the user's history behavior, which is useful for discriminating among entities with a softmax classifier. Through joint learning of entity embeddings and query embeddings, entity recommendation becomes the calculation of cosine similarity between entity vectors and query vectors.
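As a worked illustration of this formulation, the snippet below scores a toy entity table against one query embedding with numpy; the dimensions and random vectors are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, num_entities = 128, 10_000          # embedding size and toy |V|

U = rng.normal(size=(num_entities, N)) # entity embeddings u_j
q = rng.normal(size=N)                 # query embedding q

# P(e_i | Q) = exp(u_i . q) / sum_j exp(u_j . q)
logits = U @ q
probs = np.exp(logits - logits.max())  # numerically stabilized softmax
probs /= probs.sum()

top5 = np.argsort(-probs)[:5]          # most likely entities for this query
print(top5, probs[top5])
```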
Figure 3: Base deep match model.

5.2 Base Deep Match Model
Inspired by skip-gram language models [12], we map the user's input query to a dense vector representation and learn a high-dimensional embedding for each entity in the knowledge graph. Figure 3 shows the architecture of the base deep match model.

Input Layer. The input layer mainly contains the features from the input query. We first use a word segmentation tool (AliWS, which is similar to the jieba segmentation tool and uses CRF and a user-defined dictionary to segment queries) to segment the queries, then fetch basic-level tokens and semantic-level tokens (tokens in the same entity or phrase are not segmented), and finally combine all the input features via the embedding technique, as shown below:
• word embedding: averaging the embeddings of both the basic-level tokens and the semantic-level tokens; the final embedding dimension is 128.
• ngram embedding: inspired by fastText [10], we add ngram (n=2, 3) features to the input layer to import some local temporal information. The dimension of the ngram embedding is also 128.

Fully-Connected Layer. Following the input layer, we utilize three fully connected layers (512-256-128) with the tanh activation function. In order to speed up training, we add batch normalization to each layer.

Softmax Layer. To efficiently train such a model with millions of classes, we apply sampled softmax [1] in our model. For each example, the cross-entropy loss is minimized for the true label and the sampled negative classes. In practice, we sample 5000 negative instances.
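A minimal PyTorch sketch of the base tower described above: averaged word and ngram embeddings are concatenated, passed through the 512-256-128 tanh layers with batch normalization, and trained with cross-entropy over the true entity plus sampled negatives as a stand-in for full sampled softmax [1]. The vocabulary sizes, names, and the negative-sampling scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BaseDeepMatch(nn.Module):
    def __init__(self, vocab=50_000, ngram_vocab=200_000,
                 num_entities=1_000_000, dim=128):
        super().__init__()
        self.tok = nn.EmbeddingBag(vocab, dim, mode="mean")          # averaged word embedding
        self.ngram = nn.EmbeddingBag(ngram_vocab, dim, mode="mean")  # averaged 2/3-gram embedding
        self.mlp = nn.Sequential(                                    # 512-256-128 tanh tower
            nn.Linear(2 * dim, 512), nn.BatchNorm1d(512), nn.Tanh(),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.Tanh(),
            nn.Linear(256, dim), nn.BatchNorm1d(dim), nn.Tanh(),
        )
        self.entity = nn.Embedding(num_entities, dim)                # entity embeddings u_j

    def query_embedding(self, tok_ids, ngram_ids):
        x = torch.cat([self.tok(tok_ids), self.ngram(ngram_ids)], dim=-1)
        return self.mlp(x)

    def loss(self, tok_ids, ngram_ids, pos_ent, neg_ent):
        # cross-entropy over the true entity and K sampled negatives,
        # standing in for full sampled softmax over millions of classes
        q = self.query_embedding(tok_ids, ngram_ids)                 # (B, dim)
        cand = torch.cat([pos_ent.unsqueeze(1), neg_ent], dim=1)     # (B, 1+K)
        logits = torch.einsum("bd,bkd->bk", q, self.entity(cand))
        target = torch.zeros(len(q), dtype=torch.long)               # true label at index 0
        return F.cross_entropy(logits, target)

model = BaseDeepMatch()
B, K = 4, 50                                   # the paper samples 5000 negatives
tok_ids = torch.randint(0, 50_000, (B, 6))
ngram_ids = torch.randint(0, 200_000, (B, 10))
pos = torch.randint(0, 1_000_000, (B,))
neg = torch.randint(0, 1_000_000, (B, K))
print(model.loss(tok_ids, ngram_ids, pos, neg).item())
```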
Online Serving. At serving time, we need to compute the most likely K classes (entities) in order to choose the top K to present to the user. In order to recall the given number of entities within ten milliseconds, we deploy a vector search engine (similar to Facebook's faiss vector search engine, with an optimized search algorithm) over an offline-built index. In practice, our model can generate a query embedding within 5 ms and recall related entities within 3 ms.
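The serving path can be approximated with an off-the-shelf ANN library. The sketch below uses faiss with a flat inner-product index over L2-normalized vectors (i.e., cosine similarity); the production engine is a proprietary analogue, so the index type and metric here are assumptions.

```python
import numpy as np
import faiss  # stand-in for the proprietary vector search engine

d = 128
entity_vecs = np.random.rand(100_000, d).astype("float32")
faiss.normalize_L2(entity_vecs)        # cosine similarity via inner product

index = faiss.IndexFlatIP(d)           # index built offline
index.add(entity_vecs)

query_vec = np.random.rand(1, d).astype("float32")  # from the query tower
faiss.normalize_L2(query_vec)
scores, entity_ids = index.search(query_vec, 100)   # top-K recall at serving
print(entity_ids[0][:10])
```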
Figure 4: Enhanced deep match model.

5.3 Enhanced Deep Match Model
The base model above retains two problems in the semantic representation of the input query: 1) it ignores the global temporal information, which is important for learning the query's sentence-level representation; 2) different query tokens contribute equally to the final input embedding, which is not a good hypothesis. For example, an entity token should be more important than other tokens such as stop words.

To address the first issue, we adopt a Bi-directional LSTM to encode the global and local temporal information. At the same time, with the attention mechanism, our model can automatically learn the weights of different query tokens. Figure 4 shows the enhanced deep match model architecture.

The proposed model consists of two parts. The first is a Bi-directional LSTM; the second is the self-attention mechanism, which provides weight vectors for the LSTM hidden states. The weight vectors are dotted with the LSTM hidden states, and the weighted LSTM hidden states are considered an embedding for the input query. Suppose the input query has $n$ tokens, represented as a sequence of word embeddings:

$$Q = (w_1, w_2, \cdots, w_{n-1}, w_n)$$

where $w_i \in \mathbb{R}^d$ is the word embedding of the $i$-th token in the query. $Q \in \mathbb{R}^{n \times d}$ is thus represented as a 2-D matrix, which concatenates all the word embeddings together. To utilize the dependency between adjacent words within a single sentence, we use the Bi-directional LSTM to represent the sentence and concatenate the forward hidden state $h_i^f$ with the backward hidden state $h_i^b$ to obtain the hidden state $h_i$:

$$h_i = [h_i^f, h_i^b]$$

The number of the LSTM's hidden units is $m$. For simplicity, we concatenate all the hidden states $h_i$ as $H \in \mathbb{R}^{n \times 2m}$:

$$H = [h_1, h_2, \cdots, h_{n-1}, h_n]$$

With the self-attention mechanism, we encode a variable-length sentence into a fixed-size embedding. The attention mechanism takes all the LSTM hidden states $H$ as input and outputs the weights $\alpha \in \mathbb{R}^{1 \times n}$:

$$\alpha = \mathrm{softmax}(U \tanh(W H^\top + b))$$

where $W \in \mathbb{R}^{k \times 2m}$, $U \in \mathbb{R}^{1 \times k}$, and $b \in \mathbb{R}^k$. Then we sum up the LSTM hidden states $H$ according to the weights provided by $\alpha$ to obtain the final representation of the input query:

$$q = \sum_{i=1}^{n} \alpha_i h_i$$

Note that the query embeddings and entity embeddings are all randomly initialized and trained from scratch. We have huge amounts of training data, which is capable of modeling the relativity between queries and entities.
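The encoder above translates almost line-for-line into PyTorch. This compact sketch (hyperparameters invented) implements $h_i = [h_i^f, h_i^b]$, $\alpha = \mathrm{softmax}(U \tanh(W H^\top + b))$, and $q = \sum_i \alpha_i h_i$; nn.Linear applies the same affine maps as W, b, and U, just in row-major orientation.

```python
import torch
import torch.nn as nn

class AttBiLSTMEncoder(nn.Module):
    def __init__(self, vocab=50_000, dim=128, hidden=64, k=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.bilstm = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.W = nn.Linear(2 * hidden, k)     # W in R^{k x 2m}, bias b in R^k
        self.U = nn.Linear(k, 1, bias=False)  # U in R^{1 x k}

    def forward(self, token_ids):                    # (B, n)
        H, _ = self.bilstm(self.emb(token_ids))      # (B, n, 2m): h_i = [h_i^f, h_i^b]
        scores = self.U(torch.tanh(self.W(H)))       # (B, n, 1)
        alpha = torch.softmax(scores, dim=1)         # attention weights over tokens
        q = (alpha * H).sum(dim=1)                   # q = sum_i alpha_i h_i
        return q                                     # (B, 2m) query embedding

enc = AttBiLSTMEncoder()
q = enc(torch.randint(0, 50_000, (2, 7)))
print(q.shape)  # torch.Size([2, 128])
```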
6 EXPERIMENTS
6.1 Data Sets
In this section, we illustrate how to generate the training samples used to learn the query-entity match model. Training samples are generated from the query logs and the knowledge graph, and they can be divided into four parts, as shown below:
• Query-Click-Entity: given a query, choose the clicked entities with relatively high CTR. In practice, we collect around a billion instances from the query logs of the past two months.
• Query-Doc-Entity: we assume that a highly clicked document is well matched to the query and that the entities in its title or summary are also related to the query. The procedure is: 1) fetch the clicked documents with their titles and summaries from the query log; 2) extract entities from the titles and summaries via named entity recognition; 3) keep the high-quality entities. In the end, we collect millions of unique queries.
• Query-Query-Entity: given good results from query recommendation, we utilize entity linking to extract entities from those results. We also collect millions of unique queries.
• Query-Tag-Entity: for some specific queries, we tag entity labels to them and generate query-entity pairs. Here, we define hundreds of entity tags in advance.

After generating the query-entity pairs, we adopt the following data preprocessing procedures:
• low-quality filter: we filter low-quality entities via some basic rules, such as blacklists, authority, hotness, importance, and so on.
• low-frequency filter: we filter low-frequency entities.
• high-frequency sub-sampling: we sub-sample high-frequency entities.
• shuffle: we shuffle all samples.

Apart from user click data, we construct millions of query-entity relevant pairs at the semantic level, which are very important for the model to learn the query's semantic representation. Finally, we generate billions of query-entity pairs and about one thousand billion unique queries.
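A toy rendition of these preprocessing steps in Python; the thresholds, the blacklist, and the word2vec-style sub-sampling formula are all assumptions, since the paper does not specify them.

```python
import random
from collections import Counter

def preprocess(pairs, blacklist=frozenset(), min_freq=5, subsample_t=1e-4):
    """pairs: list of (query, entity) tuples. Thresholds are invented."""
    freq = Counter(entity for _, entity in pairs)
    total = sum(freq.values())
    kept = []
    for query, entity in pairs:
        # low-quality filter (blacklist stands in for the full rule set)
        # and low-frequency filter
        if entity in blacklist or freq[entity] < min_freq:
            continue
        # high-frequency sub-sampling, word2vec-style (an assumption)
        p = freq[entity] / total
        if p > subsample_t and random.random() > (subsample_t / p) ** 0.5:
            continue
        kept.append((query, entity))
    random.shuffle(kept)  # final shuffle
    return kept
```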
Method       P@1    P@10   P@20   P@30
DNN          6.53   28.29  38.83  53.79
+ngram       7.25   30.76  41.57  56.49
att-BiLSTM   7.34   30.95  41.56  56.02

Table 1: Offline comparison results of different methods on large-scale, real-world search logs of a widely used commercial web search engine.

6.2 Evaluation Metric
To evaluate the effectiveness of different methods, we use Precision@M following [18]. Denote the recalled set of entities for a query $u$ as $P_u$ ($|P_u| = M$) and the query's ground-truth set as $G_u$. Precision@M is:

$$\text{Precision@}M(u) = \frac{|P_u \cap G_u|}{M} \qquad (1)$$
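Equation (1) translates directly into a few lines of Python:

```python
def precision_at_m(recalled, ground_truth):
    """Precision@M(u) = |P_u ∩ G_u| / M, with M = |P_u| (Equation 1)."""
    return len(set(recalled) & set(ground_truth)) / len(recalled)

# toy check: 2 of 4 recalled entities are in the ground-truth set
print(precision_at_m(["a", "b", "c", "d"], ["b", "d", "e"]))  # 0.5
```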
6.3 Offline Evaluation
To evaluate the performance of our model, we compare it with various baseline models. From unseen, real online search click logs, we collect millions of query-entity pairs as our test set (ground-truth set). The evaluation results are shown in Table 1: DNN [3] is the base method with a DNN encoder; +ngram is the method that adds ngram features; att-BiLSTM is our method with a BiLSTM encoder and the attention mechanism. DNN [3] is a very famous recommendation baseline; we re-implemented the algorithm and modified the model for the entity recommendation setting. Note that there are no other baselines for entity recommendation on complex queries with no entities at all. att-BiLSTM is slightly better than +ngram, mainly because a certain percentage of queries have no word order, so ngrams are enough to provide useful information.

Our approach achieves comparable results in the offline evaluation. These results indicate that our method benefits a lot from the joint representation learning of queries and entities. Note that we learn the embeddings of queries and entities with random initialization. We believe the performance can be further improved by adopting more complex sentence encoders such as BERT [4] and XLNet [15] and inductive bias from structured knowledge [14] to enhance the entity representation, which we plan to address in future work.

6.4 Online A/B Test
We perform a large-scale online A/B test to show how our approach to entity recommendation helps improve the performance of recommendation in real-world applications. We first retrieve candidate entities by matching queries; then we rank the candidate entities by a click-through rate (CTR) prediction model and Thompson sampling. The ranked entities are pushed to users in the search results of the Alibaba UC Browser. For the online A/B test, we split users into buckets, and we observe and record the activities of each bucket for seven days.
We select two buckets with highly similar activities. For one bucket, we perform recommendation without the deep collaborative match model; for the other, the deep collaborative match model is utilized. We run our A/B test for seven days and compare the results. Page views (PV) and click-through rate (CTR) are the two most critical metrics in real-world applications because they show how much content users read and how much time they spend in an application. In the online experiment, we observe a statistically significant CTR gain (5.1%) and PV gain (5.5%). These observations prove that the deep collaborative match for entity recommendation greatly benefits the understanding of queries and helps match users with their potential interests better. With the help of the deep collaborative match, we can better capture the implicit user need contained in a query even if it does not explicitly contain an entity. Given more matched entities, users spend more time and read more articles in our search engine.
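For the exploration side of ranking, a Beta-Bernoulli Thompson sampler over per-entity click feedback is a common minimal realization; this sketch is an assumption about the setup, not the production ranker.

```python
import random

class BetaThompsonRanker:
    """Per-entity Beta(1 + clicks, 1 + skips) posterior; rank by sampled CTR."""
    def __init__(self):
        self.clicks, self.skips = {}, {}

    def update(self, entity, clicked):
        counts = self.clicks if clicked else self.skips
        counts[entity] = counts.get(entity, 0) + 1

    def rank(self, entities):
        # one CTR draw per entity from its posterior: entities with little
        # feedback get wide posteriors (exploration), heavily clicked
        # entities get concentrated ones (exploitation)
        samples = {e: random.betavariate(1 + self.clicks.get(e, 0),
                                         1 + self.skips.get(e, 0))
                   for e in entities}
        return sorted(entities, key=samples.get, reverse=True)

ranker = BetaThompsonRanker()
ranker.update("Almond milk", clicked=True)
print(ranker.rank(["Almond milk", "Honey walnut kernel", "Grain nutrition powder"]))
```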
Figure 5: The top-N similar entities for given entities via entity embedding.

6.5 Qualitative Analysis
We make a qualitative analysis of the entity embeddings learned from scratch. Interestingly, we find that our approach is able to capture the relatedness of similar entities. As Figure 5 shows, the entities "Beijing University" and "Fudan University" are similar to the entity "Tsinghua University." These results demonstrate our approach's impressive power of representation learning for entities (we do not have ground truth of similar entities, so we cannot make a quantitative analysis). It also indicates that text is really helpful for representation learning in a knowledge graph.

We also make a qualitative analysis of the query embeddings. We find that our approach generates more discriminative query embeddings for entity recommendation due to the attention mechanism. Specifically, we randomly selected six queries from the search log and visualized their attention weights, as shown in Figure 7. Our approach is capable of emphasizing the relevant words and de-emphasizing the noisy terms in queries, which boosts performance.

Figure 7: Attention-weight visualization of six random queries from the search log.

Figure 6: Entity recommendation results from complex and diverse queries.

6.6 Case Studies
We give some examples of how our deep collaborative matching takes effect in entity recommendation for complex queries. In Figure 6, we display the most relevant entities retrieved for the given queries. We observe that (1) given the interrogative query "what food is good for cold weather", our model is able to understand the meaning of the query and retrieve the most relevant entities, such as "Grain nutrition powder" and "Almond milk"; (2) our model is able to handle short queries such as "e52640 and i73770s", which usually do not have the syntax of a written language or contain few signals
to "apple", it may represent both "fruits" and "technology prod-
ucts" as the Figure 8 shows. Actually, different users have different
intentions. To give a better user experience, we develop the con-
ceptualized multi-dimensional recommendation shown in Figure
9. To be specific, we utilize the concepts of candidate entities to
cluster the entities in the same group to give a better visual display.
Those concepts are retrieved from our cognitive concept graph.
Online evaluation shows that conceptualized multi-dimensional
recommendation has the total coverage of 49.8% in entity recom-
mendation and also achieve more than 4.1% gain of CTR.
7 CONCLUSION
In this paper, we study the problem of context modeling for im-
proving entity recommendation. To this end, we develop a deep
collaborative match model that learns representations from complex
and diverse queries and entities. We evaluate our approach using
large-scale, real-world search logs of a widely used commercial
search engine. The experiments demonstrate that our approach can
Figure 8: Multiple concepts of an entity. significantly improve the performance of entity recommendation.
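The grouping step itself reduces to a small aggregation over concept (is-a) lookups from the cognitive concept graph; in this toy sketch the concept table is invented.

```python
from collections import defaultdict

def group_by_concept(entities, concept_of):
    """Cluster recommended entities under their concept (is-a) labels."""
    groups = defaultdict(list)
    for entity in entities:
        groups[concept_of.get(entity, "Other")].append(entity)
    return dict(groups)

# invented is-a lookups standing in for the cognitive concept graph
concept_of = {"Fuji apple": "Fruit", "Gala apple": "Fruit",
              "iPhone": "Technology product"}
print(group_by_concept(["Fuji apple", "iPhone", "Gala apple"], concept_of))
# {'Fruit': ['Fuji apple', 'Gala apple'], 'Technology product': ['iPhone']}
```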
7 CONCLUSION
In this paper, we study the problem of context modeling for improving entity recommendation. To this end, we developed a deep collaborative match model that learns representations from complex and diverse queries and entities. We evaluated our approach using large-scale, real-world search logs of a widely used commercial search engine. The experiments demonstrate that our approach can significantly improve the performance of entity recommendation. Generally speaking, the knowledge graph and the cognitive concept graph can provide more prior knowledge for query understanding and entity recommendation. In the future, we plan to explore the following directions: (1) we may combine our method with structured knowledge from the knowledge graph and the cognitive concept graph; (2) we may combine rule mining and knowledge graph reasoning technologies to enhance the interpretability of entity recommendation; (3) it will be promising to apply our method to other industry applications and further adapt it to other NLP scenarios.

ACKNOWLEDGMENTS
We would like to thank the colleagues of our team - Xiangzhi Wang, Yulin Wang, Liang Dong, Kangping Yin, Zhenxin Ma, Yongjin Wang, Qiteng Yang, Wei Shen, Liansheng Sun, Kui Xiong, Weixing Zhang and Feng Gao - for useful discussions and support of this work. We are grateful to our cooperative team, the search engineering team. We also thank the anonymous reviewers for their valuable comments and suggestions that helped improve the quality of this manuscript.
REFERENCES
[1] Guy Blanc and Steffen Rendle. 2017. Adaptive sampled softmax with kernel based sampling. arXiv preprint arXiv:1712.00527 (2017).
[2] Roi Blanco, Berkant Barla Cambazoglu, Peter Mika, and Nicolas Torzec. 2013. Entity recommendations in web search. In International Semantic Web Conference. Springer, 33–48.
[3] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. ACM, 191–198.
[4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[5] Ignacio Fernández-Tobías and Roi Blanco. 2016. Memory-based recommendations of entities for web search users. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 35–44.
[6] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 855–864.
[7] Jizhou Huang, Wei Zhang, Yaming Sun, Haifeng Wang, and Ting Liu. 2018. Improving Entity Recommendation with Search Log and Multi-Task Learning. In IJCAI. 4107–4114.
[8] Jizhou Huang, Shiqi Zhao, Shiqiang Ding, Haiyang Wu, Mingming Sun, and Haifeng Wang. 2016. Generating Recommendation Evidence Using Translation Model. In IJCAI. 2810–2816.
[9] Nandish Jayaram, Mahesh Gupta, Arijit Khan, Chengkai Li, Xifeng Yan, and Ramez Elmasri. 2014. GQBE: Querying knowledge graphs by example entity tuples. In 2014 IEEE 30th International Conference on Data Engineering. IEEE, 1250–1253.
[10] Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, and Tomas Mikolov. 2016. FastText.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651 (2016).
[11] Steffen Metzger, Ralf Schenkel, and Marcin Sydow. 2013. QBEES: query by entity examples. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management. ACM, 1829–1832.
[12] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.
[13] Davide Mottin, Matteo Lissandrini, Yannis Velegrakis, and Themis Palpanas. 2014. Exemplar queries: Give me an example of what you need. Proceedings of the VLDB Endowment 7, 5 (2014), 365–376.
[14] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 World Wide Web Conference. International World Wide Web Conferences Steering Committee, 1835–1844.
[15] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv preprint arXiv:1906.08237 (2019).
[16] Ningyu Zhang, Shumin Deng, Zhanlin Sun, Xi Chen, Wei Zhang, and Huajun Chen. 2018. Attention-based capsule networks with dynamic routing for relation extraction. arXiv preprint arXiv:1812.11321 (2018).
[17] Ningyu Zhang, Shumin Deng, Zhanlin Sun, Guanying Wang, Xi Chen, Wei Zhang, and Huajun Chen. 2019. Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks. arXiv preprint arXiv:1903.01306 (2019).
[18] Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, and Kun Gai. 2018. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1079–1088.