<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Context-Aware Deep Model for Entity Recommendation System in Search Engine at Alibaba</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qianghuai Jia</string-name>
          <email>qianghuai.jqh@alibaba-inc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ningyu Zhang</string-name>
          <email>ningyu.zny@alibaba-inc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nengwei Hua∗</string-name>
          <email>nengwei.huanw@alibaba-inc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alibaba Group</institution>
          ,
          <addr-line>Hangzhou</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Entity recommendation, which improves the search experience by helping users find entities related to a given query, has become an indispensable feature of today's search engines. Existing studies typically consider only queries with explicit entities. They usually fail to handle complex queries without entities, such as "what food is good for cold weather", because their models cannot infer the underlying meaning of the input text. In this work, we believe that contexts convey valuable evidence that can facilitate the semantic modeling of queries, and we take them into consideration for entity recommendation. To better model the semantics of queries and entities, we learn the representations of queries and entities jointly with attentive deep neural networks. We evaluate our approach using large-scale, real-world search logs from a widely used commercial Chinese search engine. Our system has been deployed in the ShenMa Search Engine and is available in Alibaba's UC Browser. Results from an online A/B test suggest that the impression efficiency of click-through rate increased by 5.1% and page views increased by 5.5%.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity Recommendation</kwd>
        <kwd>Deep Neural Networks</kwd>
        <kwd>Query Understanding</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Cognitive Concept Graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Query suggestion.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Over the past few years, major commercial search engines have
enriched and improved the user experience by proactively
presenting related entities for a query along with the regular web search
results. Figure 1 shows an example of Alibaba ShenMa search
engine’s entity recommendation results presented on the panel of its
mobile search result page.</p>
      <p>
        Existing studies [
        <xref ref-type="bibr" rid="ref2 ref7">2, 7</xref>
        ] in entity recommendation typically
consider queries containing explicit entities, while ignoring
queries without entities. A common drawback of these
approaches is that they cannot handle complex queries well,
because they have no informative evidence other than the entity
itself for retrieving related entities with the same surface form.
Therefore, existing entity recommendation systems tend to
recommend entities with regard to the explicitly asked meaning, ignoring
those queries with implicit user needs. By analyzing
hundreds of millions of unique queries from search logs with named entity
recognition technology, we found that more than 50% of the
queries do not have explicit entities. In our opinion, those queries
without explicit entities are valuable for entity recommendation.
      </p>
      <p>[Figure 1: example recommendation panel. Query: "What food is good for cold weather"; concept of entities: Food; recommended entities: 1. Grain nutrition powder, 2. Honey walnut kernel, 3. Almond milk.]</p>
      <p>Queries convey insights into a user’s current information
need, which enable us to provide the user with more relevant entity
recommendations and improve the user experience. For example, the
search intent behind the query "what food is good for cold
weather" could be a kind of food suitable for eating in cold weather.
However, most recommended entities are based on entities that
appear in the query (for example, recommending "cupcakes" and
"chocolate" for the query "cake"), and there is no explicit entity
called "good food for cold weather" at all. Users are therefore likely
to be interested in a search engine that can recommend entities for
arbitrary queries.</p>
      <p>However, recommending entities for such complex queries is
extremely challenging. First, many existing recommendation
algorithms have proven to work well on small problems but fail to operate
at a large scale. Highly specialized distributed learning algorithms
and efficient serving systems are essential for handling a search
engine’s massive volume of queries and candidate entities. Second, user queries
are extremely complex and diverse, and it is quite challenging to
understand the user’s true intention. Furthermore, historical user
behavior on the search engine is inherently difficult to predict due
to sparsity and a variety of unobservable external factors. We rarely
obtain the ground truth of user satisfaction and instead model noisy
implicit feedback signals.</p>
      <p>In this paper, we study the problem of context-aware entity
recommendation and investigate how to utilize queries without
explicit entities to improve entity recommendation quality. Our
approach is based on neural networks, which map both queries
and candidate entities into a vector space via large-scale distributed
training.</p>
      <p>We evaluate our approach using large-scale, real-world search
logs of a widely used commercial Chinese search engine. Our system
has been deployed in the ShenMa Search Engine, and this feature is
available in Alibaba’s UC Browser. Results from an online A/B test
involving a large number of real users suggest that the impression
efficiency of click-through rate (CTR) increased by 5.1% and page
views (PV) increased by 5.5%.</p>
      <p>The main contributions of our paper are summarized as follows:
• To the best of our knowledge, ours is the first approach
to recommend entities for arbitrary queries in a large-scale
Chinese search engine.
• Our approach is flexible and capable of recommending entities
for billions of queries.
• We conduct extensive experiments on large-scale, real-world
search logs, which show the effectiveness of our approach
in both offline evaluation and an online A/B test.</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        The work closest to ours is entity
recommendation, which can be divided into
two categories. The first is query assistance for knowledge
graphs [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ]: GQBE [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Exemplar Queries [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] studied how
to retrieve entities from a knowledge base by specifying example
entities. For example, the input entity pair {Jerry Yang, Yahoo!}
would help retrieve answer pairs such as {Sergey Brin, Google}.
Both projected the example entities onto the RDF
knowledge graph to discover result entities as well as the relationships
around them. They used an edge-weighted graph as the underlying
model and subgraph isomorphism as the basic matching scheme,
which in general is costly.
      </p>
      <p>
        The second category recommends related entities for search assistance. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
proposed a recommendation engine called Spark to link a user’s
query word to an entity within a knowledge base and recommend
a ranked list of the related entities. To guide user exploration of
recommended entities, they also proposed a series of features to
characterize the relatedness between the query entity and the
related entities. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed a similar entity search that considers
diversity. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed to enhance the understandability of entity
recommendations by captioning the results. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed a number
of memory-based methods that exploit user behaviors in search
logs to recommend related entities for a user’s full search session.
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a model in a multi-task learning setting where the
query representation is shared across entity recommendation and
context-aware ranking. However, none of those approaches takes
into account queries without entities.
      </p>
      <p>Our objective is to infer entities given diverse and complex
queries for search assistance. There are few research
papers that focus on this issue. In industry, there are three simple
approaches to handling such complex queries. The first is to tag the
query and then recommend relevant entities based on those tags.
However, the tagging space is so large that it is difficult to cover all
domains. The second is to use a query recommendation
algorithm to convert and disambiguate the queries into entities,
which ignores the effect of errors propagated from query recommendation.
The third is to recall entities from the clicked documents.
However, not all queries have clicked documents. To the best of
our knowledge, ours is the first end-to-end method that makes it
possible to recommend entities for arbitrary queries in a large-scale
Chinese search engine.</p>
    </sec>
    <sec id="sec-4">
      <title>SYSTEM OVERVIEW</title>
      <p>
        The overall structure of our entity recommendation system is
illustrated in Figure 2. The system is composed of three modules: query
processing, candidate generation and ranking. The query
processing module first preprocesses the queries, extracts entities (none
can be extracted from complex queries) and then conceptualizes
the queries. The candidate generation module takes the output of the query
processing module as input and retrieves a subset (hundreds) of
entities from the knowledge graph. For a simple query with entities,
we utilize heterogeneous graph embedding [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to retrieve related
entities. For complex queries with few or no entities, we propose
a deep collaborative matching model to retrieve related entities. These
candidates are intended to be generally relevant to the query with
high recall. The candidate generation module only provides broad
relevance via multi-criteria matching. The similarity between
entities is expressed in terms of coarse features. Presenting a few "best"
recommendations in a list requires a fine-level representation to
distinguish relative importance among candidates with high
precision. The ranking module accomplishes this task by type filtering,
learning to rank, and click-through rate estimation. We also utilize
online learning algorithms, including Thompson sampling, to
balance exploitation and exploration in entity ranking. In the final
product presentation of entity recommendation, we use the
concepts of entities to cluster entities that share a
concept into the same group, which gives a better visual display and
a better user experience. In this paper, we mainly focus
on candidate generation, the first stage of entity recommendation,
and present our approach (red part in Figure 2), which can handle
complex queries.
      </p>
    </sec>
    <sec id="sec-5">
      <title>PRELIMINARIES</title>
      <p>In this section, we describe the large knowledge graph that we use
to retrieve candidate entities and the cognitive concept graph that we
use to conceptualize queries and entities.</p>
    </sec>
    <sec id="sec-6">
      <title>Knowledge Graph</title>
      <p>The Shenma knowledge graph is a semantic network that contains
ten million entities, thousands of types and billions of triples. It covers
a wide range of fields, such as people, education, film, TV, music,
sports, technology, books, apps, food, plants, animals and so on. It is
rich enough to cover a large proportion of entities about worldly
facts. Entities in the knowledge graph are connected by a variety
of relationships.</p>
    </sec>
    <sec id="sec-7">
      <title>Cognitive Concept Graph</title>
      <p>Based on the Shenma knowledge graph, we also construct a cognitive
concept graph, which contains millions of instances and concepts.
Different from the Shenma knowledge graph, the cognitive concept graph
is a probabilistic graph that mainly focuses on the Is-A relationship. For
example, "robin" is-a bird, and "penguin" is-a bird. The cognitive concept
graph is helpful in entity conceptualization and query
understanding.</p>
    </sec>
    <sec id="sec-8">
      <title>DEEP COLLABORATIVE MATCH</title>
      <p>In this section, we first introduce the basics of the deep
collaborative match and then elaborate on how we design the deep model
architecture.</p>
    </sec>
    <sec id="sec-9">
      <title>Recommendation as Classification</title>
      <p>
        Traditionally, major search engines recommend related entities
based on their similarities to the main entity that the user searched.
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] explained in detail the procedure of entity
recommendation in the search engine, including entity linking, related entity
discovery and so on. Unlike traditional methods, we regard
recommendation as large-scale multi-class classification, where the prediction
problem becomes how to accurately classify a specific entity <inline-formula><tex-math>e_i</tex-math></inline-formula>
among millions of entities from a knowledge graph <inline-formula><tex-math>V</tex-math></inline-formula> based on a
user’s input query <inline-formula><tex-math>Q</tex-math></inline-formula>,
      </p>
      <p>
        <disp-formula><tex-math>P(e_i \mid Q) = \frac{\exp(u_i^{\top} q)}{\sum_{j \in V} \exp(u_j^{\top} q)}</tex-math></disp-formula>
where <inline-formula><tex-math>q \in \mathbb{R}^N</tex-math></inline-formula> is a high-dimensional "embedding" of the user’s
input query, <inline-formula><tex-math>u_j \in \mathbb{R}^N</tex-math></inline-formula> is the embedding of entity <inline-formula><tex-math>j</tex-math></inline-formula>, and <inline-formula><tex-math>V</tex-math></inline-formula>
is the set of entities in the knowledge graph. In this setting, we map the
sparse entity or query into a dense vector in <inline-formula><tex-math>\mathbb{R}^N</tex-math></inline-formula>. Our deep neural
model tries to learn the query embedding from the user’s historical
behavior, which is useful for discriminating among entities with
a softmax classifier. Through joint learning of entity embeddings
and query embeddings, entity recommendation becomes the
calculation of cosine similarity between entity vectors and query
vectors.
      </p>
      <p>
        Inspired by skip-gram language models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we map the user’s
input query to a dense vector representation and learn a
high-dimensional embedding for each entity in the knowledge graph. Figure 3
shows the architecture of the base deep match model.
      </p>
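      <p>As a minimal sketch of the classification view above, the following plain-Python snippet computes a softmax over the dot products between one query embedding and a handful of entity embeddings. The function name, the toy 2-dimensional vectors and the max-subtraction trick for numerical stability are illustrative assumptions, not details taken from the deployed system.</p>
      <preformat>
```python
import math


def entity_probabilities(query_vec, entity_vecs):
    """Softmax over dot products between one query and every entity embedding."""
    scores = [sum(u * q for u, q in zip(vec, query_vec)) for vec in entity_vecs]
    # Subtracting the max score keeps exp() stable and leaves the result unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Toy embeddings, made up for illustration only.
query = [1.0, 0.0]
entities = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
probs = entity_probabilities(query, entities)
print(probs)
```
      </preformat>
      <p>With millions of entities the full normalization is too expensive at training time, which is why the model relies on sampled softmax rather than the exact sum shown here.</p>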
      <p>
        Input Layer. The input layer mainly contains the features from
the input query. We first use a word segmentation tool<sup>3</sup> to segment
queries, then fetch basic-level tokens and semantic-level tokens<sup>4</sup>,
and finally combine all the input features via the embedding
technique, as shown below:
• word embedding: we average the embeddings of both the
basic-level tokens and the semantic-level tokens; the final
embedding dimension is 128.
• ngram embedding: inspired by fastText [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we add ngram
(n=2,3) features to the input layer to capture some local
temporal information. The dimension of the ngram embedding is
also 128.
      </p>
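      <p>The two input features can be sketched roughly as follows. This is only an illustration under stated assumptions: the n-grams are taken at the word level, the helper names and the tiny lookup table are hypothetical, and the toy vectors have 2 dimensions instead of 128.</p>
      <preformat>
```python
def token_ngrams(tokens, sizes=(2, 3)):
    """Return word n-grams (joined with '_') for each requested size."""
    grams = []
    for n in sizes:
        for start in range(len(tokens) - n + 1):
            grams.append("_".join(tokens[start:start + n]))
    return grams


def average_embedding(keys, table, dim):
    """Average the vectors of all keys found in `table`; zeros if none are found."""
    found = [table[k] for k in keys if k in table]
    if not found:
        return [0.0] * dim
    return [sum(vec[i] for vec in found) / len(found) for i in range(dim)]


tokens = ["food", "good", "cold", "weather"]
grams = token_ngrams(tokens)

# Hypothetical n-gram embedding table with 2-dimensional vectors.
ngram_table = {"food_good": [1.0, 0.0], "cold_weather": [0.0, 1.0]}
ngram_vec = average_embedding(grams, ngram_table, 2)
print(grams, ngram_vec)
```
      </preformat>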
      <p>Fully-Connected Layer. Following the input layer, we utilize
three fully connected layers (512-256-128) with tanh activation
function. In order to speed up the training, we add batch normalization
to each layer.</p>
      <p>
        Softmax Layer. To efficiently train such a model with millions
of classes, we apply sampled softmax [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in our model. For each
example, the cross-entropy loss is minimized for the true label and
the sampled negative classes. In practice, we sample 5,000 negative
instances.
      </p>
      <p><sup>3</sup> AliWS, which is similar to the jieba segmentation tool, uses CRF and a user-defined
dictionary to segment queries.</p>
      <p><sup>4</sup> Tokens that belong to the same entity or phrase are not segmented.</p>
      <p>Online Serving. At serving time, we need to compute the
K most likely classes (entities) in order to choose the top K to
present to the user. To recall the required number of entities
within ten milliseconds, we deploy a vector search engine<sup>5</sup> on top
of an index built offline. In practice, our model can generate a query
embedding within 5 ms and recall related entities within 3 ms.</p>
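      <p>Conceptually, serving reduces to a nearest-neighbor lookup in the shared embedding space. The sketch below uses an exhaustive scan with a dot-product score purely for illustration; the entity names, vectors and function name are assumptions, and a production system would replace the scan with an approximate index such as the vector search engine mentioned above.</p>
      <preformat>
```python
import heapq


def top_k_entities(query_vec, entity_table, k):
    """Return the k entity names with the largest dot product to the query."""
    def score(name):
        return sum(u * q for u, q in zip(entity_table[name], query_vec))
    return heapq.nlargest(k, entity_table, key=score)


# Toy entity embeddings, made up for illustration only.
entity_table = {
    "almond milk": [0.9, 0.1],
    "honey walnut kernel": [0.8, 0.3],
    "graphics card": [-0.5, 0.9],
}
top2 = top_k_entities([1.0, 0.0], entity_table, 2)
print(top2)
```
      </preformat>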
    </sec>
    <sec id="sec-10">
      <title>Enhanced Deep Match Model</title>
      <p>The above base model still has two problems in the semantic
representation of the input query: 1) it ignores the global temporal
information, which is important for learning the query’s sentence-level
representation; 2) different query tokens contribute equally to the
final input embedding, which is not a good assumption. For example,
an entity token should be more important than other tokens such
as stop words.</p>
      <p>To address the first issue, we adopt a Bi-directional LSTM
model to encode the global and local temporal information. To address
the second, we use an attention mechanism, with which our model can
automatically learn the weights of different query tokens. Figure 4 shows
the enhanced deep match model architecture.</p>
      <p>The proposed model consists of two parts. The first is a
Bidirectional LSTM, and the second is the self-attention mechanism,
which provides weight vectors for the LSTM hidden states. The
weight vectors are dotted with the LSTM hidden states, and the
weighted LSTM hidden states are considered as an embedding for
the input query. Suppose the input query has n tokens represented
with a sequence of word embeddings.</p>
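      <p>
        <disp-formula><tex-math>Q = (w_1, w_2, \cdots, w_{n-1}, w_n)</tex-math></disp-formula>
where <inline-formula><tex-math>w_i \in \mathbb{R}^d</tex-math></inline-formula> is the word embedding for the i-th token in the
query. <inline-formula><tex-math>Q \in \mathbb{R}^{n \times d}</tex-math></inline-formula> is thus represented as a 2-D matrix, which
concatenates all the word embeddings together. To utilize the dependency
between adjacent words within a single sentence, we use the
Bidirectional LSTM to represent the sentence and concatenate the forward state <inline-formula><tex-math>h_i^f</tex-math></inline-formula>
with the backward state <inline-formula><tex-math>h_i^b</tex-math></inline-formula> to obtain the hidden state <inline-formula><tex-math>h_i</tex-math></inline-formula>:
        <disp-formula><tex-math>h_i = [h_i^f, h_i^b]</tex-math></disp-formula>
The number of LSTM hidden units is <inline-formula><tex-math>m</tex-math></inline-formula>. For simplicity, we
concatenate all the hidden states <inline-formula><tex-math>h_i</tex-math></inline-formula> as <inline-formula><tex-math>H \in \mathbb{R}^{n \times 2m}</tex-math></inline-formula>:
        <disp-formula><tex-math>H = [h_1, h_2, \cdots, h_{n-1}, h_n]</tex-math></disp-formula>
With the self-attention mechanism, we encode a variable-length
sentence into a fixed-size embedding. The attention mechanism
takes the whole LSTM hidden states <inline-formula><tex-math>H</tex-math></inline-formula> as input, and outputs the
weights <inline-formula><tex-math>\alpha \in \mathbb{R}^{1 \times n}</tex-math></inline-formula>, one per token:
        <disp-formula><tex-math>\alpha = \mathrm{softmax}(U \tanh(W H^{\top} + b))</tex-math></disp-formula>
where <inline-formula><tex-math>W \in \mathbb{R}^{k \times 2m}</tex-math></inline-formula>, <inline-formula><tex-math>U \in \mathbb{R}^{1 \times k}</tex-math></inline-formula>, and <inline-formula><tex-math>b \in \mathbb{R}^{k}</tex-math></inline-formula>. Then we sum up the LSTM
hidden states <inline-formula><tex-math>H</tex-math></inline-formula> according to the weights provided by <inline-formula><tex-math>\alpha</tex-math></inline-formula> to get the
final representation of the input query:
        <disp-formula><tex-math>q = \sum_{i=1}^{n} \alpha_i h_i</tex-math></disp-formula>
      </p>
      <p><sup>5</sup> The vector search engine is similar to Facebook’s faiss vector search engine, with
an optimized search algorithm.</p>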
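      <p>The attention pooling step can be written out in a few lines. The sketch below is a plain-Python rendering of the two formulas above (token scores through a tanh projection, softmax over tokens, weighted sum of hidden states); the function name and the toy matrices are assumptions, and the BiLSTM that would produce the hidden states is replaced by hand-written vectors.</p>
      <preformat>
```python
import math


def attention_pool(H, W, U, b):
    """H: n hidden states of size 2m; W: k rows of size 2m; U: k weights; b: k biases."""
    scores = []
    for h in H:
        # Projection tanh(W h + b): one value per attention unit.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
                  for row, bias in zip(W, b)]
        scores.append(sum(u * v for u, v in zip(U, hidden)))
    # Softmax over the n token scores gives one weight per token.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # The weighted sum of hidden states is the fixed-size query embedding.
    q = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(len(H[0]))]
    return alpha, q


# Three toy hidden states (n = 3, 2m = 2) and k = 2 attention units.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
U = [1.0, 1.0]
b = [0.0, 0.0]
alpha, q = attention_pool(H, W, U, b)
print(alpha, q)
```
      </preformat>
      <p>Whatever the query length n, the output q always has the size of a single hidden state, which is what allows queries of different lengths to be compared with entity embeddings.</p>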
      <p>Note that the query embeddings and entity embeddings are all
randomly initialized and trained from scratch. We have a huge amount
of training data, which makes it possible to model the relatedness between
queries and entities.</p>
      <p>The query-entity training pairs come from the following sources:
• Query-Click-Entity: given a query, we choose the clicked
entities with relatively high CTR. In practice, we collect on the order of a thousand
million records from the query logs of the past two months.
• Query-Doc-Entity: we assume that a frequently clicked document is well
matched to the query and that the entities in its title or summary
are also related to the query. The procedure is: 1) fetch
the clicked documents with title and summary from
the query log; 2) extract entities from the title and summary via
named entity recognition; 3) keep the high-quality entities.
In the end, we collect millions of unique queries.
• Query-Query-Entity: given high-quality results from text
recommendation, we utilize an entity linking method to extract
entities from those results. We also collect millions of unique
queries.
• Query-Tag-Entity: for some specific queries, we tag them with
entity labels and generate query-entity pairs. Here,
we define hundreds of entity tags in advance.</p>
      <p>After generating the query-entity pairs, we adopt the following data
preprocessing procedures:
• low-quality filter: we filter out low-quality entities via some
basic rules, such as blacklist, authority, hotness, importance
and so on.
• low-frequency filter: we filter out low-frequency entities.
• high-frequency sub-sampling: we sub-sample
the high-frequency entities.
• shuffle: we shuffle all samples.</p>
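      <p>The four preprocessing steps can be sketched as a small pipeline. The thresholds, the cap-based sub-sampling rule and the function name are assumptions chosen for illustration, and the low-quality filter is reduced to a blacklist because the other rules (authority, hotness, importance) are not specified in detail.</p>
      <preformat>
```python
import random
from collections import Counter


def preprocess_pairs(pairs, blacklist, min_count, max_count, seed=0):
    """Filter, sub-sample and shuffle (query, entity) pairs."""
    rng = random.Random(seed)
    # Low-quality filter: only a blacklist here; other rules are omitted.
    pairs = [p for p in pairs if p[1] not in blacklist]
    counts = Counter(entity for _, entity in pairs)
    # Low-frequency filter: drop entities seen fewer than min_count times.
    pairs = [p for p in pairs if counts[p[1]] >= min_count]
    # High-frequency sub-sampling: keep at most max_count pairs per entity.
    by_entity = {}
    for pair in pairs:
        by_entity.setdefault(pair[1], []).append(pair)
    kept = []
    for group in by_entity.values():
        if len(group) > max_count:
            group = rng.sample(group, max_count)
        kept.extend(group)
    # Shuffle so that training batches mix entities.
    rng.shuffle(kept)
    return kept


pairs = [("q%d" % i, "tea") for i in range(5)]
pairs += [("q5", "soup"), ("q6", "soup"), ("q7", "spam"), ("q8", "rare")]
cleaned = preprocess_pairs(pairs, blacklist={"spam"}, min_count=2, max_count=3)
print(Counter(entity for _, entity in cleaned))
```
      </preformat>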
      <p>Apart from user click data, we construct millions of
query-entity relevant pairs at the semantic level, which are very important
for the model to learn the query’s semantic representation. Finally,
we generate billions of query-entity pairs and about one thousand
billion unique queries.</p>
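      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Offline comparison of different methods on large-scale, real-world search logs of a widely used commercial web search engine.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Method</th><th>P@1</th><th>P@10</th><th>P@20</th><th>P@30</th></tr>
          </thead>
          <tbody>
            <tr><td>DNN</td><td>6.53</td><td>28.29</td><td>38.83</td><td>53.79</td></tr>
            <tr><td>+ngram</td><td>7.25</td><td>30.76</td><td>41.57</td><td>56.49</td></tr>
            <tr><td>att-BiLSTM</td><td>7.34</td><td>30.95</td><td>41.56</td><td>56.02</td></tr>
          </tbody>
        </table>
      </table-wrap>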
    </sec>
    <sec id="sec-11">
      <title>Evaluation Metric</title>
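      <p>
        To evaluate the effectiveness of different methods, we use
Precision@M following [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Denote the recalled set of entities for a
query <inline-formula><tex-math>u</tex-math></inline-formula> as <inline-formula><tex-math>P_u</tex-math></inline-formula> (<inline-formula><tex-math>|P_u| = M</tex-math></inline-formula>) and the query’s ground truth set as <inline-formula><tex-math>G_u</tex-math></inline-formula>.
Precision@M is:
        <disp-formula><label>(1)</label><tex-math>\mathrm{Precision@}M(u) = \frac{|P_u \cap G_u|}{|P_u|}</tex-math></disp-formula>
      </p>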
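      <p>For a single query, Precision@M is simply the share of the M recalled entities that also appear in the ground truth set. The snippet below is a small worked example with made-up entity lists; how the per-query values are aggregated over the test set is not stated in the text, so no averaging is shown.</p>
      <preformat>
```python
def precision_at_m(recalled, ground_truth, m):
    """Share of the top-m recalled entities that appear in the ground-truth set."""
    top_m = recalled[:m]
    hits = sum(1 for entity in top_m if entity in ground_truth)
    return hits / m


# Made-up recall list and ground truth for one query.
recalled = ["almond milk", "ice cream", "honey walnut kernel", "cola"]
truth = {"almond milk", "honey walnut kernel", "ginger tea"}
p_at_2 = precision_at_m(recalled, truth, 2)
p_at_4 = precision_at_m(recalled, truth, 4)
print(p_at_2, p_at_4)
```
      </preformat>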
    </sec>
    <sec id="sec-12">
      <title>Offline Evaluation</title>
      <p>
        To evaluate the performance of our model, we compare it
with various baseline models. From unseen, real online
search click logs, we collect millions of query-entity pairs as our test
set (ground truth set). The evaluation results are shown in Table
1: DNN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is the base method with a DNN encoder; +ngram is the
method that adds ngram features; att-BiLSTM is our method with a
BiLSTM encoder and attention mechanism. The DNN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a
well-known recommendation baseline; we re-implement the
algorithm and modify the model for the entity recommendation setting.
Note that there are no other entity recommendation baselines
for complex queries with no entities at all. att-BiLSTM is slightly
better than +ngram at P@1 and P@10. The main reason is that a certain
share of queries have no meaningful word order, so ngram features are enough to provide
useful information.
      </p>
      <p>
        Our approach achieves comparable results in the offline
evaluation. These results indicate that our method benefits a lot
from joint representation learning of queries and entities. Note
that we learn the embeddings of queries and entities from random
initialization. We believe the performance can be further improved
by adopting a more complex sentence encoder such as BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or
XLNet [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and by using inductive bias from structured knowledge [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to
enhance the entity representation, which we plan to address in
future work.
      </p>
    </sec>
    <sec id="sec-13">
      <title>Online A/B Test</title>
      <p>We perform a large-scale online A/B test to show how our approach
to entity recommendation helps improve the performance
of recommendation in real-world applications. We first retrieve
candidate entities by matching queries, then rank the candidate
entities with a click-through rate (CTR) prediction model and
Thompson sampling. The ranked entities are pushed to users in the search
results of Alibaba UC Browser. For the online A/B test, we split users
into buckets. We observe and record the activities of each bucket
for seven days.</p>
      <p>We select two buckets with highly similar activities. For one
bucket, we perform recommendation without the deep
collaborative match model. For the other, the deep collaborative match
model is used for recommendation. We run our A/B test for
seven days and compare the results. Page view (PV) and
click-through rate (CTR) are the two most critical metrics in real-world
applications because they show how much content users read and
how much time they spend in an application. In the online
experiment, we observe statistically significant gains in CTR (5.1%) and PV
(5.5%). These observations suggest that the deep collaborative match
for entity recommendation greatly benefits the understanding of
queries and helps to better match users with entities of potential
interest. With the help of the deep collaborative match, we can
better capture the implicit user need contained in a query even if
it does not explicitly contain an entity. Given more matched entities,
users spend more time and read more articles in our search
engine.</p>
    </sec>
    <sec id="sec-14">
      <title>Qualitative Analysis</title>
      <p>We make a qualitative analysis of the entity embeddings learned
from scratch. Interestingly, we find that our approach is able to
capture the relatedness of similar entities. As Figure 5 shows, the
entities "Beijing University" and "Fudan University" are similar to the
entity "Tsinghua University." These results demonstrate our
approach’s impressive power of representation learning for entities<sup>6</sup>.
They also indicate that text is really helpful for representation
learning in the knowledge graph.</p>
      <p>We also make a qualitative analysis of the query embeddings. We
find that our approach generates more discriminative query
embeddings for entity recommendation due to the attention mechanism.
Specifically, we randomly select six queries from the search log
and then visualize the attention weights, as shown in Figure 7.
Our approach is capable of emphasizing relevant words and
de-emphasizing noisy terms in queries, which boosts
performance.</p>
6.6</p>
    </sec>
    <sec id="sec-15">
      <title>Case Studies</title>
      <p>We give some examples of how our deep collaborative matching
takes effect in entity recommendation for complex queries.
In Figure 6, we display the most related entities retrieved
for the given queries. We observe that (1) given the interrogative
query "what food is good for cold weather", our model is able to
understand the meaning of the query and retrieve the most related entities
"Grain nutrition powder" and "Almond milk"; (2) our model is able to
handle short queries such as "e52640 and i73770s", which usually do
not have the syntax of a written language or contain few signals
for statistical inference; (3) our model is able to handle queries
such as "multiply six by the largest single digit greater than fourth",
which require the commonsense knowledge that a "number" is a "mathematical term",
demonstrating the generalization of our approach; (4) our approach
can also handle multi-modal queries such as "the picture of baby walking
feet outside" and get promising results, although the current version
of our model does not consider image representations in entity
recommendation, which indicates that our approach can model the
representation of queries that reveal the implicit needs of users. We
believe that multi-modal information (images) will further boost
performance, which we leave for future work.</p>
      <p><sup>6</sup> We do not have ground truth for similar entities, so we cannot make a quantitative
analysis.</p>
    </sec>
    <sec id="sec-16">
      <title>6.7 Conceptualized Entity Recommendation</title>
      <p>In the entity recommendation system, each entity may have
different views. For example, when recommending entities related
to "apple", the query may refer to both "fruits" and "technology
products", as Figure 8 shows. Different users have different
intentions. To give a better user experience, we develop the
conceptualized multi-dimensional recommendation shown in Figure
9. Specifically, we use the concepts of candidate entities to
cluster entities into groups, which gives a better visual display.
Those concepts are retrieved from our cognitive concept graph.
Online evaluation shows that conceptualized multi-dimensional
recommendation has a total coverage of 49.8% in entity
recommendation and also achieves a CTR gain of more than 4.1%.</p>
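      <p>The grouping step can be illustrated with a short sketch. It assumes that each candidate entity has already been mapped to a single concept (in the system this mapping would come from the cognitive concept graph); the dictionary, the fallback label "other" and the function name are hypothetical.</p>
      <preformat>
```python
from collections import OrderedDict


def group_by_concept(ranked_entities, concept_of):
    """Group a ranked entity list by concept, keeping rank order inside each group."""
    groups = OrderedDict()
    for entity in ranked_entities:
        # Entities without a known concept fall into a catch-all group.
        concept = concept_of.get(entity, "other")
        groups.setdefault(concept, []).append(entity)
    return groups


# Hypothetical concept lookup for candidates recalled for the query "apple".
concept_of = {
    "iPhone": "technology products",
    "banana": "fruits",
    "MacBook": "technology products",
    "pear": "fruits",
}
ranked = ["iPhone", "banana", "MacBook", "pear"]
groups = group_by_concept(ranked, concept_of)
print(dict(groups))
```
      </preformat>
      <p>Groups appear in the order of their best-ranked member, so the display keeps the ranking signal while separating the different senses of the query.</p>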
    </sec>
    <sec id="sec-17">
      <title>7 CONCLUSION</title>
      <p>In this paper, we study the problem of context modeling for
improving entity recommendation. To this end, we develop a deep
collaborative match model that learns representations from complex
and diverse queries and entities. We evaluate our approach using
large-scale, real-world search logs of a widely used commercial
search engine. The experiments demonstrate that our approach can
significantly improve the performance of entity recommendation.</p>
      <p>Generally speaking, the knowledge graph and cognitive concept
graph can provide more prior knowledge in query understanding
and entity recommendation. In the future, we plan to explore the
following directions: (1) we may combine our method with structure
knowledge from knowledge graph and cognitive concept graph;
(2) we may combine rule mining and knowledge graph reasoning
technologies to enhance the interpretability of entity
recommendation; (3) it will be promising to apply our method to other industry
applications and further adapt to other NLP scenarios.</p>
    </sec>
    <sec id="sec-18">
      <title>ACKNOWLEDGMENTS</title>
      <p>We would like to thank colleagues of our team - Xiangzhi Wang,
Yulin Wang, Liang Dong, Kangping Yin, Zhenxin Ma, Yongjin Wang,
Qiteng Yang, Wei Shen, Liansheng Sun, Kui Xiong, Weixing Zhang
and Feng Gao for useful discussions and support on this work. We
are grateful to our cooperative team - search engineering team. We
also thank the anonymous reviewers for their valuable comments
and suggestions that help improve the quality of this manuscript.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Guy</given-names>
            <surname>Blanc</surname>
          </string-name>
          and
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Rendle</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Adaptive sampled softmax with kernel based sampling</article-title>
          .
          <source>arXiv preprint arXiv:1712.00527</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Roi</given-names>
            <surname>Blanco</surname>
          </string-name>
          , Berkant Barla Cambazoglu,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Mika</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Torzec</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Entity recommendations in web search</article-title>
          .
          <source>In International Semantic Web Conference</source>
          . Springer,
          <fpage>33</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Covington</surname>
          </string-name>
          , Jay Adams, and
          <string-name>
            <given-names>Emre</given-names>
            <surname>Sargin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Deep neural networks for YouTube recommendations</article-title>
          .
          <source>In Proceedings of the 10th ACM conference on recommender systems. ACM</source>
          ,
          <fpage>191</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Ignacio</given-names>
            <surname>Fernández-Tobías</surname>
          </string-name>
          and
          <string-name>
            <given-names>Roi</given-names>
            <surname>Blanco</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Memory-based recommendations of entities for web search users</article-title>
          .
          <source>In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM</source>
          ,
          <fpage>35</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Grover</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>node2vec: Scalable feature learning for networks</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jizhou</given-names>
            <surname>Huang</surname>
          </string-name>
          , Wei Zhang, Yaming Sun,
          <string-name>
            <given-names>Haifeng</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ting</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Improving Entity Recommendation with Search Log and Multi-Task Learning</article-title>
          .
          <source>In IJCAI</source>
          .
          <fpage>4107</fpage>
          -
          <lpage>4114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jizhou</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shiqi</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shiqiang</given-names>
            <surname>Ding</surname>
          </string-name>
          , Haiyang Wu, Mingming Sun, and
          <string-name>
            <given-names>Haifeng</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Generating Recommendation Evidence Using Translation Model</article-title>
          .
          <source>In IJCAI</source>
          .
          <fpage>2810</fpage>
          -
          <lpage>2816</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Nandish</given-names>
            <surname>Jayaram</surname>
          </string-name>
          , Mahesh Gupta, Arijit Khan,
          <string-name>
            <given-names>Chengkai</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xifeng</given-names>
            <surname>Yan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ramez</given-names>
            <surname>Elmasri</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GQBE: Querying knowledge graphs by example entity tuples</article-title>
          .
          <source>In 2014 IEEE 30th International Conference on Data Engineering. IEEE</source>
          ,
          <fpage>1250</fpage>
          -
          <lpage>1253</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Armand</given-names>
            <surname>Joulin</surname>
          </string-name>
          , Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>FastText.zip: Compressing text classification models</article-title>
          .
          <source>arXiv preprint arXiv:1612.03651</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Metzger</surname>
          </string-name>
          , Ralf Schenkel, and
          <string-name>
            <given-names>Marcin</given-names>
            <surname>Sydow</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>QBEES: query by entity examples</article-title>
          .
          <source>In Proceedings of the 22nd ACM international conference on Information &amp; Knowledge Management. ACM</source>
          ,
          <fpage>1829</fpage>
          -
          <lpage>1832</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Davide</given-names>
            <surname>Mottin</surname>
          </string-name>
          , Matteo Lissandrini, Yannis Velegrakis, and
          <string-name>
            <given-names>Themis</given-names>
            <surname>Palpanas</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Exemplar queries: Give me an example of what you need</article-title>
          .
          <source>Proceedings of the VLDB Endowment 7</source>
          ,
          <issue>5</issue>
          (
          <year>2014</year>
          ),
          <fpage>365</fpage>
          -
          <lpage>376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Hongwei</given-names>
            <surname>Wang</surname>
          </string-name>
          , Fuzheng Zhang, Xing Xie, and
          <string-name>
            <given-names>Minyi</given-names>
            <surname>Guo</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>DKN: Deep knowledge-aware network for news recommendation</article-title>
          .
          <source>In Proceedings of the 2018 World Wide Web Conference. International World Wide Web Conferences Steering Committee</source>
          ,
          <fpage>1835</fpage>
          -
          <lpage>1844</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Zhilin</given-names>
            <surname>Yang</surname>
          </string-name>
          , Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and
          <string-name>
            <given-names>Quoc V.</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>XLNet: Generalized Autoregressive Pretraining for Language Understanding</article-title>
          .
          <source>arXiv preprint arXiv:1906.08237</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Ningyu</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Shumin Deng, Zhanlin Sun, Xi Chen, Wei Zhang, and
          <string-name>
            <given-names>Huajun</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Attention-based capsule networks with dynamic routing for relation extraction</article-title>
          .
          <source>arXiv preprint arXiv:1812.11321</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Ningyu</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Shumin Deng, Zhanlin Sun, Guanying Wang, Xi Chen, Wei Zhang, and
          <string-name>
            <given-names>Huajun</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks</article-title>
          .
          <source>arXiv preprint arXiv:1903.01306</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Han</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiang</given-names>
            <surname>Li</surname>
          </string-name>
          , Pengye Zhang, Guozheng Li, Jie He,
          <string-name>
            <given-names>Han</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kun</given-names>
            <surname>Gai</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Learning Tree-based Deep Model for Recommender Systems</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining. ACM</source>
          ,
          <fpage>1079</fpage>
          -
          <lpage>1088</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>